Stochastic Modeling of Rainfall Series in Kelantan Using an Advanced Weather Generator

A weather generator is a numerical tool that uses existing meteorological records to generate series of synthetic weather data. The AWE-GEN (Advanced Weather Generator) model has been successful in producing weather variables across a broad range of temporal scales, from high-frequency hourly values to low-frequency inter-annual variability. In Malaysia, AWE-GEN has produced reliable projections of extreme rainfall events for some parts of Peninsular Malaysia. This study focuses on the use of the AWE-GEN model to assess rainfall distribution in Kelantan. Kelantan is situated in the north east of Peninsular Malaysia, a region which is highly susceptible to floods. Embedded within the AWE-GEN model is the Neyman-Scott process, which employs parameters to represent physical rainfall characteristics. Choosing appropriate probability distributions to represent these parameters is essential for producing reliable results. This study compares the performance of two probability distributions, Weibull and Gamma, in representing rainfall intensity; the better distribution was subsequently used to simulate hourly rainfall series. Thirty years of hourly meteorological data from two stations in Kelantan were used in model construction. Results indicate that both probability distributions replicate the rainfall series at both stations very well; however, numerical evaluation suggests that Gamma performs better. Although Gamma is not a heavy-tailed distribution, it is able to replicate the key characteristics of the rainfall series, particularly extreme values. The overall simulation results show that the AWE-GEN model is capable of generating tropical rainfall series, which could be beneficial in flood preparedness studies in areas vulnerable to flooding.

Keywords-weather generator; flood; rainfall intensity; probability distribution; northeastern monsoon


I. INTRODUCTION
A weather generator is a numerical tool that uses existing meteorological records, which are often short and non-continuous, to produce long series of synthetic daily weather data [1][2]. Various weather generators have been built over the years, such as the Weather Generator (WGEN) [3][4], the Climate Generator (CLIGEN) [5], and the Long Ashton Research Station-Weather Generator (LARS-WG) [6]. Weather generators are often based on empirical statistical relationships that maintain the autocorrelation and cross-correlation properties of the various variables. According to [7], the simulated time scales range from daily to annual periods. However, one of the drawbacks of daily weather generators is that they underestimate monthly and inter-annual variances because they do not account for the low-frequency component of climate variability. The authors in [8] used a monthly generator (based on a first-order autoregressive model) to adjust the low-frequency behavior of the daily WGEN model. Even though the results are well simulated, this model is not able to capture the inter-annual variability. The authors in [9] introduced a method for coupling stochastic hydrological time series models at two different time scales. Two corresponding time series are produced, one preserving important statistical properties on a finer time scale and the other on a coarser time scale. The finer-scale series is then adjusted so that it is consistent with the coarser-scale series. The results show that the coupling method can produce a daily series which preserves some important statistical properties on daily, monthly and yearly scales. Other studies of weather generators (e.g. [10][11][12][13]) were conducted to address problems related to daily weather generators.

The authors in [14] compared the performance of different stochastic weather generators for long-term climate data simulation. In particular, CLImate GENerator (CLIGEN), Long Ashton Research Station Weather Generator (LARS-WG), and Weather Generators (WeaGETS) were compared in terms of their ability to capture vital statistical features. The statistical features of the daily observations, as well as minimum and maximum daily air temperatures, are well simulated by both the CLIGEN and LARS-WG models. These generators can also simulate maximum growth periods and growing degree days, making them well suited to plant growth simulation. However, the WeaGETS model is less capable of capturing the descriptive statistics, reproducing the output value distributions, and evaluating extreme variables.
On the other hand, the Advanced Weather Generator (AWE-GEN) model, developed by the authors in [15], has been successful in producing weather variables across a broad range of temporal scales, from high-frequency hourly values to low-frequency inter-annual variability. In Malaysia, the model has demonstrated its ability to produce projections of extreme rainfall events for Peninsular Malaysia [16]. In this study, the AWE-GEN model is used to analyze rainfall activity in Kelantan using meteorological data from selected stations. Embedded within the AWE-GEN model is the Neyman-Scott Rectangular Pulse (NSRP) model, developed in [17]. The NSRP model uses parameters to denote the physical rainfall characteristics, so it is crucial to choose appropriate probability distributions for these parameters in order to produce reliable results. In Malaysia, several types of distributions have been tested for rainfall intensity, and the results varied according to the models used. For instance, the Generalized Pareto [18] and Mixed Exponential [19,20] distributions were used. This study compares the performance of two probability distributions, Weibull and Gamma, in representing rainfall intensity. The distribution with the best fit is then used to simulate hourly rainfall series. Weibull is a heavy-tailed distribution which could capture the extreme rainfall events frequently experienced in the studied region, while the Gamma distribution has been successfully employed within AWE-GEN in [15] to assess rainfall at Tucson Airport, Arizona, USA.

II. DATA
Malaysia lies within the equatorial belt, with high temperatures and rain all year long. Its rainfall distribution is influenced by the monsoon regime and has lately been inconsistent from year to year. There are two main monsoon seasons, the Southwest Monsoon (May to September) and the Northeast Monsoon (November to March). The east coast of Peninsular Malaysia continues to be affected by the northeast monsoon, which brings more rain. The studied region, Kelantan, is located in the north east and is highly susceptible to monsoon floods on an almost yearly basis. In this study, the AWE-GEN model is constructed based on 30 years of historical meteorological data. Hourly data such as rainfall amount, temperature, relative humidity and wind speed are taken from two selected rainfall stations (see Table I). Figure 1 shows the location of the rainfall stations in this study.

III. MODEL DEVELOPMENT
Within the AWE-GEN model, the NSRP model is used to represent the intra-annual variability. Work by the authors in [20] and [21] indicated that the Neyman-Scott methodology is suitable for use in Malaysia. Parameters describing the physical characteristics of rainfall in the NSRP model are given in Table II. Further description of the NSRP model and its theoretical formulation can be found in [19]. As mentioned before, choosing appropriate probability distributions to represent the parameters of the NSRP model is critical. Hence, in this study, the performance of Gamma and Weibull in representing cell intensity is compared. The Gamma distribution associated with the NSRP is as follows,

$$f(x) = \frac{x^{\alpha-1}\, e^{-x/\theta}}{\theta^{\alpha}\,\Gamma(\alpha)}, \quad x > 0,$$

where $\theta$ is the scale parameter ($\theta > 0$), $\alpha$ is the shape parameter ($\alpha > 0$) and $x$ is the hourly rainfall amount. Meanwhile, the Weibull distribution is as follows,

$$f(x) = \frac{\gamma}{\theta}\left(\frac{x}{\theta}\right)^{\gamma-1} \exp\left[-\left(\frac{x}{\theta}\right)^{\gamma}\right], \quad x > 0,$$

where $\theta$ and $\gamma$ are the scale and shape parameters, respectively.
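As an illustration of the two candidate intensity distributions, the following minimal Python sketch evaluates the Gamma and Weibull densities in their standard shape-scale forms. The function names, the `midpoint_integral` helper and the parameter values in the usage note are assumptions made for this example; they are not taken from the AWE-GEN code or from the fitted values in this study.

```python
import math


def gamma_pdf(x, alpha, theta):
    """Gamma density with shape alpha > 0 and scale theta > 0 (x in mm/h)."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / theta) / (math.gamma(alpha) * theta ** alpha)


def weibull_pdf(x, gamma_shape, theta):
    """Weibull density with shape gamma_shape > 0 and scale theta > 0 (x in mm/h)."""
    if x <= 0:
        return 0.0
    z = x / theta
    return (gamma_shape / theta) * z ** (gamma_shape - 1) * math.exp(-(z ** gamma_shape))


def midpoint_integral(f, a, b, n):
    """Approximate the integral of f over [a, b] with n midpoint-rule panels."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    # Illustrative (not fitted) parameters: probability of intensity above 50 mm/h.
    p_gamma = midpoint_integral(lambda x: gamma_pdf(x, 2.0, 5.0), 50.0, 500.0, 45000)
    p_weibull = midpoint_integral(lambda x: weibull_pdf(x, 0.8, 10.0), 50.0, 500.0, 45000)
    print(p_gamma, p_weibull)
```

With a shape parameter equal to 1, both densities reduce to the exponential density with mean equal to the scale parameter, which offers a quick consistency check on the two formulas.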
The first phase of this study fits the Gamma and Weibull distributions individually to the data of the selected stations. The generated rainfall amounts at each station are then compared with the observed data. The performance of both distributions is judged using the Root Mean Square Error (RMSE),

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},$$

where $n$ is the total number of data, $y_i$ is the i-th actual rainfall amount and $\hat{y}_i$ is the simulated rainfall amount. A lower RMSE value denotes the better-fitting distribution at a particular station. The second phase of the study involves generating rainfall series using the identified distribution to represent rainfall intensity at each rainfall station. In order to validate the model, the simulated hourly rainfall was divided into two non-overlapping periods. These periods are (i) 1975 to 1989 and (ii) 1990 to 2005. The first period is the reference period, for which a multiplicative factor is computed based on the simulation output and the high-resolution observational data. Next, the change factors were used to correct the biases of the simulation output from 1990 to 2005. The revised hourly rainfall is then compared to the observation from the identical period of 1975-1989.
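The two computations described above, the RMSE and the multiplicative correction, can be sketched in a few lines of Python. The RMSE follows the definition in the text. The paper does not state how the multiplicative factor is formed, so the ratio of observed to simulated mean over the reference period used here is an assumption, and all function names are hypothetical.

```python
import math


def rmse(observed, simulated):
    """Root mean square error between paired observed and simulated values."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, simulated))
    return math.sqrt(squared / len(observed))


def multiplicative_factor(obs_reference, sim_reference):
    """ASSUMPTION: factor = observed mean / simulated mean over the reference period."""
    sim_mean = sum(sim_reference) / len(sim_reference)
    if sim_mean == 0:
        raise ValueError("simulated reference mean is zero; factor undefined")
    obs_mean = sum(obs_reference) / len(obs_reference)
    return obs_mean / sim_mean


def apply_factor(sim_series, factor):
    """Scale a simulated series by the factor derived from the reference period."""
    return [factor * value for value in sim_series]


if __name__ == "__main__":
    # Toy numbers only, standing in for reference- and validation-period rainfall.
    factor = multiplicative_factor([2.0, 4.0, 0.0], [1.0, 2.0, 0.0])
    corrected = apply_factor([0.5, 3.0, 0.0], factor)
    print(factor, corrected, rmse([1.0, 5.5, 0.0], corrected))
```

In practice such a factor would usually be computed per month or per season rather than once for the whole record; that choice is also left open by the text.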
μc: Mean number of cells per storm
α, θ: Shape and scale parameters of rainfall intensity using Gamma
γ, θ: Shape and scale parameters of rainfall intensity using Weibull
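To make the role of the NSRP parameters concrete, the following Python sketch simulates an hourly rainfall series from a simplified Neyman-Scott rectangular pulse process with Gamma cell intensities. It is not the AWE-GEN implementation. Several choices are assumptions made for illustration: the timing parameters are passed as mean times in hours (hypothetical names `mean_storm_gap_h`, `mean_cell_delay_h`, `mean_cell_duration_h`), the number of cells per storm is drawn as 1 plus a Poisson variate so that its mean equals `mean_cells`, and the parameter values in the usage example are arbitrary.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_nsrp_hourly(hours, mean_storm_gap_h, mean_cell_delay_h,
                         mean_cell_duration_h, mean_cells, alpha, theta, seed=0):
    """Return hourly rainfall depths (mm) from a simplified NSRP process."""
    rng = random.Random(seed)
    rain = [0.0] * hours
    t = rng.expovariate(1.0 / mean_storm_gap_h)  # first storm origin
    while t < hours:
        # ASSUMPTION: at least one cell per storm, mean number equal to mean_cells.
        n_cells = 1 + poisson(rng, mean_cells - 1.0)
        for _ in range(n_cells):
            start = t + rng.expovariate(1.0 / mean_cell_delay_h)
            end = start + rng.expovariate(1.0 / mean_cell_duration_h)
            intensity = rng.gammavariate(alpha, theta)  # mm/h, constant over the cell
            h = int(start)
            while h < hours and h < end:
                # Portion of this clock hour covered by the rectangular pulse.
                overlap = min(end, h + 1) - max(start, h)
                rain[h] += intensity * overlap
                h += 1
        t += rng.expovariate(1.0 / mean_storm_gap_h)  # next storm origin
    return rain


if __name__ == "__main__":
    series = simulate_nsrp_hourly(24 * 30, 60.0, 5.0, 2.0, 3.0, 1.2, 8.0, seed=1)
    wet_hours = sum(1 for v in series if v > 0)
    print(round(sum(series), 1), wet_hours)
```

Overlapping cells add their intensities within an hour, which is what allows the process to produce clustered, intense rainfall from a small set of parameters.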

IV. RESULTS AND DISCUSSION
Table III shows the RMSE values for both distributions at each rainfall station. The values for Gamma and Weibull differ only slightly, but overall Gamma gives the better fit for rainfall intensity at both stations, with a lower RMSE. The NSRP model parameters with Gamma representing intensity are displayed in Table IV. The parameters λ and β denote the estimated storm origin arrival rate and the waiting time for cell origins after the storm origin, respectively. There are no significant differences in λ and β at both stations. Meanwhile, μc represents the mean number of cells per storm and η represents the mean duration of a cell. From the table, the mean number of cells per storm is higher from November to February at both stations, indicating that rainfall activity is heavier during these months. This period corresponds to the northeast monsoon season, which causes heavier rainfall on the eastern coast of the peninsula. Kelantan, which is located on the north-eastern coast, is exposed to the northeast monsoon wind during this period. However, the mean cell duration η is relatively higher in July at both stations. July falls within the southwest monsoon, which takes place from May to August. Hence, η does not contribute strongly to the rainfall activity in Kelantan. An increase in the shape parameter α implies a higher probability of occurrence of extreme and heavy rainfall [22]. The value of α for stations 4819027 and 5120025 is highest in September and May, respectively. In contrast, both stations have smaller values of α in November, implying that the probability of occurrence of rainfall extremes is small. However, the highest value of the scale parameter θ is observed in November, with 16.22 mm/h for station 4819027 and 19.36 mm/h for station 5120025, implying higher extreme rainfall intensities during this month. According to [22], interannual variability in extreme rainfall is driven by the leading mode of variability in the scale parameter. The second-order mode of variability of rainfall extremes is associated with the leading mode of variability of the shape parameter. These results are consistent with the findings of this study, where the scale parameter contributes more significantly to extreme rainfall activity in Kelantan than the shape parameter.

AWE-GEN with Gamma is used to generate 30 years of rainfall series at hourly scale at each station. The results, in the form of graphical comparisons, are shown in Figure 2. The mean, variance, lag-1 autocorrelation and skewness are well simulated for both stations.
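For reference, the four statistics used in this comparison can be computed from any rainfall series as in the minimal Python sketch below. The population (divide-by-n) forms of the variance and skewness, the `aggregate` helper for coarser time scales, and all function names are assumptions for illustration; the exact estimators used by AWE-GEN are not specified in the text.

```python
import math


def aggregate(series, window):
    """Sum consecutive non-overlapping blocks, e.g. hourly values to 24-hour totals."""
    return [sum(series[i:i + window])
            for i in range(0, len(series) - window + 1, window)]


def summary_stats(series):
    """Mean, variance, lag-1 autocorrelation and skewness of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    variance = sum(d * d for d in dev) / n
    if variance == 0:
        # Constant series: autocorrelation and skewness are undefined.
        return {"mean": mean, "variance": 0.0,
                "lag1_autocorrelation": math.nan, "skewness": math.nan}
    lag1 = sum(dev[i] * dev[i + 1] for i in range(n - 1)) / (n * variance)
    skewness = (sum(d ** 3 for d in dev) / n) / variance ** 1.5
    return {"mean": mean, "variance": variance,
            "lag1_autocorrelation": lag1, "skewness": skewness}


if __name__ == "__main__":
    hourly = [0.0, 0.0, 1.2, 3.4, 0.0, 0.0, 0.0, 5.1, 2.0, 0.0, 0.0, 0.0]
    print(summary_stats(hourly))
    print(summary_stats(aggregate(hourly, 3)))  # same statistics on 3-hour totals
```

Applying the same function to observed and simulated series at several aggregation windows gives the kind of side-by-side comparison summarized in the figure.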
Results also indicate that the frequency of non-precipitation and the transition probability are quite challenging to capture at both stations. Because the stations are located in the eastern region of Peninsular Malaysia, which receives higher amounts of rainfall every year, the mean dry interval is underestimated at both stations. For stations 4819027 and 5120025, it is underestimated by 0.8 and 0.7 days, respectively, as shown in Figure 3. Similarly, the mean wet interval is also underestimated, but to a lesser degree, by 0.2 and 0.3 days, respectively. In tropical climates, the mean is usually preserved but the shape of the distribution can deviate from the observed. The simulated and observed extreme rainfall for aggregation periods of 1 hour and 24 hours are illustrated in Figure 4. For both stations, there is a good match between simulated and observed values, especially for return periods up to 20-30 years. Extreme values of dry intervals are poorly captured by the model. In contrast, extreme values of wet intervals are well captured by the model.

V. CONCLUSIONS
Overall, the AWE-GEN model is capable of replicating the monthly rainfall series in Kelantan using the Gamma distribution. Although Gamma is not a heavy-tailed distribution, it was found to represent rainfall intensity better than the Weibull distribution. The estimated parameters indicate that there are no significant differences in λ and β at both stations. Higher values of μc and θ suggest that there is heavy rainfall activity in Kelantan, especially during the northeast monsoon season, whereas η and α do not contribute significantly to the rainfall activity. The results of this study are valuable, particularly for agricultural and storm water management planning. Specifically, this model could help address the insufficiency of hydrological data, especially rainfall data, at remote stations.

Fig. 1. Map of Peninsular Malaysia with location of rainfall stations

TABLE I. STATION DATA

TABLE II. RAINFALL PARAMETERS OF THE NSRP MODEL

TABLE III. RMSE VALUES OF GAMMA AND WEIBULL DISTRIBUTIONS
* indicates lowest RMSE value