Modeling Human Development Index of Bali with Spatial Panel Data Regression

Human development index (HDI) is an index that represents the success of human development in a region. For Bali, one of 34 provinces in Indonesia, the HDI showed an increasing trend over the period 2010–2017: Bali's HDI was 70.10 in 2010 and increased gradually to 74.30 in 2017. However, in 2017 several regencies had HDIs below Bali's HDI, namely Jembrana, Buleleng, Klungkung, Bangli, and Karangasem. The aim of this work is to model the HDI of the 9 regencies/cities of Bali so that the main determinant for increasing the HDI, especially in the regencies with lower HDIs, can be identified. The model consists of one dependent variable (HDI) and three indicators as the independent variables: (a) life expectancy, (b) education, and (c) standard of living. By applying spatial panel data analysis, five models were built, i.e., CEM, FEM (individual), FEM (time), REM, and spatial autocorrelation FEM, to determine the effect of each indicator. The results show that the best model is the spatial autocorrelation FEM, in which the health index has the biggest influence compared with the others.


I. INTRODUCTION
One of the prominent issues in national development programs, and a top priority for Indonesia as well as other countries worldwide, is human development. According to the United Nations Development Programme (UNDP), human development is the freedom of choice for humans in improving their quality of life. These choices include living a long and healthy life, gaining knowledge, and having access to the resources needed for a decent standard of living. To measure the success of human development in a country, UNDP introduced an indicator, namely the human development index (HDI) [1].
According to the Indonesian Statistics Office, the Human Development Index (HDI) of Indonesia increased from 66.5 in 2010 to 70.8 in 2017 [2]. HDI is composed of 3 main indices, namely (a) the health index, represented by life expectancy; (b) the education index, represented by the mean years of schooling and the expected years of schooling; and (c) the decent standard of living index, measured by adjusted per capita expenditure. HDI is calculated from these indices. Based on its value, HDI is classified into 4 groups, namely: (a) very high HDI if HDI ≥ 80; (b) high HDI if 70 ≤ HDI < 80; (c) medium HDI if 60 ≤ HDI < 70; and (d) low HDI if HDI < 60.
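To make the aggregation and the four-group classification concrete, the sketch below assumes the geometric-mean aggregation of the current UNDP method, HDI = 100 × (I_health × I_education × I_living)^(1/3), with each dimension index scaled to [0, 1]. The aggregation formula is an assumption (the text only states that HDI is calculated from the three indices), and the function names are hypothetical.

```python
def hdi_from_indices(i_health: float, i_education: float, i_living: float) -> float:
    """Aggregate three dimension indices (each in [0, 1]) into an HDI on a 0-100 scale.

    Assumes the geometric-mean aggregation of the current UNDP method.
    """
    for value in (i_health, i_education, i_living):
        if not 0.0 <= value <= 1.0:
            raise ValueError("each dimension index must lie in [0, 1]")
    return 100.0 * (i_health * i_education * i_living) ** (1.0 / 3.0)


def classify_hdi(hdi: float) -> str:
    """Map an HDI value to the four groups listed in the text."""
    if hdi >= 80:
        return "very high"
    if hdi >= 70:
        return "high"
    if hdi >= 60:
        return "medium"
    return "low"


if __name__ == "__main__":
    # Bali's provincial HDI in 2010 and 2017, as reported in the abstract.
    for year, value in ((2010, 70.10), (2017, 74.30)):
        print(year, value, classify_hdi(value))
    # Equal dimension indices give an HDI of 100 times that common index.
    print(round(hdi_from_indices(0.75, 0.75, 0.75), 2))
```

Because the geometric mean penalizes imbalance, a region cannot fully offset a weak dimension index with a strong one, unlike with an arithmetic average.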
Bali is one of 34 provinces in Indonesia, and its economic growth, driven by two leading sectors, tourism and agriculture, exceeds national economic growth. The same trend was observed in the growth of the HDI: in the period 2010-2017 the HDI of Bali Province increased gradually. The Regional Statistics Office of Bali Province noted that Bali's HDI grew on average by 0.84 percent per year in this period [3]. The development of the HDI in the 9 regencies/cities of Bali Province in the period 2010-2017 is shown in Fig. 1.
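The reported average growth can be cross-checked against the provincial HDI values given in the abstract (70.10 in 2010 and 74.30 in 2017). The sketch below assumes a compound (geometric) annual growth rate over the 7 yearly steps, which gives roughly 0.83 percent per year; the small gap from the 0.84 percent in [3] may come from averaging the year-on-year rates instead, which is an assumption about how the statistics office computes it.

```python
def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Return the compound annual growth rate between two values `years` apart."""
    if start <= 0 or years <= 0:
        raise ValueError("start must be positive and years must be at least 1")
    return (end / start) ** (1.0 / years) - 1.0


# Bali's HDI: 70.10 in 2010 and 74.30 in 2017, i.e. 7 yearly steps.
rate = compound_annual_growth(70.10, 74.30, 2017 - 2010)
print(f"{100 * rate:.2f} percent per year")
```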
The importance of the HDI calculation is inseparable from the role of HDI in designing development programs to reduce poverty. Budinirmala et al. [4], modeling poverty in Bali with panel data regression, showed that each 1-unit increase in HDI reduced the percentage of poor people by 0.250 units. Their research showed that HDI has a significant effect in reducing the percentage of poor people in Bali Province. Another study modeling HDI was conducted by Trianggara et al. [5]. Using the spatial panel data method, they examined the factors that influenced the HDI of Central Java Province in the period 2008-2013. They found that the best model for HDI is the spatial lag fixed effect model, with an R² of 99.54 percent, and that the variables that significantly influence the HDI are school participation rates and the percentage of poverty.
Noting the importance of HDI, this research aims to model the HDI of Bali. The model involves the 3 component indices, namely the health index (HEA), the education index (EDU), and the decent standard of living index (LIV), in order to quantify the influence of each index on the HDI of the 9 regencies/cities of Bali. HDI data for the period 2010-2017 were used, and the model was established by applying spatial panel data analysis techniques.
II. SHORT LITERATURE REVIEW
In this study, the relationship between the HDI and the three indices is modeled through two approaches, namely: (a) Panel Data Regression, a regression model for panel data that does not take into account the spatial influence between objects; and (b) Spatial Panel Data Regression, which considers the existence of spatial influences. According to Greene [6], regression models for panel data can be categorized into three groups, namely (a) the Common Effect Model (CEM); (b) the Fixed Effect Model (FEM); and (c) the Random Effect Model (REM). Of these models, the CEM is the simplest one.
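A. The Panel Data Regression
1) The Common Effect Model: To model the HDI of the 9 regencies/cities in Bali ($i = 1, \dots, 9$) for the period 2010 (time index $t = 1$) to 2017 (time index $t = 8$) with HEA, EDU, and LIV as the predictors, the CEM can be expressed as follows:
$$ \mathrm{HDI}_{it} = \alpha + \beta_1 \mathrm{HEA}_{it} + \beta_2 \mathrm{EDU}_{it} + \beta_3 \mathrm{LIV}_{it} + \mu_{it} \tag{1} $$
2) The Fixed Effect Model: According to Gujarati [7], the FEM can be differentiated into the FEM with fixed individual effects (FEM_I) and the FEM with fixed time effects (FEM_T). Both types can be expressed as follows:
$$ \mathrm{HDI}_{it} = \alpha_i + \beta_1 \mathrm{HEA}_{it} + \beta_2 \mathrm{EDU}_{it} + \beta_3 \mathrm{LIV}_{it} + \mu_{it} \tag{2} $$
$$ \mathrm{HDI}_{it} = \alpha_t + \beta_1 \mathrm{HEA}_{it} + \beta_2 \mathrm{EDU}_{it} + \beta_3 \mathrm{LIV}_{it} + \mu_{it} \tag{3} $$
3) The Random Effect Model: In the FEM, the intercept of the regression line ($\alpha_i$ or $\alpha_t$) is assumed to be fixed. In the REM, however, the intercept is considered a random variable with a mean value of $\alpha$. So, for an individual regency, the intercept can be expressed as $\alpha_i = \alpha_I + \varepsilon_i$, where $\varepsilon_i$ is a random term with zero mean and variance $\sigma^2_\varepsilon$. Under this assumption, the FEM_I of the nine regencies/cities in Bali share a common intercept value $\alpha_I$ and an error term $\varepsilon_i$, and their FEM_T share a common intercept $\alpha_T$ and an error term $\varepsilon_t$. Under these conditions, equations (2) and (3) can be expressed as follows:
$$ \mathrm{HDI}_{it} = \alpha_I + \beta_1 \mathrm{HEA}_{it} + \beta_2 \mathrm{EDU}_{it} + \beta_3 \mathrm{LIV}_{it} + \varepsilon_i + \mu_{it} \tag{4} $$
$$ \mathrm{HDI}_{it} = \alpha_T + \beta_1 \mathrm{HEA}_{it} + \beta_2 \mathrm{EDU}_{it} + \beta_3 \mathrm{LIV}_{it} + \varepsilon_t + \mu_{it} \tag{5} $$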
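The difference between the CEM (one common intercept) and FEM_I estimated by least squares dummy variables (one intercept per region) can be sketched with ordinary least squares on a small panel. This is an illustration only: the panel is simulated, not the Bali data, the helper names are hypothetical, and the normal equations are solved directly in plain Python rather than with R's plm package used in this study.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical panel (NOT the Bali data): N regions, T years, three regressors
# standing in for HEA, EDU and LIV.
N, T = 3, 8
beta = [50.0, 40.0, 35.0]   # assumed slopes
alpha = [5.0, 6.0, 4.0]     # assumed region-specific intercepts
rng = random.Random(42)
panel = []                  # (region index, regressors, response)
for i in range(N):
    for _ in range(T):
        x = [rng.uniform(0.6, 0.9) for _ in range(3)]
        y = alpha[i] + sum(b * v for b, v in zip(beta, x))  # noise-free for clarity
        panel.append((i, x, y))

ys = [y for _, _, y in panel]
# CEM: a single common intercept for every region.
cem = ols([[1.0] + x for _, x, _ in panel], ys)
# FEM_I by LSDV: one dummy per region and no common intercept (avoids the dummy trap).
lsdv = ols([[1.0 if j == i else 0.0 for j in range(N)] + x for i, x, _ in panel], ys)

print("CEM   intercept, slopes:", [round(v, 3) for v in cem])
print("FEM_I intercepts:", [round(v, 3) for v in lsdv[:N]])
print("FEM_I slopes:", [round(v, 3) for v in lsdv[N:]])
```

Because the simulated data contain no noise, the LSDV fit recovers the assumed region intercepts and slopes, while the CEM forces a single intercept on all regions.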

B. The Spatial Panel Data Regression
Whenever spatial effects amongst individual objects are considered in modeling panel data, spatial panel data regression is applied. According to Elhorst [8], the standard model for spatial panel data can be expressed as
$$ y_{it} = x_{it}\beta + \mu_i + \varepsilon_{it} \tag{6} $$
or, in stacked form [9],
$$ Y_t = X_t\beta + \mu + \varepsilon_t \tag{7} $$
where $Y_t$ is the $N \times 1$ vector of observations at time $t$, $X_t$ is the $N \times K$ matrix of regressors, $\mu = (\mu_1, \dots, \mu_N)^T$ is the vector of individual effects, and $\varepsilon_t$ is the error vector. In spatial panel data models, Elhorst [9] uses the spatial weight matrix $W$ of size $N \times N$. This matrix represents the spatial arrangement of the spatial units; its properties are described in [9]. For the standard model in equation (7), fixed effect spatial models are traditionally classified into 2 types, i.e., the fixed effect spatial lag model and the fixed effect spatial error model, formulated as follows:
1) The Fixed Effect Spatial Lag Model:
$$ Y_t = \rho W Y_t + X_t\beta + \mu + \varepsilon_t \tag{8} $$
2) The Fixed Effect Spatial Error Model:
$$ Y_t = X_t\beta + \mu + \phi_t, \qquad \phi_t = \gamma W \phi_t + \varepsilon_t \tag{9} $$
In equation (8), $\rho$ is called the spatial autoregressive coefficient, while in (9) $\gamma$ is called the spatial autocorrelation coefficient.
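Because $Y_t$ appears on both sides of the spatial lag model in equation (8), the model is usually analyzed through its reduced form. A short derivation, assuming $I_N - \rho W$ (with $I_N$ the $N \times N$ identity) is nonsingular, which holds for a row-normalized $W$ whenever $|\rho| < 1$, together with the analogous step for the error process in equation (9):

$$ (I_N - \rho W)\,Y_t = X_t\beta + \mu + \varepsilon_t \;\Longrightarrow\; Y_t = (I_N - \rho W)^{-1}\left(X_t\beta + \mu + \varepsilon_t\right) $$
$$ (I_N - \gamma W)\,\phi_t = \varepsilon_t \;\Longrightarrow\; \phi_t = (I_N - \gamma W)^{-1}\varepsilon_t $$

The reduced form shows that, in the lag model, a change in one region's regressors propagates to other regions through $(I_N - \rho W)^{-1}$, whereas in the error model only the disturbances are spatially linked.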

III. RESEARCH METHOD
To model the HDI of Bali by applying spatial panel data regression, the following steps were carried out:
1) build the CEM, FEM, and REM for the data set without taking into account any spatial effects that may exist;
2) choose the best model among the above models;
3) define the spatial weight matrix W;
4) if there is evidence of spatial dependence, build both the spatial lag model and the spatial error model;
5) choose the better of these two models;
6) interpret the final model.

A. Results for Panel Data Models
Without considering possible spatial dependence between units, we built the CEM, FEM, and REM according to equations (1) to (5). Assuming the error term in equation (1) satisfies $\mu_{it} \sim \mathrm{iid}(0, \sigma^2_\mu)$, we obtained the CEM, or pooled regression, estimates by the OLS method; they are listed in Table I. Table I shows that all estimates are significant. In addition, the CEM has a very high R² value, indicating that the predictors explain most of the variation in the response variable. However, as suggested by [7], the Durbin-Watson (D-W) statistic must be checked. The D-W statistic for the pooled model is 0.644, less than 1.372, the lower limit ($d_L$) of the D-W value for a sample size of 70 and 3 regressors. This suggests the possibility of autocorrelation and/or spatial autocorrelation in the data matrix [7]. To overcome this problem, we continued by building the fixed individual (FEM_I) and fixed time (FEM_T) effect models. To estimate the coefficients of FEM_I and FEM_T, we applied the Least Squares Dummy Variable (LSDV) method; the estimates are listed in Table II and Table III.
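The Durbin-Watson statistic is $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$, computed from the ordered residuals $e_t$. The sketch below computes it and applies the classic decision rule for positive autocorrelation against tabulated bounds. The residuals are hypothetical, the helper names are illustrative, and for pooled panel data the value depends on how the observations are ordered (for example, stacked region by region).

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of ordered regression residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[k] - residuals[k - 1]) ** 2 for k in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def dw_decision(d, d_lower, d_upper=None):
    """Classic bounds test for positive first-order autocorrelation."""
    if d < d_lower:
        return "reject H0: evidence of positive autocorrelation"
    if d_upper is None:
        return "not below d_L; compare with d_U to decide"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"


# Hypothetical residuals that drift slowly, the pattern a low d detects.
res = [0.5, 0.4, 0.3, 0.1, -0.1, -0.3, -0.4, -0.5]
print(round(durbin_watson(res), 3))
# The pooled model in the text: d = 0.644 against d_L = 1.372.
print(dw_decision(0.644, 1.372))
```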
Both tables show that the fixed effect models are significant. Although the R² of FEM_T is slightly greater, FEM_I has a much smaller residual sum of squares (RSS ≈ 0.087) than FEM_T (RSS ≈ 0.206). Guided by this result, we chose the FEM with individual effects as the representative FEM.
To compare the CEM with FEM_I, we applied the Chow test to examine the parameter stability of the CEM [7]. The hypotheses in this test are:
$H_0: \alpha_1 = \cdots = \alpha_9 = \alpha$
$H_1: \alpha_i \neq \alpha_j$ for at least one pair $i \neq j$, with $i, j = 1, \dots, 9$
The hypotheses were tested using the pooltest function in the R plm package [10]. The test gave F = 10.823 with p-value < 0.001. Based on the Chow test, the null hypothesis is rejected, and we concluded that FEM_I is a better choice than the CEM for modeling the data matrix. The last step in searching for the best model for the panel data matrix is to build the REM.
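A textbook form of this restricted-versus-unrestricted comparison (as in Gujarati [7]) uses the residual sums of squares of the CEM and FEM_I: $F = \frac{(RSS_{CEM} - RSS_{FEM_I})/(N-1)}{RSS_{FEM_I}/(NT - N - K)}$ with $(N-1,\, NT-N-K)$ degrees of freedom. The sketch below implements only this formula. The CEM RSS is a hypothetical placeholder because it is not reported in the text, and plm's pooltest may organize the computation differently, so the result is not expected to reproduce F = 10.823.

```python
def equal_intercepts_f(rss_restricted, rss_unrestricted, n_units, n_periods, n_regressors):
    """F statistic for H0: all unit intercepts are equal (CEM vs. FEM with unit dummies)."""
    df1 = n_units - 1
    df2 = n_units * n_periods - n_units - n_regressors
    if df2 <= 0:
        raise ValueError("not enough observations for the unrestricted model")
    f = ((rss_restricted - rss_unrestricted) / df1) / (rss_unrestricted / df2)
    return f, df1, df2


# RSS of FEM_I from the text (0.087); the CEM RSS of 0.300 is a hypothetical placeholder.
f, df1, df2 = equal_intercepts_f(0.300, 0.087, n_units=9, n_periods=8, n_regressors=3)
print(f"F({df1}, {df2}) = {f:.3f}")
```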
The Hausman test gave a χ² statistic of 3.890 with p-value = 0.274. Hence $H_0$ cannot be rejected, i.e., the REM is consistent for modeling the data. Among the panel data models, we therefore conclude that the best model is the REM, whose estimates are listed in Table IV. Despite this conclusion, since the aim of this work is to model the HDI of the 9 regencies/cities in Bali and to evaluate any spatial dependence that may exist, we continue to use FEM_I in the subsequent analysis.
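The Hausman statistic $H = (b_{FE} - b_{RE})^T\,[\mathrm{Var}(b_{FE}) - \mathrm{Var}(b_{RE})]^{-1}(b_{FE} - b_{RE})$ is compared with a χ² distribution whose degrees of freedom equal the number of slope coefficients, here $K = 3$ (HEA, EDU, LIV). The sketch below covers only the last step: for 3 degrees of freedom the χ² upper-tail probability has a closed form, which can be used to check that a statistic of 3.890 corresponds to a p-value of about 0.274. The function name is hypothetical.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 3 degrees of freedom.

    Uses the closed form erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    if x < 0:
        raise ValueError("chi-square statistics are non-negative")
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Hausman statistic reported in the text, with K = 3 slope coefficients.
hausman_stat = 3.890
p_value = chi2_sf_df3(hausman_stat)
print(f"p-value = {p_value:.3f}")
print("reject H0 (prefer FEM)" if p_value < 0.05 else "do not reject H0 (REM is consistent)")
```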

B. Results for Spatial Panel Data Models
The first step in modeling spatial panel data is to build the spatial weight matrix W. This matrix quantifies the connections between the regencies/cities of Bali; its elements are determined from the map of Bali (Fig. 2).
The 9×9 matrix W is built by normalizing each row i of the 9×9 binary matrix P, where $p_{ij} = 1$ if region i shares a common border with region j and $p_{ij} = 0$ otherwise (including $p_{ii} = 0$). The P and W matrices are derived from Fig. 2. To study the spatial effects that might exist between regencies, we have to test the significance of the coefficient $\rho$ in equation (8) as well as the significance of $\gamma$ in equation (9). The Lagrange Multiplier (LM) test can be applied for this task, with the hypotheses [11]:
$H_0: \rho = 0$ (there is no spatial dependence)
$H_1: \rho \neq 0$ (there is spatial dependence)
These hypotheses were tested using the slmtest function in the R splm package [12]. The test gave LM = 28.621 with p-value < 0.001. We therefore rejected $H_0$, concluding that the fixed (individual) effect spatial autocorrelation (lag) model (FEM_I.SAR) is worth building. In addition, we tested the coefficient $\gamma$ in equation (9) to study the existence of error dependence amongst regencies. The hypotheses are:
$H_0: \gamma = 0$ (there is no error dependence)
$H_1: \gamma \neq 0$ (there is error dependence)
This test gave LM = 2.068 with p-value = 0.150, so $H_0$ cannot be rejected, suggesting that there is no error dependence amongst observations. Finally, we decided that FEM_I.SAR is the best model to reveal the spatial effect in the data matrix; its estimates are listed in Table V.
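Row normalization divides each row of the binary contiguity matrix P by its row sum, so every row of W sums to one and W Y averages the values of a region's neighbors. The sketch below uses a hypothetical four-region map, not Bali's actual contiguity; regions without neighbors are left as zero rows, which is an assumption since no such region occurs in the text.

```python
def row_normalize(p):
    """Row-normalize a binary contiguity matrix; rows with no neighbors stay all zero."""
    w = []
    for row in p:
        total = sum(row)
        w.append([v / total if total else 0.0 for v in row])
    return w


# Hypothetical map: A borders B; B borders A, C and D; C borders B and D; D borders B and C.
P = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]
# Shared borders are mutual, so the contiguity matrix must be symmetric.
assert all(P[i][j] == P[j][i] for i in range(4) for j in range(4))

W = row_normalize(P)
for row in W:
    print([round(v, 3) for v in row])
```

Note that W is generally no longer symmetric after row normalization (here W[0][1] = 1 but W[1][0] = 1/3).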

C. Discussion
Using the HDI data of the 9 regencies/cities in Bali in the period 2010-2017, the spatial regression analysis shows that the best model to represent the spatial dependence between neighboring regions is the fixed (individual) effect spatial autocorrelation model (FEM_I.SAR), which takes the form of equation (8). Following LeSage [11], the product $W Y_t$ is known as the spatial lag; for each region, it is the average HDI at time t of the regions defined as its neighbors by the matrix W. For example, the spatial lag of the 9 regencies/cities in Bali for 2017 is $W Y_{2017}$. Comparing this spatial lag, $(72.99, \dots, 78.32)^T$, with the observed HDIs, $(70.72, \dots, 83.01)^T$, we find that 4 regencies, namely JBR, BLI, KRG, and BLL, have a positive difference (the spatial lag exceeds the observed HDI), while the others have a negative one. A positive difference indicates that the regency takes 'advantage' of its neighbors in forming its HDI, whilst a negative difference indicates that the regency 'supports' its neighbors.
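The spatial lag and the sign comparison can be sketched as follows. The three-region W and the HDI values are hypothetical, chosen only to show the mechanics; the difference is taken as spatial lag minus observed HDI, the direction consistent with the 2017 figures quoted above (72.99 versus 70.72 for the first region).

```python
def spatial_lag(w, y):
    """Return W y: for each region, the weighted average of its neighbors' values."""
    return [sum(w_ij * y_j for w_ij, y_j in zip(row, y)) for row in w]


regions = ["R1", "R2", "R3"]          # hypothetical regions lying in a row
W = [
    [0.0, 1.0, 0.0],                  # R1's only neighbor is R2
    [0.5, 0.0, 0.5],                  # R2 borders R1 and R3
    [0.0, 1.0, 0.0],                  # R3's only neighbor is R2
]
hdi = [68.0, 74.0, 81.0]              # hypothetical observed HDIs
lag = spatial_lag(W, hdi)

for name, observed, lagged in zip(regions, hdi, lag):
    diff = lagged - observed
    if diff > 0:
        role = "takes advantage of its neighbors"
    elif diff < 0:
        role = "supports its neighbors"
    else:
        role = "no difference"
    print(f"{name}: lag={lagged:.2f}, observed={observed:.2f}, diff={diff:+.2f} ({role})")
```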
According to Griffith and Arbia [13], spatial autocorrelation (ρ) has been a topic of interest in examining spatial data. They define it as "the tendency for nearby values on a map to be dependent". Both authors note that positive ρ has received most of researchers' attention, mostly because examples of negative ρ are rare. In our study, the ρ of the final model is 0.153 in magnitude with a negative sign (ρ = −0.153). The sign indicates a tendency for the HDIs of the 9 regencies/cities in Bali to diverge, given their predictors: the education index (EDU), the health index (HEA), and the standard of living index (LIV). Readers are referred to [13] for the characteristics of spatial autocorrelation.
Table V shows that all three predictors of HDI have significant estimates. The greatest coefficient is found for the health index (HEA) and the lowest for the standard of living index (LIV), with values of 53.555 and 36.308, respectively. All estimates have a positive sign, indicating positive effects: an increase in EDU, HEA, or LIV increases the HDI of each region in Bali.
In addition, the residual sum of squares (RSS) of the final model, 0.053, is lower than the RSS of FEM_I, 0.087 (see Table II). This shows that FEM_I.SAR improves the model accuracy, reducing the RSS by $\left|\frac{0.053 - 0.087}{0.087}\right| \times 100\% \approx 39.08\%$.
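The percentage above is a relative reduction in RSS. A minimal check of the arithmetic, using the two RSS values from the text (the function name is hypothetical):

```python
def relative_reduction(new: float, old: float) -> float:
    """Relative change |new - old| / old, expressed as a percentage."""
    if old == 0:
        raise ValueError("the reference value must be non-zero")
    return abs(new - old) / old * 100.0


# RSS of FEM_I (0.087, Table II) and of the final FEM_I.SAR model (0.053).
print(f"{relative_reduction(0.053, 0.087):.2f}%")
```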

V. CONCLUSION
This work modeled, spatially, the HDIs of the 9 regencies/cities in Bali Province, Indonesia, as the response, with three indices, namely the health index, the education index, and the standard of living index, as the regressors. It concludes:
1) The pooled model with the three regressors shows the possibility of autocorrelation and/or spatial autocorrelation, which makes the pooled model inappropriate to apply;
2) The fixed individual effect model (FEM_I) performed better than the fixed time effect model (FEM_T): although both models are significant, the RSS of FEM_I is smaller than the RSS of FEM_T.
3) The LM test indicated spatial dependence amongst neighboring regencies/cities. Considering this dependence, the best model for this work is the spatial autocorrelation model with fixed (individual) effects, FEM_I.SAR.