Application of Correlation and Regression Models in Predicting the Physico-chemical Quality of Groundwater from Insitu Measured Parameters

DOI: http://dx.doi.org/10.24018/ejers.2021.6.6.2575 Vol 6 | Issue 6 | September 2021
Abstract: Groundwater is the major source of municipal and private potable water supply for meeting the drinking, domestic, agricultural and industrial requirements of man around the world. The cost of analyzing water quality in the laboratory to ascertain its potability is usually high, and such analysis is sometimes not available. Groundwater samples were collected from fifty (50) spatially referenced bore well locations in Warri and its environs in the dry and wet seasons (November 2019 to January 2020). The water samples were analyzed for twenty-six (26) physical, chemical and bacteriological parameters both in the field and in the laboratory, in line with APHA standard procedures for testing water and waste water, in order to evaluate the potability of groundwater across Warri, Delta State, Nigeria. The data analysis tool in Microsoft Excel was used to explore the interrelationship between some conservative parameters measured in the field (pH, EC, TDS and DO) as independent variables and some cations, anions and heavy metals (Na, Mg, Ca, HCO3, SO4, Cl, Fe, Cd, Cr, Cu and Pb) analysed in the laboratory as dependent variables. The results for the parameters analysed in situ, which are cheap and easy to measure, were used to evaluate their inter-relationships with these cations, anions and heavy metals. Highly correlated water quality parameters were identified from the correlation coefficient (R) values in the correlation matrix and related by regression equations (models).
The regression models can be adopted to predict the concentration of these cations, anions and heavy metals before rigorous laboratory analysis, to serve as a quick check on the concentration of most disease-causing pollutants and to save time, money and resources, especially given the near absence of AAS for analysing heavy metals in a good number of laboratories. The regression models developed in the study can be used for monitoring water quality parameters from the concentrations of the independent parameters obtained in the field alone. The relationships between variables show that changes in one variable are associated with changes in another. It was observed that multiple regression models can predict most parameters at the 5% level of significance. Significant positive correlations at the 1% and 5% levels were found between many parameters. This technique calculated the correlation coefficients between various physico-chemical parameters of drinking water and provided an excellent device for calculating parameter values within a realistic degree of accuracy. Systematic calculation of correlation coefficients proved to be an easy, useful and rapid means of monitoring water quality. It is recommended to treat groundwater prior to domestic use.


I. INTRODUCTION
Groundwater is the most realistic and the major potable water supply option in Warri. Rapid population growth, urbanization, an accelerated pace of industrialization, agricultural activities, crude oil exploration and hydrocarbon-related activities, which are the dominant activities in Warri, have led to increased dependence on groundwater for meeting man's domestic, drinking, agricultural and industrial water needs because of its seemingly potable status [1]. These activities have caused surface water resources to be either fully utilized or of poor quality. The diminishing surface water resources available for exploitation have caused governmental agencies, industrial and private users to resort to groundwater for their water supply needs with little or no form of treatment. The quality of groundwater has deteriorated over time as a result of natural processes and anthropogenic activities. The quality of water depends, to some extent, on its physico-chemical composition [2], [3]. Increasing demand and groundwater withdrawal, changes in land use pattern, vast industrial and agricultural effluents entering the hydrological cycle, and seasonal variation in groundwater recharge all affect the quality and quantity of groundwater [4]-[6].
The need for water quality assessment in growing urban cities like the Warri metropolis thus cannot be emphasized enough. During the past few decades, groundwater has been increasingly contaminated, causing numerous water quality problems in both groundwater and surface water systems that affect large numbers of people. Contaminated water has resulted in epidemics, detrimental health problems and environmental issues [7]. Low-quality drinking water accounts for 80% of the incidence of many acute and chronic diseases that cause mortality in many communities [8], [9].
Considering the huge groundwater consumption in Warri and its environs and the lack of water quality monitoring, the present study was undertaken to assess the physico-chemical and bacteriological characteristics of groundwater in and around Warri in the dry and wet seasons using a large number of spatially referenced sampling wells located across the city. Statistical analysis and hydrogeochemical characterization of the groundwater are presented, and correlation and regression models are explored for obtaining the concentration of some water quality parameters.
II. STUDY AREA
Warri (Fig. 1) is a major commercial city in the Niger Delta region of Nigeria. It has a sea port, a refinery and several oil fields and flow stations. It is located in the western end and coastal region of the Nigerian Niger Delta, about 40 kilometres from the shores of the Atlantic Ocean, in Delta State, Southern Nigeria. It lies between latitudes 5º54´00ʺN and 5º35´00ʺN of the Equator and longitudes 5º42´00ʺE and 5º54´00ʺE of the Greenwich Meridian.
Fig. 1. Map of Warri and its environs. Source: [4] and [10].
Warri and its environs are situated on a low-lying plain in the continental shelf of West Africa on the Gulf of Guinea and comprise three major litho-stratigraphic units, namely the Akata, Agbada and Benin formations. These formations are inherently vulnerable to a high risk of contamination because of the shallow, unconfined and unprotected aquifer, which consists mainly of unconsolidated sediments [11]. The geological formation consists of more than 90% sands and about 10% shale/clays. The sands range from fine-to-medium to coarse-grained unconsolidated sands with occasional intercalations of gravelly beds; they are poorly sorted, sub-angular to well-rounded, and bear lignite streaks, wood fragments, peat or lenses of plastic clay [12]-[14]. The water table is about 10 metres below ground surface, depending on the season [15]-[19]. This formation contains the most productive and hence most tapped aquifer in the Niger Delta region because it is shallow [20]. The average annual rainfall is about 3000 mm and occurs mostly due to the south-west monsoon wind [21]. Groundwater and surface water in the study area are under threat of contamination from crude oil exploration and exploitation activities. Warri being an ancient city, its solid waste and effluent disposal systems are not engineered, which has further threatened the quality of groundwater. The near absence of government water schemes has compelled individuals to extract groundwater from a large number of boreholes. Although there are no records of the number of boreholes in the city, physical observation suggests that almost every building has a borehole, and the water extracted is consumed without any form of analysis or treatment.

A. Establishment of Sampling Locations and Water Samples Collection
Groundwater samples were collected from fifty (50) identified boreholes, with their UTM coordinates read with a hand-held GPS (GARMIN GPSMAP 76CSx model). The boreholes all tap the Sombreiro-Warri Deltaic Plain Sands aquifer to an average depth of 17 m. The water samples were collected in new 1.5 L high-density PET screw-capped containers during the dry season (November to December 2019) and wet season (May and July 2020) and recorded in Table I with their sampling codes for the purpose of geo-referencing. The sampling points were selected on the basis of population density, areas of industrial or anthropogenic activity such as crude oil refining, open solid waste dump sites, high- and low-density areas and river catchment areas. All the drinking water samples were taken from running tap water in residential and commercial areas. Water from the taps was allowed to run for 2 to 3 minutes, and the PET containers and stoppers were thoroughly washed three times with distilled water and once with the water to be sampled before the actual sample was collected. The bottles were filled, allowed to overflow and immediately corked, properly labelled to avoid mix-up, placed in an ice chest and transported to the laboratory within three hours of collection. Collection, preservation and transportation of the water samples to the laboratory, and the field and laboratory analyses, followed the standard guidelines recommended by [23] for testing water and waste water. The water samples were preserved in refrigerators at 4 ºC to keep them intact until analysis was carried out. As prescribed by [22], samples were collected in triplicate at each sampling location.
One bottle was filled with unacidified water, while a second bottle was filled and acidified by adding a few drops of 5% nitric acid (HNO3) to stop the activities of microorganisms. Samples for bacteriological quality analysis were collected using autoclave-sterilized sampling bottles to avoid unpredictable changes in characteristics. The black bottles were made airtight for the analysis of BOD after five days, to prevent photosynthetic oxygen generation. The second white bottles were for microbial analysis, and the remaining samples, stored in ice chest boxes (coolers), were for the physico-chemical analysis.

B. Field and Laboratory Analysis of Water Samples
The standard methods for testing water quality recommended by the American Public Health Association [22] were employed in this research to obtain the concentration of some physico-chemical and bacteriological parameters. This included the determination of hydrocarbon constituents in the water samples.

C. Field Analysis
Non-conservative, sensitive parameters such as temperature, pH, electrical conductivity (EC), total dissolved solids (TDS) and dissolved oxygen (DO), which change with storage time [24], were measured in situ and recorded before the samples were transported to the laboratory for further physical and chemical analyses. Temperature was measured using a mercury-filled Celsius thermometer. Total Dissolved Solids (TDS) and Electrical Conductivity were estimated with Oakton TDS/Conductivity meter electrical conductivity meter (HI 2315, Hanna Instrument). pH was measured using a portable pH meter (PHS-25) and DO with a portable DO meter (DO analyser JPG 607). The DO meter was inserted into the water sample to a depth of about 10 cm using the oxygen probe handle. Each measurement was repeated three (3) times and the mean value calculated for each parameter.

D. Laboratory Analysis
The standard methods of [22] were adopted in the laboratory for each parametric analysis of the groundwater samples. Chemical Oxygen Demand (COD), nitrate and ammonia have a permissible storage time of 24 hours and were therefore analysed immediately, as recommended by [25]. Samples were stored in a refrigerator at about 4 ºC [25] for examination of other water quality parameters that do not change with storage time; analyses of those parameters were nevertheless conducted within two (2) weeks. An SP2900 Pye-Unicam Atomic Absorption Spectrometer (AAS) was used to determine Fe, Cu, Cr, Cd and Pb, while a UV-visible spectrophotometer (Thermo Scientific Spectronic 20D+) was used to analyse PO4, NO3, SO4 and NH4. The concentrations of Na+ and K+ were determined with a flame emission analyser. Ca2+ and Mg2+ were determined by EDTA titrimetry. Cl- and HCO3- were also measured by appropriate titrimetric methods. NO3- was measured by colorimetry, while SO42- was determined by precipitation using BaCl2 and measurement of absorbance with a spectrophotometer. Iron concentrations were estimated using an atomic absorption spectrophotometer. Biochemical Oxygen Demand (BOD) and Chemical Oxygen Demand (COD) were determined using the modified Winkler and KMnO4 methods, respectively.

IV. RESULTS OF ANALYSIS
Results of the field and laboratory analysis of various physico-chemical and bacteriological parameters of the groundwater samples are given in Table III and Table IV. Mean values were taken as characteristic values to show the differences between the two (2) seasons, and the results were compared with the WHO standard for water quality parameters (Table II) [26]. The mean and standard deviation of each parameter in both seasons were computed and the seasonal variation for each parameter was obtained.

A. Statistical Analysis
Results of the statistical analysis of water quality parameters of the water samples showing the minimum, maximum, mean and standard deviation for both dry and wet seasons are presented in Table V. The range, mean and standard deviation values revealed considerable variations in most water samples with respect to their chemical composition.

B. Correlation and Multiple Regression Modelling for Domestic Borehole Water Quality
Correlation and multiple regression analysis are useful for interpreting groundwater quality data and relating them to specific hydrogeological processes. These tools are useful for characterizing the groundwater system and obtaining first-hand information about it without going through complex procedures and methods.

C. Correlation Analysis of Water Quality Parameters
The degree of linear association between any two of the water quality parameters (dependent and independent variables) is measured by the simple correlation coefficient (R). The interrelationships between the water quality parameters measured in situ in the sampled domestic boreholes in the dry and wet seasons (independent variables) and the laboratory-measured ion concentrations (dependent variables) were computed using the data analysis package in Microsoft Excel. The results of the multiple correlation matrix using the cations, anions and heavy metals (Na, K, Ca, Mg, SO4, NO3, Cl, PO4, HCO3, NH4, Fe, Cd, Cr, Cu and Pb) interchangeably as dependent variables and pH, EC, TDS and DO as independent variables are presented in Table VI (a-o) for both dry and wet seasons.
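For readers who want to reproduce the simple correlation coefficient outside a spreadsheet, a minimal sketch follows. It assumes R is the usual Pearson product-moment coefficient; the function name `pearson_r` and the sample values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical field EC (uS/cm) and laboratory Na (mg/l) values.
ec = [100.0, 150.0, 200.0, 250.0]
na = [4.0, 6.5, 8.0, 10.5]

r_ec_na = pearson_r(ec, na)
```

Evaluating `pearson_r` for every pairing of an in situ parameter with a laboratory parameter would fill a matrix of the kind described above.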

D. Regression Analysis of Water quality parameters
The results of multiple regression for Na, K, Ca, Mg, SO4, NO3, Cl, PO4, HCO3 and NH4, using the predictors whose correlation coefficients were found to be most significant, are presented in the regression statistics in Tables VII (a and b) for both the dry and wet seasons.
The greater the value of the regression coefficient, the better the fit and the more useful the regression variables [27]. The multiple regression equations for predicting ion concentrations using the in situ parameter values as independent variables are given in Table VIII.
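To make the form of such a regression equation concrete, the sketch below fits y = b0 + b1·x1 + b2·x2 by ordinary least squares using the normal equations. It is a stand-in for the spreadsheet regression tool, not the authors' procedure; the function names, the singularity tolerance, and the data (constructed to follow y = 2 + 0.05·EC - 1.0·pH exactly) are all hypothetical.

```python
def fit_linear_model(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows holds one tuple of predictor values per sample; an intercept
    column is added here. Returns [intercept, b1, b2, ...].
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(d[i] * d[j] for d in design) for j in range(k)]
         + [sum(d[i] * t for d, t in zip(design, y))]
         for i in range(k)]
    scale = max(abs(a[i][i]) for i in range(k))
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(a[i][c]))
        if abs(a[p][c]) <= 1e-9 * scale:  # heuristic tolerance (assumption)
            raise ValueError("predictors are collinear; drop one of them")
        a[c], a[p] = a[p], a[c]
        for i in range(k):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [u - f * v for u, v in zip(a[i], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(coeffs, row):
    """Evaluate the fitted equation for one set of predictor values."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], row))


# Hypothetical (EC, pH) field readings and a made-up ion concentration.
field = [(100.0, 6.0), (150.0, 6.5), (200.0, 5.5), (250.0, 7.0), (300.0, 6.2)]
ion = [1.0, 3.0, 6.5, 7.5, 10.8]

coeffs = fit_linear_model(field, ion)
estimate = predict(coeffs, (180.0, 6.0))
```

One design consequence worth noting: if two predictors are perfectly correlated, the normal equations are singular and the sketch raises an error, so only one of such a pair can enter the same equation.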

A. Comparison of Physico-Chemical Parameter Results with WHO Standard for Drinking Water
The obtained results of measured concentrations of each water quality parameter in both dry and wet seasons were compared with the WHO standard values given in Table IV. Also, the numbers of boreholes within and above the recommended values are as presented.
The average values of BOD, ammonia and iron were higher in the wet season than in the dry season, which could be due to acidification of water by elevated microbial degradation of organic debris and concentrated dissolved solids in the wet season. DO plays an important role in groundwater quality; the average concentration of DO was lowest in the dry season (DO solubility decreases as temperature rises) and highest in the wet season (increase in phytoplankton and microbial activity), with a consequent increase in BOD and COD. DO values varied slightly less in the dry season, possibly because of copious growth of phytoplankton with less water flow, disturbance and uprooting, leading to increased generation of oxygen by photosynthetic activities. Total hardness (TH) was higher in the wet season than in the dry season, while pH was higher in the dry season than in the wet season. Application of chemical fertilizers, runoff from agricultural fields, leaching of industrial/domestic waste, sewage inflow and other anthropogenic sources are possible point and non-point sources affecting the pH of groundwater. Average phosphate ranged from 0.19 to 0.28 mg/l, nitrate from 0.83 to 1.06 mg/l and sulphate from 1.00 to 1.20 mg/l in the dry and wet seasons, respectively.
The average chloride concentration in the samples ranged from 34.89 to 39.41 mg/l throughout the sampling periods. Concentrations were all below the WHO permissible limits for TH, TDS, BOD, COD, Na, Ca, K, SO4, NO3, Cl, HCO3, NH4, Fe, Cr, Cu and Pb in both seasons. The concentrations of EC, Turbidity, DO, Total Coliform and PO4 were within standard limits, while Temperature and Cd were above the required limits in both the dry and wet seasons [28]. The iron levels detected ranged between 0.21 and 0.22 mg/l. Average hardness levels in the water samples were found to be below the WHO permitted limit, which is 20.77 mg/l. On the whole, EC, Turbidity, DO, Total Coliform and PO4 were within the set limits for the dry season, while Temperature and Cd were above the limits for the wet season. Also, observed values of Temperature, pH, Mg and Cd were above the limits for the dry season, while EC, Turbidity, DO, Total Coliform, Ca, Mg, PO4 and Fe were within the standard limits for the wet season. There is no guideline for TSS. The groundwater quality parameters varied from place to place and season to season and were dependent on both surface and subsurface characteristics. The presence of open dumps, fertilizer usage, disposal of industrial wastes, leakages from septic tanks, hydrocarbon contaminants and similar sources change the quality of groundwater.

B. Correlation and Multiple Regression Modelling of Water Quality Parameters
Correlation and regression analysis are useful in characterizing the relationship and dependence between the parameters analyzed. They are used in this study to explain the nature of the dependent variables and how they are influenced by the independent variables. Only correlation coefficients above 0.7 were chosen, since these indicate a very high positive correlation.

C. Correlation Matrix Analysis
Correlation matrices for the different water quality parameters, along with their significance levels, are shown in Table VI (a-o) for both dry and wet seasons. The results of the statistical analysis indicate that EC and TDS have a significantly high and positive correlation with Na, K, Ca, Mg, SO4, NO3, Cl, PO4 and HCO3 in both seasons, but a weak and a moderate correlation with NH4 in the dry and wet seasons, respectively. The R values between EC/TDS and the water quality parameters are: Na (0.8876/ 0. Also, from the correlation results, it is observed that EC and TDS are strongly correlated, with a correlation coefficient of one (1) [30]. The relationship is not always linear and is strongly influenced by salinity and material content. Estimating TDS concentration from the EC value can be used to give an overview of water quality.
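As a rough illustration of estimating TDS from EC, the sketch below assumes a simple proportional relation TDS = k × EC and fits k by least squares through the origin. This proportional form is a common approximation rather than something established by the study, and as noted above the relationship is not always linear. The function name and the readings are hypothetical.

```python
def fit_conversion_factor(ec, tds):
    """Least-squares slope k of TDS = k * EC for a line through the origin."""
    sxx = sum(e * e for e in ec)
    if sxx == 0:
        raise ValueError("EC readings must not all be zero")
    return sum(e * t for e, t in zip(ec, tds)) / sxx


# Hypothetical paired field readings; not data from the study.
ec_readings = [100.0, 200.0, 300.0, 400.0]   # uS/cm
tds_readings = [65.0, 130.0, 195.0, 260.0]   # mg/l

k = fit_conversion_factor(ec_readings, tds_readings)

# Estimated TDS (mg/l) for a new EC reading of 250 uS/cm.
tds_estimate = k * 250.0
```

A factor fitted this way would apply only to waters similar to those sampled, since salinity and dissolved material content shift the ratio.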

D. Regression Statistics Analysis
Results of the multiple regression models for predicting cations, anions and heavy metals are presented in Tables VII (a and b). Regression coefficients represent the mean change in the response variable for one unit of change in the predictor variable while holding the other predictors in the model constant.
The independent variables EC and TDS were significant in predicting values of the dependent variables. The multiple R² values indicate that the variability in Na, K, Ca, Mg, SO4, NO3, Cl, PO4 and HCO3 in both seasons could be ascribed to the combined effect of EC and TDS. This is in line with [29], a study of statistical approaches for hydrogeochemical characterization of groundwater in west Delhi, India, which showed good correlation between EC and other water quality parameters. The multiple correlation coefficients (R) of all dependent variables with the in situ independent variables were well above 0.7000, which suggests that EC and TDS have a strong relationship with the dependent variables, with the exception of NH4, for which R is less than 0.4 and 0.6 in the dry and wet seasons, respectively. pH showed a strong negative relationship with Cd, Cr, Cu and Pb. This is supported by the regression equation (model) obtained for each groundwater parameter.
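Whether a given R is significant at the 5% level can be checked with the standard t-test for a correlation coefficient, t = R·sqrt((n - 2)/(1 - R²)). The sketch below is an illustration under assumptions, not the test reported in the study: it treats R as a simple pairwise correlation computed from all fifty boreholes, and the critical value for 48 degrees of freedom is taken from standard t tables. The example R of 0.353 is the lower ammonium value quoted in the conclusion.

```python
import math


def t_statistic(r, n):
    """t statistic for testing zero correlation, given sample r from n pairs."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Two-sided 5% critical value of Student's t for 48 degrees of freedom,
# from standard tables (an assumption here, not a value from the study).
T_CRIT_48 = 2.0106

n_wells = 50        # number of boreholes sampled
r_example = 0.353   # lower ammonium R quoted in the conclusion

t_value = t_statistic(r_example, n_wells)
is_significant = abs(t_value) > T_CRIT_48
```

Under these assumptions even a modest R can pass the 5% test with fifty samples, which is why statistical significance and the 0.7 strength threshold are separate criteria.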

VI. CONCLUSION
The present study provides significant information on the quality of groundwater, which is the most important source of water supply in urban as well as rural areas in developing countries. Variations in specific water quality parameters among the water samples drawn from various boreholes may be attributed to land use and land cover factors. The statistical regression model has been found to be a highly useful technique for monitoring drinking water, with good accuracy. The results of the statistical analysis gave an indication of the interrelationship amongst various parameters. From Tables VI (a-k), EC and TDS are the only predictors for the tested cations and anions in the dry and wet seasons, while pH correlates with the heavy metals (cadmium, chromium, copper and lead), as presented in Tables VI (l-o). The correlation coefficient (R) values of the predictors show strong correlations of 0.7000 to 0.9922 for all the cations, anions and heavy metals except ammonium, for which R is in the range of 0.353 to 0.573. The correlation values are higher in the wet season. EC had a very strong correlation with TDS. The regression models can be used to predict the concentration of anions, cations and heavy metals, thereby giving a realistic picture of the groundwater situation. The study offers an easy and rapid method of monitoring the quality of water.

VII. RECOMMENDATION
The groundwater in Warri should be treated before use. It is recommended that water analysis be carried out periodically to monitor the rate and kind of contamination and to prevent further contamination. It is important to raise awareness among the people of the need to keep water at the highest levels of quality and purity in order to achieve a healthy life. Suitable strategies for groundwater recharge, controlled groundwater usage, measures to reduce groundwater pollution and awareness of the importance of water quality among private borehole users are recommended.