Data Mining Techniques Applying on Educational Dataset to Evaluate Learner Performance Using Cluster Analysis
Minimol Anil Job

Abstract: Owing to advances in technology in this digital era, academic institutions are not only producing graduates but also generating enormous amounts of data from their systems. Hidden information and patterns in large datasets can be analyzed efficiently with data mining techniques. Applying data mining techniques improves performance in many organizational domains, and the concept can be applied in the education sector for performance evaluation and improvement. Once the business value of the collected data is understood, the data can be used for classifying and predicting students’ behavior, academic performance, and dropout rates, and for monitoring progression and retention. This paper discusses how data mining can help higher education institutions by enabling a better understanding of student data, and it consolidates the clustering algorithms applied in the context of educational data mining.


I. INTRODUCTION
Data mining is a knowledge discovery process that involves discovering hidden patterns, messages and knowledge within large datasets, and analyzing outcomes or behaviors in various business domains. Knowledge discovery and data mining can be considered tools for decision-making as well as for organizational effectiveness. Data mining is a systematic process of extracting relevant knowledge from large structured and unstructured data using various techniques. A variety of data mining techniques and approaches are available, such as clustering, classification, and association rule mining. One of the most challenging tasks for higher educational institutions is the provision and utilization of up-to-date information for their sustainability, for monitoring student performance, and for measuring institutional effectiveness. In this paper, the researcher analyses and presents the use of data mining techniques in the higher education sector to evaluate learner performance. The data available in higher education institutions can be evaluated using data mining techniques, which leads to the discovery of hidden patterns, messages and knowledge. Based on the results of the applied techniques, institution management can take measures to allocate relevant resources more effectively, make effective decisions on academic activities to improve students' performance, improve students' learning behavior, increase students' retention rate, and increase students' consistency in various academic activities.
Published on November 21, 2018. M. A. Job is Assistant Professor, Arab Open University, Kingdom of Bahrain (e-mail: m.aniljob@aou.org.bh).

II. LITERATURE REVIEW
Data Mining (DM) is described as a process of discovering or extracting interesting knowledge from large amounts of data stored in multiple data sources such as file systems, databases, and data warehouses [3], [18]. A defining characteristic of data mining is 'Big data' [50]. Data mining is defined as "the process of discovering "hidden message," patterns and knowledge within large amounts of data and of making predictions for outcomes or behaviour" [30]. A 'pattern' is a single record that consists of input and output. Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Data mining functionalities are classified into two broad categories, descriptive and predictive [2], [21]. The main function of data mining is to apply various methods and algorithms in order to discover and extract patterns from stored data [48]. The implementation of data mining methods and tools for analyzing data available at educational institutions, defined as Educational Data Mining (EDM) [22], is a relatively new stream in data mining research. Educational data mining is a research area that falls under data mining. A survey of the application of data mining techniques to various educational systems is given by Romero and Ventura [41]. In other work on educational data mining, Romero et al. [42], [44] discuss the application of various data mining techniques to data collected from the activities of students who use the Moodle e-learning course management system.
Brijesh Kumar Baradwaj and Saurabh Pal [9] describe the main objective of higher education institutions as providing quality education to their students. One way to achieve the highest level of quality in a higher education system is by discovering knowledge for prediction regarding the enrolment of students in a particular course, detection of abnormal values in students' result sheets, prediction of students' performance, and so on. In 2010, Romero and Ventura [43] published a paper in IEEE that listed the most common tasks in the educational environment resolved through data mining and some of the most promising future lines. The Educational Data Mining community remained focused in North America, Western Europe, and Australia/New Zealand. They mentioned that there is considerable scope for an increase in educational data mining's scientific influence. They also suggested developing more unified and collaborative studies [43]. There is increasing research interest in using data mining in education. This new emerging field, called Educational Data Mining, is concerned with developing methods that discover knowledge from data originating from educational environments [4], [24].
Educational Data Mining uses many techniques such as Decision Trees, Neural Networks, Naïve Bayes, K-Nearest Neighbor, and many others. Han and Kamber [21] describe data mining software that allows users to analyze data from different dimensions, categorize it, and summarize the relationships identified during the mining process [34]. Data mining tools predict future trends and behaviors, allowing institutions to make proactive, knowledge-driven decisions [11]. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer institutional questions that traditionally were too time-consuming to resolve [10], [40].

A. Knowledge Discovery Process
Following are the steps in the data mining process [25], [26]:
• Understand application domain
• Create target dataset
• Data cleaning and transformation
• Apply data mining algorithm
• Interpret, evaluate and visualize patterns
• Manage discovered knowledge
Fig. 1 shows data mining as a step in an iterative knowledge discovery process.

B. Educational Data Mining
Educational Data Mining is an emerging discipline concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students and the settings in which they learn [28], [35]. The emergent field of EDM examines the unique ways of applying data mining techniques to the academic problems of educational institutions. Fig. 1 shows the concept of EDM. The major components of a data mining system are the data source, data warehouse, data mining engine, knowledge base and pattern evaluation module [30]. It shows that the system uses historical data from the data warehouse server and trains on the data after applying various preprocessing techniques [23]. Data mining tasks can be done in two different ways: predictive or descriptive. Data processing involves two types of functions, namely clustering and classification. Classification can be applied to the dataset in a predictive model, and clustering in a descriptive model.

C. Classification
Classifying data into a fixed number of groups and using it for categorical variables is known as classification [33]. Classification can be divided into two types: supervised and unsupervised. When the classes of the objects or cases are known in advance, the task is called supervised classification, whereas in unsupervised classification they are not known in advance. The following algorithm can be used for the classification model [19], [1].

D. Clustering
Clustering is the grouping of similar objects. It is defined as a process of grouping a set of physical or abstract objects into classes of similar objects [38]. According to Larose [29], clustering does not classify, estimate or predict the value of a target variable but segments the entire data into homogeneous subgroups. Dividing a heterogeneous population into a number of homogeneous subgroups, or clusters, is referred to as clustering [6]. Furthermore, clustering is an unsupervised classification task. The data are divided into groups, called clusters, such that objects in one cluster are very similar to each other and objects in different clusters are very dissimilar [14]. Similar objects are thus placed in one class, and many classes with different object sets can be constructed. Various clustering methods, such as the partitioning method, hierarchical method, model-based method, grid-based method, density-based method, and constraint-based method, can be applied to the dataset [49], [5], [7]. Clustering techniques have been used in fields such as data mining, image processing, text mining, machine learning and pattern recognition [25]-[27].
Educational data mining is concerned with developing new methods to discover knowledge from educational databases in order to analyse student trends and behaviours towards education. In the educational sector, for example, it can help course administrators and educators analyze usage information and students' activities during a course to get a brief idea of their learning [4]. Visualization of information and statistics are the two main methods that have been used for this task. Studies show that data mining was first implemented for marketing outside higher education, but it has parallel implications and value in higher education [37], [45].
In every higher education institution, marketing is part of student relationship management. Within educational institutions, marketing concerns the service area, enrollment, annual campaign, alumni, and college image. Combined with institutional research, it expands into student feedback and satisfaction, course availability, and faculty and staff hiring. A university service area now includes on-line course offerings, thus bringing the concept of mining course data to a new dimension. Data mining is quickly becoming a mission-critical component of the decision making and knowledge management processes [21], [35].
In summary, data mining can be applied in classifying students based on academic achievements, knowledge, gender, age, semester-wise grades and monitoring of progression and regression.

IV. METHODOLOGY
The approach begins with the collection of data on enrolled students, after which pre-processing procedures are applied to the dataset. Data pre-processing is used to make the data more suitable for data mining. The dataset is then divided into a training dataset and a testing dataset. The training dataset is used to build the classification model. The testing dataset is used either to assess the developed classification model or to compare its predictions with the known target values.
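The division into a training dataset and a testing dataset can be sketched as follows. This is a minimal illustration only, not the procedure used in the study: the function name `split_dataset`, the 70/30 ratio, the fixed seed, and the toy records are all assumptions introduced here.

```python
import random

def split_dataset(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and return (training, testing) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records (student id, mark); not taken from the paper's dataset.
students = [(i, 40 + i) for i in range(10)]
train, test = split_dataset(students)
```

Holding the testing records out of model building is what allows predictions to be compared with known target values afterwards.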
To evaluate undergraduate students' academic performance, the following data mining algorithms can be used to identify the significant variables that affect and influence the performance of students. Since the study focuses on undergraduate students' performance evaluation, the learner in the table given below is an undergraduate student.
ANN (Artificial Neural Network): Neural networks are useful for data mining and decision-support applications. People are good at generalizing from experience, whereas computers excel at following explicit instructions [45], [8]; neural networks bridge this gap by modeling, on a computer, the neural behavior of the human brain. Neural networks are useful for pattern recognition or data classification through a learning process [17], [32]. Neurons work by processing information. They receive and provide information in the form of spikes. An artificial neural network is composed of many artificial neurons, which are linked together according to a specific network architecture [47], [36]. The objective of the neural network is to transform the inputs into meaningful outputs [36], [32].
The farthest-first algorithm, proposed by Hochbaum and Shmoys in 1985, follows the same procedure as k-means: it also chooses centroids and assigns objects to clusters, but it uses the maximum distance, and the initial seeds are the values at the largest distance from the mean of the values. Cluster assignment is therefore different: the initial cluster gets the links with a high Session Count, for example more at cluster-0 than at cluster-1, and so on. Farthest first actually solves the k-centre problem and is very efficient for large datasets. The farthest-first algorithm does not compute a mean to obtain a centroid; it takes a centroid arbitrarily and places each further centroid at the maximum distance from the others [12].
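A minimal sketch of farthest-first seeding is given below, following the usual description of the traversal for the k-centre problem: the first centre is arbitrary, and each further centre is the point farthest from the centres chosen so far. The function names, the `first_index` parameter, and the sample points are assumptions made for illustration.

```python
import math

def farthest_first(points, k, first_index=0):
    """Choose k centres; each new centre is the point farthest from those already chosen."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centres = [points[first_index]]  # arbitrary starting centre
    # nearest[i] holds the distance from points[i] to its closest chosen centre
    nearest = [math.dist(p, centres[0]) for p in points]
    while len(centres) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centres.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx])) for p, d in zip(points, nearest)]
    return centres

def assign(points, centres):
    """Label each point with the index of its closest centre."""
    return [min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            for p in points]

# Hypothetical two-dimensional points, used only to exercise the functions.
sample = [(0, 0), (1, 0), (10, 0), (10, 1), (5, 8)]
centres = farthest_first(sample, 3)
labels = assign(sample, centres)
```

No mean is computed at any step, which matches the remark that the centroids are actual data points placed far apart from one another.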
Fuzzy c-Means Clustering performs clustering by iteratively searching for a set of fuzzy clusters and the associated cluster centres that represent the structure of the data as well as possible [15]. The algorithm relies on the user to specify the number of clusters present in the set of data to be clustered.
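The iterative search can be sketched with the standard fuzzy c-means updates: memberships are recomputed from the distances to the current centres, and centres are recomputed as membership-weighted means. This is an illustrative sketch under assumptions, not the implementation of [15]: the fuzzifier m = 2, the fixed iteration count, the initial centres, and the sample data are all chosen here for the example.

```python
import math

def memberships(points, centres, m=2.0):
    """Return u where u[j][i] is the degree to which point j belongs to cluster i."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [math.dist(p, c) for c in centres]
        if 0.0 in d:
            # The point coincides with a centre: give it full membership there.
            hit = d.index(0.0)
            u.append([1.0 if i == hit else 0.0 for i in range(len(centres))])
        else:
            u.append([1.0 / sum((di / dk) ** power for dk in d) for di in d])
    return u

def update_centres(points, u, m=2.0):
    """Recompute each centre as the mean of the points weighted by u ** m."""
    dim = len(points[0])
    centres = []
    for i in range(len(u[0])):
        w = [row[i] ** m for row in u]
        total = sum(w)
        centres.append(tuple(sum(wj * p[a] for wj, p in zip(w, points)) / total
                             for a in range(dim)))
    return centres

def fuzzy_c_means(points, centres, m=2.0, iterations=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iterations):
        u = memberships(points, centres, m)
        centres = update_centres(points, u, m)
    return centres, memberships(points, centres, m)

# Hypothetical one-dimensional marks; the user supplies two initial centres.
marks = [(1.0,), (2.0,), (9.0,), (10.0,)]
centres, u = fuzzy_c_means(marks, [(0.0,), (5.0,)])
```

Unlike k-means, every point keeps a graded membership in every cluster, and each row of `u` sums to one.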
K-means is a partitional clustering approach, used when the data are unlabeled (i.e., data without defined categories or groups) [20]. The goal of the algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features provided [24]. Data points are clustered based on feature similarity. The results of the K-means clustering algorithm are the centroids of the K clusters, which can be used to label new data, and the labels for the training data (each data point is assigned to a single cluster). The name comes from representing each of the k clusters Cj by the mean (or weighted average) cj of its points, the so-called centroid. While this obviously does not work well with categorical attributes, it makes good geometric and statistical sense for numerical attributes [45]. The sum of discrepancies between a point and its centroid, expressed through an appropriate distance, is used as the objective function. For example, the L2-norm based objective function, the sum of the squares of errors between the points and the corresponding centroids, is equal to the total intra-cluster variance. The sum of the squares of errors can be rationalized as a (negative of) log likelihood for a normally distributed mixture model, and the k-means algorithm can be derived from a general probabilistic framework [35], [16].
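The iteration and the L2-norm objective described above can be sketched as follows. This is a plain illustration of the standard assign-then-update loop, not the code used in the study; the random choice of initial centroids, the seed, the iteration cap, and the sample points are assumptions.

```python
import math
import random

def mean(rows):
    """Component-wise arithmetic mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))

def k_means(points, k, seed=0, max_iter=100):
    """Return (centroids, labels) after alternating assignment and update steps."""
    centroids = random.Random(seed).sample(points, k)  # k random rows as seeds
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return centroids, labels

def sum_squared_error(points, centroids, labels):
    """L2-norm objective: squared distance from each point to its centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical points forming two well-separated groups.
sample = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centroids, labels = k_means(sample, 2)
```

The returned centroids can label new data by nearest distance, and `sum_squared_error` gives the quantity that the iteration reduces.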
Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. In data mining, hierarchical clustering works by grouping data objects into a tree of clusters [31], [16]. Hierarchical techniques produce a nested sequence of partitions, with a single, all-inclusive cluster at the top and singleton clusters of individual objects at the bottom.
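The nested sequence of partitions can be illustrated with a small bottom-up (agglomerative) sketch. Single linkage, the function name, and the sample data are assumptions made here; the text does not state which linkage or direction is intended.

```python
import math

def single_linkage_levels(points):
    """Return the partitions from all singletons up to one all-inclusive cluster."""
    clusters = [[i] for i in range(len(points))]  # bottom level: singletons
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
        levels.append([sorted(c) for c in clusters])
    return levels

# Hypothetical one-dimensional data; clusters hold indices into this list.
sample = [(0,), (1,), (5,), (6,), (20,)]
levels = single_linkage_levels(sample)
```

Each entry of `levels` is one partition of the data, so reading the list from first to last walks the tree from the singleton leaves to the root.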
The X-Means clustering algorithm is an extension of K-Means that tries to determine the number of clusters automatically, based on BIC scores. Starting with only one cluster, the X-Means algorithm goes into action after each run of K-Means, making local decisions about which subset of the current centroids should split themselves in order to better fit the data. The splitting decision is made by computing the Bayesian Information Criterion (BIC) [38]-[40].
The Markov Cluster Algorithm (MCL) is well-recognized as an effective method of graph clustering [46]. It involves changing the values of a transition matrix toward either 0 or 1 at each step in a random walk until the stochastic condition is satisfied.
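A rough sketch of the two alternating MCL steps is shown below: expansion (squaring the column-stochastic transition matrix, which simulates the random walk) and inflation (raising entries to a power and renormalizing each column, which pushes values toward 0 or 1). The inflation value of 2, the added self-loops, the fixed iteration count, the way clusters are read off the final matrix, and the example graph are all assumptions for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def normalize_columns(m):
    """Rescale every column to sum to 1 (the stochastic condition)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def mcl(adjacency, inflation=2.0, iterations=30):
    """Alternate expansion and inflation on the graph's transition matrix."""
    n = len(adjacency)
    # Self-loops keep every column non-empty before normalization.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = mat_mul(m, m)                                    # expansion
        m = normalize_columns([[x ** inflation for x in row]
                               for row in m])                # inflation
    return m

def clusters_from(m, tol=1e-6):
    """Each row with non-negligible entries names the members of one cluster."""
    found = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > tol)
        if members:
            found.add(members)
    return sorted(sorted(c) for c in found)

# Hypothetical graph: two triangles with no edge between them.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
result = clusters_from(mcl(graph))
```

Larger inflation values generally produce finer clusters, so this parameter would need tuning on real data.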
In data mining, expectation-maximization (EM) is generally used as a clustering algorithm (like k-means) for knowledge discovery [48]. The EM Algorithm on Clustering Features takes each clustering feature as a data object. It describes a subcluster of data items more accurately, and so is less sensitive to the data summarization procedure.
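As an illustration of EM used for clustering, the sketch below fits a one-dimensional Gaussian mixture to raw data points. It does not implement the clustering-feature variant mentioned above; the function names, the unit initial variances, the variance floor, the iteration count, and the sample data are assumptions.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gaussian_mixture(data, initial_means, iterations=25, min_var=1e-6):
    """Fit a 1-D Gaussian mixture; returns (weights, means, variances)."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, mu, v)
                 for w, mu, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pi / total for pi in p])
        # M-step: re-estimate the parameters from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, spread)  # floor avoids a collapsed variance
    return weights, means, variances

# Hypothetical values forming two groups, one near 1 and one near 5.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
weights, means, variances = em_gaussian_mixture(values, [0.0, 6.0])
```

The soft responsibilities play the role that hard cluster labels play in k-means.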

A. Application of k-Means Clustering algorithm in educational data set
1) K-means:
• Partitional clustering approach
• Each cluster is associated with a centroid (center point)
• Each point is assigned to the cluster with the closest centroid
• Number of clusters, K, must be specified [13]
The K-means algorithm accepts as input values the number of clusters to group the data into and the dataset to cluster. It then creates the first K initial clusters (K = number of clusters needed) by choosing K rows of data randomly from the dataset. The K-means algorithm calculates the Arithmetic Mean of each cluster formed in the dataset. The Arithmetic Mean of a cluster is the mean of all the individual records in the cluster. In each of the first K initial clusters there is only one record, and the Arithmetic Mean of a cluster with one record is the set of values that make up that record [20], [4].
The group assessment in each cluster is evaluated by summing the averages of the individual scores in the cluster, where N = the total number of students in a cluster and n = the dimension of the data. The results generated are shown in the following tables.
2) Using cluster number, K=3
In Table IV, for k = 3, the overall performance is 67 in cluster 1 (cluster size 24), 48.5 in cluster 2 (cluster size 18), and 58.6 in cluster 3 (cluster size 30). Fig. 6 shows the same results as percentages: 67% for cluster size 24, 48.5% for cluster size 18, and 58.6% for cluster size 30. We can conclude that 24 of the 72 selected students show "Average" performance (67%), 18 show "Poor" performance (48.5%), and the remaining 30 show "Below Average" performance (58.6%).

3) Using cluster number, K=4
Table V shows the sample data for k = 4: the overall performance is 52.5 in cluster 1 (cluster size 8), 70.5 in cluster 2 (cluster size 14), 61.4 in cluster 3 (cluster size 22), and 45.5 in cluster 4 (cluster size 28). Fig. 7 shows the same results as percentages: 52.5% for cluster size 8, 70.5% for cluster size 14, 61.4% for cluster size 22, and 45.5% for cluster size 28. We can conclude that 8 of the 72 selected students show "Below Average" performance (52.5), 14 show "Good" performance (70.5), 22 show "Average" performance (61.4), and the remaining 28 show "Poor" performance (45.5).

4) Using cluster number, K=5
Table VI shows the sample data for k = 5: the overall performance is 51.66 in cluster 1 (cluster size 7), 80.3 in cluster 2 (cluster size 11), 61.25 in cluster 3 (cluster size 17), 42.9 in cluster 4 (cluster size 15), and 66.5 in cluster 5 (cluster size 22). Fig. 8 shows the same results as percentages: 51.66% for cluster size 7, 80.3% for cluster size 11, 61.25% for cluster size 17, 42.9% for cluster size 15, and 66.5% for cluster size 22. We can conclude that 7 of the 72 selected students show "Below Average" performance (51.66), 11 show "Very Good" performance (80.3), 17 show "Average" performance (61.25), 15 show "Poor" performance (42.9), and the remaining 22 show "Average" performance (66.5).
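The group assessment behind the overall performance figures reported for each value of k can be sketched as follows. The formula itself is not reproduced in this text, so the sketch assumes one reading of its description: average each student's n marks, then average those values over the N students in the cluster. The function names and the marks below are hypothetical and are not taken from the study's dataset.

```python
def student_average(marks):
    """Mean of one student's marks across the n courses."""
    return sum(marks) / len(marks)

def cluster_performance(cluster):
    """Mean of the per-student averages over the N students in a cluster.

    Dividing by N is an assumption suggested by the definition of N.
    """
    return sum(student_average(marks) for marks in cluster) / len(cluster)

# Hypothetical cluster of N = 2 students with n = 4 course marks each.
example_cluster = [
    [60, 70, 80, 50],
    [40, 50, 60, 70],
]
overall = cluster_performance(example_cluster)
```

Applied to each cluster returned by k-means, this yields one overall performance value per cluster, which can then be compared against the performance criteria.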

VI. CONCLUSION
Data mining is a powerful analytical tool that enables educational institutions to predict and analyze the performance of students. With the ability to reveal hidden patterns in large databases, higher education institutions can build models that predict the behavior of a population with a high degree of accuracy by dividing the available up-to-date data samples into specific clusters. By acting on these predictive data mining models, educational institutions can effectively address issues such as students' progression, transfers and retention. The newly identified data patterns can also be used to monitor graduates and alumni. In this paper the researcher identified various data mining algorithms that can be used to identify the significant variables that affect and influence the performance of students in higher education institutions. The study focused on the performance evaluation of undergraduate students. Finally, the researcher demonstrated a technique using the k-means clustering algorithm, combined with the deterministic model, on a dataset of undergraduate students' examination marks in four registered courses in one semester. This was done for each of a total of 72 students, and a numerical interpretation of the results for the performance evaluation was produced. From the analysis, it is concluded that the clustering algorithm serves as a good benchmark for monitoring students' performance in higher institutions. The clustering algorithm used in this paper is the k-means algorithm. The use of a clustering data mining algorithm also helps institution management to monitor students' semester-by-semester performance, and the results can be used in identifying measures to improve performance in future academic semesters.

Fig. 1. Step in an iterative knowledge discovery process

Fig. 3. The McCulloch-Pitts model. In 1943, McCulloch and Pitts first presented a mathematical model (M-P model) of a neuron. Since then, many artificial neural networks have been developed from the well-known M-P model [36], [32].

TABLE I: SUGGESTED CLUSTERING ALGORITHMS IN EDUCATIONAL DATA MINING

TABLE II: PERFORMANCE CRITERIA

TABLE IV: DATA FOR K = 3

TABLE V: DATA FOR K = 4

TABLE VI: DATA FOR K = 5

Fig. 8. Overall Performance versus cluster size for k = 5