The Investigation of Student Dropout Prediction Model in Thai Higher Education Using Educational Data Mining: A

The student retention rate is a challenging issue that reflects the quality of a university. A high dropout rate affects not only the reputation of the university but also the students' future careers. Therefore, student dropout analysis is needed to improve academic planning and management, to reduce the number of students dropping out, and to enhance the quality of the higher education system. Data mining techniques provide powerful methods for analysing and predicting dropout. This paper proposes a model for predicting student dropout using a dataset representative of the largest public university in the southern part of Thailand. In this study, data from the Faculty of Science, Prince of Songkla University, were collected for the academic years 2013 to 2017. The experimental results show that JRip rule induction is the best technique for generating a prediction model, achieving the highest accuracy of 77.30%. The results highlight the potential of the prediction model to detect students at an early stage of dropping out, so that the university can provide support programmes to improve the student retention rate.


Introduction
Education is an important factor in developing a country, as it produces skilled and educated workers who in turn are key to successful economic development. However, one study showed that some students could not complete their university studies [1]. This issue affects not only the academic field but also the image of the country. Especially in higher education, student retention is a challenging task that reflects the efficiency and reliability of the institution. Finding hidden patterns or predictive trends in vast databases helps to improve the quality of management decision-making, so that resources can be allocated appropriately with a better understanding of the student learning environment. Highly accurate prediction of student dropout is beneficial because it helps to identify students whose academic performance puts them at risk. Data mining has proven beneficial in the business domain, and it can be a suitable tool in the educational domain for finding useful information hidden in huge datasets.
Data mining is a technique for extracting valuable information from a larger set of raw data in order to discover helpful patterns and relationships. A process for handling data in data mining is known as the CRISP-DM (Cross Industry Standard Process for Data Mining) model [2]. CRISP-DM involves six phases: 1) business understanding, which builds an understanding of the project objectives and requirements in order to convert them into a data mining problem definition; 2) data understanding, which focuses on identifying the data that need to be collected; 3) data preparation, which constructs the final dataset ready for use in the modelling tool; 4) modelling, which focuses on selecting and applying various modelling techniques to find the best-fitting model for extracting useful information and patterns from the dataset; 5) evaluation, which assesses how well the resulting model meets the project objectives; and 6) deployment, which puts the created models to use [3]. The techniques developed and implemented in data mining include classification, clustering, association rules, and regression. Classification constructs a model from a training set of data with known class labels in order to classify unknown objects. Clustering groups records so that the records within a group are similar. Clustering therefore resembles classification in that it organises data into classes or groups; the difference is that in clustering the class labels are unknown. Association rules help to discover relationships between parameters by studying how frequently items occur together in transactions [4]. Prediction approaches create a predictive model that estimates future numerical values or increasing/decreasing trends over time [5].
The advantages of data mining are also recognised in its application to educational research.
Educational Data Mining (EDM) is an approach for converting raw data from education systems into useful information that can help lecturers or academic staff to take corrective actions [6]. Work in EDM focuses on developing, researching, and applying computerised methods to analyse student data. EDM extracts useful information from educational systems in a way similar to the knowledge extraction process in data mining. In EDM, classification is the most popular task for building predictive models of student performance. Several classification algorithms have been applied to predict student performance, namely Decision Tree, Rule Induction, Neural Network, Naïve Bayes, and Support Vector Machine (SVM) [7]. Decision Tree, a popular prediction technique because of its simplicity and comprehensibility, is applied to discover valuable information in small or large datasets. Rule Induction is a technique that produces rules as a set of IF-THEN statements [8,9]. Neural Network facilitates the detection of possible interactions between predictor variables [10]. Naïve Bayes computes a set of probabilities from a given dataset [11]. Support Vector Machine, a supervised machine learning algorithm, finds the hyperplane (the decision boundary that helps to classify the data points) in an N-dimensional space (where N is the number of features) that optimally separates the data into two categories [11]. In addition, EDM can be integrated methodologically with other fields such as psychology, for example by combining psychometric modelling with data mining techniques. Moreover, EDM involves different groups of users, who view educational information from different angles depending on their mission and vision [8].
This study proposes a model for predicting student dropout at the university level using data mining. We built the prediction model using classification techniques. We investigated the student records of the Faculty of Science, Prince of Songkla University, Thailand, including admission method (how students enter the university), major, education status, study term, grade point average, the province of the students' high school, and the students' high school grade point average. In this paper, the model takes the form of IF-THEN rules in which the IF part contains m predicting conditions on attributes, while the THEN part contains a prediction attribute or class. The classification rules were also used to identify the factors that might affect the status of students, which can help the faculty or university to offer personalised help to students in order to reduce the dropout rate. In several experiments, we compared classification algorithms to find the one that works best with education data in the Thai higher education environment. The remainder of the paper presents the related work, the methodology applied in this study, and the results. Finally, we present the conclusions and future work.
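The IF-THEN rule form described above can be illustrated with a minimal Python sketch. The two rules, their thresholds, the class labels "Dropout" and "Continue", and the default class are hypothetical and chosen only for illustration; they are not the rules learned in this study. Each rule pairs a conjunction of attribute conditions (the IF part) with a class (the THEN part), and the first rule that fires determines the prediction.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Rule = Tuple[Callable[[Record], bool], str]  # (IF part, THEN class)

# Hypothetical rules for illustration only; NOT the rules learned in the study.
RULES: List[Rule] = [
    (lambda r: r["Term"] <= 2 and r["GPX"] < 2.0, "Dropout"),
    (lambda r: r["Entrance"] == "Admission" and r["GPX"] < 1.75, "Dropout"),
]
DEFAULT_CLASS = "Continue"  # assumed fallback when no rule fires


def classify(record: Record) -> str:
    """Return the class of the first rule whose conditions all hold."""
    for condition, label in RULES:
        if condition(record):
            return label
    return DEFAULT_CLASS
```

An ordered rule list with a default class is one common way to apply rule-induction output; other conflict-resolution schemes are possible.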

Related Work
Several studies have been conducted on predicting student dropout or student performance. Márquez-Vera, Moroles, and Soto [13] proposed a method for predicting school failure and dropout among middle-school students by applying data mining techniques, namely induction rules and decision trees. Moreover, Ramesh, Parkavi, and Ramar [14] investigated the factors that influence students' performance in their final examination and identified a data mining algorithm for predicting students' grades. Their results showed that parents' occupation has a high impact on predicting student performance and that the multilayer perceptron algorithm predicts grades most accurately.
Tekin [15] proposed a model for predicting students' grade point average at graduation. Three algorithms, namely Neural Networks (NN), Support Vector Machines (SVM), and Extreme Learning Machine (ELM), were used to generate the predictive model. The results showed that the SVM technique provides accurate prediction at a rate of 97.98%. In addition, Namdeo and Jayakumar [16] proposed the basic approach and concept of rough set theory to predict student performance in course work. Moreover, Mythili and Shanavas [17] proposed a model to analyse and evaluate student performance using classification algorithms. The dataset used for analysis consists of 6 attributes including gender, living locality, parental education, economic status, and class attendance. The results showed that class attendance is considerably related to student performance. Ahmad, Ismail, and Aziz [18] proposed a prediction model of students' academic performance based on students' personal data and past academic records. The researchers used decision tree, Naïve Bayes, and rule-based classification techniques to predict students' academic performance. The experimental results showed that the rule-based algorithm was the best algorithm for creating the model. The knowledge from the prediction model was used to identify and profile the students' level of success in the first semester. Guarín, Guzmán, and González [19] presented a model that predicts the attrition rate of students at the Universidad Nacional de Colombia in their first enrollment. The classification techniques used to identify academic attrition were Naïve Bayes and a decision tree. The experimental results showed that the prediction of dropout status improved when academic data were added. In addition, Abu-Oda and El-Halees [20] proposed a prediction model of student dropout from university using decision tree and Naïve Bayes techniques. The collected data related to students' study history, GPA, and high school grade point average. The results showed that decision tree and Naïve Bayes build models with prediction accuracies of 98.14% and 96.86% respectively.
Aulck, Velagapudi, Blumenstock, and West [21] proposed a model to predict student retention at university using a large, heterogeneous dataset of student demographics and transcript records. The dominant predictors of whether or not a student will drop out of the university were GPA in Math, English, Chemistry, and Psychology courses.
Tran, Dang, Dinh, and Phan [22] presented a study on Predicting Student Performance (PSP) in academic systems using regression and classification methods in order to predict student performance in courses and to help predict which courses suit students' abilities or preferences. Similarly, Pereira and Zambrano [23] used a classification technique based on decision trees to build a model for profiling student dropout patterns. Watkins, NaPhayap, and Jirasukprasert [24] presented an intelligent recommendation system that can predict potential student dropout, suggest programs or activities matching students' preferences, and predict the expected GPA and grade point. Likewise, Iam-On and Boongoen [25] presented the results of an investigation into the effectiveness of educational data mining techniques for detecting first-year students at risk of failing or dropping out, and into the factors influencing student performance at Mae Fah Luang University.
Ramanathan, Parthasarathy, Vijayakumar, Lakshmanan, and Ramani [26] introduced a model for predicting student performance based on a cluster of distributed architecture. The researchers predicted student performance using a combination of the Lion-Wolf algorithm and DBN. Additionally, Burgos, Campanario, Pena, Lara, Lizcano, and Martínez [27] investigated historical student course grade data from an e-learning system using data mining techniques to predict student dropout from an e-learning course. The prediction model was also used to design a tutoring action plan, which reduced the dropout rate by 14% compared with previous academic years. In addition, Bhanushali, Khan, Madhia, and Majumdar [28] presented a model to identify students who might drop out based on attributes such as semester attendance, test marks, and aggregate CGPA in previous semesters, using decision tree and Naive Bayes algorithms.
In summary, there is evidence that classification methods are well suited to predicting student performance and to discovering factors that affect student dropout rates.

Methodology
The method proposed in this study for predicting student dropout patterns is based on the process of Knowledge Discovery and Data Mining (KDD) [29]. The processing steps are data collection, data selection, data preprocessing, data mining, and experimentation, as shown in Figure 1.

Data Collection and Selection
In this study, we gathered student-related data from the Faculty of Science, Prince of Songkla University, covering the 5 years from 2013 to 2017. The data were collected from three different data sources, which were stored as MS Excel files. We concentrated on historical data related to students' academic background, path of study, and social factors. A detailed description of the three datasets is shown in [...]. In this step, data from the different sources were combined into a single dataset using a Python script that integrates data from different files. In total, we have 4,238 records with 7 attributes, namely admission method, major, education status, term of enrollment, university grade point average, province of high school, and high school grade point average.
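The integration step can be sketched as follows. This is a minimal illustration and not the script used in the study: it assumes that each source has already been read into a list of rows and that the sources share a common key, here given the hypothetical name `student_id`. The sample rows are made up.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def merge_sources(sources: Iterable[List[Row]], key: str = "student_id") -> List[Row]:
    """Combine rows that share the same key value into one record per student."""
    merged: Dict[str, Row] = {}
    for rows in sources:
        for row in rows:
            # Create the student's record on first sight, then add this source's fields.
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Made-up rows standing in for the three exported spreadsheets.
admissions = [{"student_id": "s1", "Entrance": "First", "HighschoolGPX": "3.40"}]
academics = [{"student_id": "s1", "Major": "Physics", "Term": "2", "GPX": "2.10"}]
background = [{"student_id": "s1", "Province": "Songkhla"}]

records = merge_sources([admissions, academics, background])
```

If two sources define the same attribute for a student, the later source overwrites the earlier one in this sketch; a real integration script would need an explicit policy for such conflicts.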

Data Preprocessing
This step focuses on preparing the data for the data mining process. It consists of data cleaning and data transformation. Data cleaning removes missing values, noise, and redundant data from the dataset. We deleted attributes for which more than 80% of the records had null values, using the filter feature of Weka. In addition, we excluded records with the status "Graduation" because we focus on whether students drop out or continue their studies. Data transformation modifies the type of attributes to match the requirements of the data mining techniques. To facilitate the classification model and pattern extraction, continuous variables were transformed into discrete variables using the discretize filter in Weka. Then, we converted the data to ARFF format for building a predictive model. Finally, we obtained the data to be analysed, as shown in Table 2. The whole dataset was then divided randomly into training and testing data files.
The attribute 'Entrance' represents how students entered the Faculty of Science: 'First' means the student entered through direct application to a special program offered by the faculty, and 'Admission' means the student took the central examination tests before applying to the faculty with their examination scores. The attribute 'Major' represents the major field that students chose to study, and 'EducationStatus' represents the students' status at the faculty. The attribute 'Term' represents the current semester and 'GPX' represents the students' current grade point average. The attribute 'Province' represents the province of the high school from which students graduated. 'HighschoolGPX' represents the students' grade point average from high school.
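The two preprocessing operations described above (removing attributes with more than 80% missing values and discretising continuous attributes) can be sketched in plain Python. This sketch is not Weka's implementation: the function names are hypothetical, missing values are represented as `None`, and equal-width binning is assumed because the text does not state which discretisation method or number of bins was used.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def drop_sparse_attributes(rows: List[Row], max_missing: float = 0.8) -> List[Row]:
    """Remove attributes whose share of missing (None) values exceeds max_missing."""
    keep = [
        name for name in rows[0]
        if sum(r[name] is None for r in rows) / len(rows) <= max_missing
    ]
    return [{name: r[name] for name in keep} for r in rows]


def equal_width_bin(value: float, low: float, high: float, bins: int) -> int:
    """Map a numeric value to a bin index in [0, bins - 1] using equal-width intervals."""
    if high == low:
        return 0  # degenerate range: everything falls into one bin
    index = int((value - low) / (high - low) * bins)
    return min(max(index, 0), bins - 1)  # clamp so that value == high lands in the last bin
```

For example, with a grade point range of 0.0 to 4.0 and four bins, a value of 2.0 falls into bin index 2.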

Experimental Design
This step describes the experiments with the data mining techniques considered for building a prediction model of student dropout. We decided to test decision tree models and rule induction because these classification techniques can process a substantial number of predictor variables. In addition, these techniques support non-parametric data and capture nonlinear relationships and complex interactions between the predictors and the dependent variable [30]. We performed experiments to identify the algorithm that provides the highest classification accuracy. We used the Weka Experiment Environment tool to run controlled experiments on our prepared datasets with machine learning algorithms. We compared three decision tree algorithms, namely C4.5 (J48), RandomTree, and REPTree, and three rule-induction algorithms, namely OneR, ZeroR, and a rule-based learner (JRip). Each algorithm was evaluated using its default configuration in Weka.
C4.5 (J48) is one of the most popular decision tree algorithms [31]. A decision tree induced by C4.5 consists of internal nodes, branches, and leaf nodes: an internal node represents a test on an attribute, a branch represents an outcome of the test, and a leaf node represents a class label [11]. RandomTree constructs a decision tree using randomly selected attributes at each node, and REPTree uses regression tree logic to create multiple trees in different iterations [32]. JRip examines the classes in order of increasing size [33]; OneR uses the attribute with the minimum error for class prediction [34]; and ZeroR predicts the mean for a numeric class and the mode for a nominal class, and is considered a baseline classifier [35]. We treated the ZeroR classifier as the baseline and as an indicator of the predictive power of the other algorithms.
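To make the two simplest learners concrete, the following sketch implements ZeroR and OneR for nominal attributes in plain Python. It is an illustration of the general techniques, not Weka's code (for instance, it does not handle numeric attributes or missing values), and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]


def zero_r(labels: List[str]) -> str:
    """ZeroR: always predict the most frequent class (the mode)."""
    return Counter(labels).most_common(1)[0][0]


def one_r(rows: List[Row], labels: List[str]) -> Tuple[str, Dict[str, str]]:
    """OneR: choose the single attribute whose value-to-majority-class
    rule makes the fewest errors on the training data."""
    best_attr, best_rule, best_errors = "", {}, len(rows) + 1
    for attr in rows[0]:
        # Count class frequencies for each value of this attribute.
        counts: Dict[str, Counter] = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(label != rule[row[attr]] for row, label in zip(rows, labels))
        if errors < best_errors:
            best_attr, best_rule, best_errors = attr, rule, errors
    return best_attr, best_rule
```

Because ZeroR ignores every attribute, any learner that cannot beat it has extracted no useful signal from the predictors, which is why it serves as a baseline.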
All classification algorithms were run on the dataset containing students' academic information, e.g. grade point average, major, and high school grade point average, using 10-fold cross-validation to estimate generalisation performance. The 10-fold cross-validation procedure divides the dataset into 10 roughly equal sets. For each set, it trains the model on the nine remaining sets and calculates the test error by classifying the held-out set. The results from the 10 test sets are then averaged. The statistical significance of differences in performance between OneR and the other learners is tested with the two-sided paired t-tester in Weka's Experimenter, using a significance level of 5%.
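The cross-validation procedure described above can be sketched as follows. This is a minimal, non-stratified illustration rather than Weka's implementation; the function names are hypothetical, and `train_fn` is assumed to take training rows and labels and return a prediction function. The sketch assumes the dataset has at least k records so that no fold is empty.

```python
import random
from typing import Any, Callable, List, Sequence


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and split them into k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(
    rows: Sequence[Any],
    labels: Sequence[str],
    train_fn: Callable[[List[Any], List[str]], Callable[[Any], str]],
    k: int = 10,
    seed: int = 0,
) -> float:
    """Return the mean accuracy over k folds; each fold is held out exactly once."""
    accuracies = []
    for test_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        # Train on the k - 1 remaining folds, then classify the held-out fold.
        predict = train_fn([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(rows[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Averaging over folds gives each record exactly one turn as test data, so the estimate does not depend on a single train/test split.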
We first examined the results to identify the best algorithm for classifying the class. Table 3 ranks the schemes according to the total number of significant wins (>) and losses (<). The first column (>-<) is the number of wins minus the number of losses. Table 3 also illustrates that JRip classified the data better than the other algorithms. Then, we tested the accuracy of each algorithm in performing the prediction. The results showed that the JRip algorithm achieved a classification result of 38.97% (+/-0.53%), which is statistically significantly better than ZeroR at 30.14% (+/-0.53%) as shown in

Data Mining Model
The experimental results showed that JRip is the best algorithm for classifying student dropout status in our datasets. Therefore, we selected JRip to build the prediction model.

Data Mining Model Evaluation
In this section, we evaluate the proposed model. The classification metrics used to test the performance of our proposed classifier are statistical measures derived from the confusion matrix: the true positive rate (TP) and the false positive rate (FP). The true positive rate is the proportion of records that actually belong to a class and were predicted as that class. The false positive rate is the proportion of records that do not belong to a class but were classified as that class. The true positive rate is equivalent to recall. Precision is the proportion of records that truly have class x among all records that were classified as class x. The F-measure is a combined measure of precision and recall.
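These measures can be computed from the confusion-matrix counts for one class, as the sketch below shows. The function name is hypothetical, and the F-measure is assumed to be the usual harmonic mean of precision and recall, since the text does not give its formula.

```python
from typing import Dict, List


def class_metrics(actual: List[str], predicted: List[str], positive: str) -> Dict[str, float]:
    """Compute TP rate (recall), FP rate, precision, and F-measure for one class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)

    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Harmonic mean of precision and recall (assumed definition of the F-measure).
    denom = precision + tp_rate
    f_measure = 2 * precision * tp_rate / denom if denom else 0.0
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "f_measure": f_measure,
    }
```

For a multi-class problem, the same function can be called once per class by treating that class as the positive one and all others as negative.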

Results and Discussion
In this section, we present the prediction model rules generated by the JRip algorithm from the student records of the Faculty of Science. The model takes the form of decision rules. Table 5 shows the performance of the JRip algorithm using statistical measures calculated from the confusion matrix, together with the true positive rate for each class. From our investigation, we found that the main attributes influencing student dropout mostly concern the academic aspect, specifically the term in which students are studying and a low average grade. In addition to a low average grade, the method of entering the university and the student's major also affect dropout. These results are aligned with those of Pereira and Zambrano [23], in which the dominant factors of student dropout from the university are academic factors such as a low grade average, the semester of the program, and the faculty to which the student belongs. Moreover, the location of the high school from which students graduated is also related to dropout. A similar finding was reported by Rahman and Dash [36], who found that the location of a student's residence, whether rural or urban, is related to the discipline the student chose to study.

Conclusion and Future Work
In this study, we investigated the student records of the Faculty of Science, Prince of Songkla University. We analysed data from 2013 to 2017. In total, we have 4,238 records with 7 attributes, namely admission method, major, education status, term of enrollment, university grade point average, province of high school, and high school grade point average, which we used to find a model that can predict student dropout. Among the six classifiers compared, JRip showed the best accuracy, at 77.30%. This model can help administrative staff to plan strategies for supporting students in order to increase retention in the faculty. In future work, we intend to expand the experiment with additional data such as students' demographic information, learning behavior in e-learning systems, and psychological factors. In addition, we will apply different data mining techniques, such as association rules, to investigate the relationships between socioeconomic and academic factors.