A Prediction Model - Universiti Teknologi Malaysia

A Prediction Model for the Factors Impacting Student Academic Performance based on Classification Algorithms

Umirul Adilla M. N. Ihsan
Universiti Teknologi Malaysia
Razak Faculty of Technology and Informatics
umiruladilla@

Abstract
The paper aims to find the key factor that impacts student academic performance (SAP) using a prediction model built on three classification algorithms: logistic regression, decision tree and random forest. The results indicate that academic performance is mainly affected by how frequently students raise their hands during class.

Keywords: Prediction model, machine learning, classification, academic performance.

1. Introduction
The application of prediction models has benefited higher educational institutions in managing their student data, helping students with course enrolment, keeping track of examination marks and so forth. Given the many improvements in education achieved through the development of prediction models, educational data mining (EDM) within schools is as significant as EDM within higher educational institutions (Marsh & Farrell, 2015). School student data are increasing in volume and need proper management as well. Therefore, the use of an LMS, ITS or AEHS system is considered a smart plan for managing student data. Past research reported that higher scores in school data management are strongly correlated with better student performance (Bloom, Lemos, Sadun, & Van Reenen, 2015). The time has come for prediction models to be introduced within schools to help teachers monitor their students' learning behaviour. School students also need proper monitoring of their studies because they are not only growing in number but also have diverse learning behaviours and show different attitudes towards study.
Past researchers have urged that students' behaviour and attitude be studied, because these traits affect student academic performance (SAP) (Mccoy, 2016). Therefore, Kalboard360, an e-learning system enriched with X-API (Experience API), is suggested for use within schools to capture factors impacting SAP, such as students' learning behaviour.

* Corresponding author. umiruladilla@graduate.utm.my

It is impossible to improve every factor in the system all at once. One factor should be the focus of improvement so that proper monitoring can be done, staggered progress can be made and satisfying results can be achieved. To do so, a prediction model needs to be developed. The algorithms chosen as predictors help to predict which factor needs to be triggered to motivate school students to study. The best predictor needs to be found in order to know the first factor to concentrate on. In addition, the algorithms must be suitable for the data. Once results are obtained, the key factor impacting SAP can be used by teachers to identify at-risk students and to encourage students to self-reflect and be aware of what and how they are doing in their studies (Park & Jo, 2015). Research shows that the school system is crucial in supporting schools and teachers in utilizing student data when the collected factors are used to conduct SAP prediction (Farrell, 2015). Furthermore, this paper emphasizes that teachers can improve their teaching skills by studying the factors affecting SAP. Once the key factor impacting SAP is exposed, teachers are able to enrich their teaching sources by providing up-to-date and informative reading material. To address these problems, this Short Communication is proposed; it records the results of a complete small investigation using a prediction model on high-volume school data.
The proposed Short Communication supports these objectives:
(1) To identify the factors included in the prediction model.
(2) To determine the best predictor among the chosen classification algorithms.
(3) To evaluate the factors impacting SAP.
The next two sections of this paper, Methodology and Discussions, discuss how these three objectives are realized.

2. Methodology
The research requires the three objectives to be realized. Therefore, several steps are taken, as discussed below: importing the school data into Jupyter Notebook, then cleaning and discretizing it. The remaining steps are classification of the collected factors, prediction to discover which algorithm gives the highest accuracy, and ranking of the impacting factors to know which factors should be the centre of focus when improving student academic performance.

2.1. Importing Data
The first step is importing the dataset into Python Jupyter Notebook. Importing is important because the data must be read and learned by the application. Only after that can code be used to engineer the data, which later reveals the key factor impacting SAP. Figure 1 below displays the view in the dashboard.

Figure 1. Dataset imported into Jupyter Notebook

2.2. Data Preprocessing
The second step is data preprocessing, which focuses on transforming the data collected from Kalboard360 into a suitable format. The process helps to identify and exclude all cases with missing data. Data preprocessing detects any missing values in the dataset, which then need to be removed, ignored or edited. The student dataset used for this project turned out to have no missing values, as shown in Figure 2 below.

Figure 2. No missing values detected

2.2.1. Data Cleaning
Data cleaning involves the removal of absurd, redundant, abnormal or noisy data to increase the reliability of the results later collected from the three classification algorithms. For this project, the data is cleaned of abnormal details such as inappropriate spelling, acronyms and letter capitalization. This eases the machine learning task. After cleaning, the factors are reduced to 16, with only 366 instances to be used as the sample.

2.2.2. Data Discretization
Discretization transforms the school data from numerical values to nominal values (Amrieh et al., 2015). Before discretization, the data uses numbers to represent the level the students are in. After discretization, data written as 1 becomes High, 2 becomes Medium and 3 becomes Low. In this way the data is separated into nominal intervals (High level, Medium level and Low level) according to the grades the student achieved.

2.3. Classification method
Objective 1: To identify the factors included in the prediction model.
The finalized factors are further classified into four categories:
Demographic: Nationality, Gender, Place of Birth, Relation
Academic: Stage ID, Section ID, Semester and Subject
Behavioural: Raised Hands, Visited Resources, Discussion, Student Absence-days and Announcement View
Social: Parent Answering Survey and Parent-School Satisfaction

2.3.1. Demographic Factors
Demographic factors are factors such as nationality, gender, the place where the students were born, and the students' ties with family. The factors are described in Table 1. Their significance is that the student population differs in nationality, gender and place of birth and is under the supervision of different parents. This creates diversity and allows comparisons to be made.

Table 1. Demographic Factors
Factors             Description
Nationality         Student nationality.
Gender              Female or male student.
Place of Birth      Place where student was born (Jordan, Kuwait, Lebanon, Saudi Arabia, Iran, USA).

2.3.2. Academic Background Factors
Academic background factors include StageID, SectionID, Semester and Subject, which are further detailed in Table 2. The grades that students achieve are valuable data, since teachers use them to better comprehend student performance and to make assumptions about it towards the end of a term. This also allows teachers to take corrective action before the next examination (Bonde & Kirange, 2018).

Table 2. Academic Background Factors
Factors             Description
Stage ID            Student education level: Lower, Middle, High.
Section ID          Section student belongs to: A, B, C.
Semester            Semester in which student is enrolled: First or Second.
Subject             Subjects taken (Maths, English, IT, Arabic, Science or Quran).

2.3.3. Behavioural Factors
Behavioural factors are the factors labelled RaisedHands, VisitedResources, Discussion, StudentAbsenceDays and AnnouncementView, as described in Table 3. These capture student interaction with Kalboard360. A recent study found that student commitment has an impact on GPA (Li, Allen, & Casillas, 2017). The more committed students are to their studies, the more likely they are to perform well academically.

Table 3. Behavioural Factors
Factors             Description
RaisedHands         Number of times student raises a hand in class.
VisitedResources    Number of times student visits online resources.
Discussion          Number of times student joins online discussions.
StudentAbsenceDays  Days student is absent: Above-7 or Below-7.
AnnouncementView    Number of times student views online announcements.

2.3.4. Social Factors
Social factors consist of only two factors, ParentAnsweringSurvey and ParentSchoolSatisfaction. The factors are further explained in Table 4.
A study revealed that family involvement was associated with increases in school grades from middle school to high school (Wang, Hill, & Hofkens, 2014). A review of 39 papers also reported an association between parenting styles and academic outcomes, including achievement, motivation and school behaviours (Masud, Thurasamy, & Ahmad, 2015).

Table 4. Social Factors
Factors                   Description
Relation                  Parent who is responsible for student.
ParentAnsweringSurvey     Whether parent responded to school survey (Yes or No).
ParentSchoolSatisfaction  Degree of parent satisfaction with school: Good or Bad.

2.4. SAP Prediction Model
Prediction aims to estimate unknown values of factors in a new dataset based on stored historical data for the same factors. The prediction model requires a limited amount of labelled data, which offers some prior knowledge about the factors to be predicted (Algarni, 2016). The prediction type used for this project is classification. Classification uses existing knowledge to build a learning model and then uses it to assign categorical labels to new data. The classification algorithms here are logistic regression, random forest and decision tree. The three algorithms are applied to predict student academic performance.

Figure 3. Prediction Model Development Process

Figure 3 shows the process of developing the prediction model used in this research to identify the key factor impacting SAP. It starts with importing the data into the Jupyter Notebook dashboard and progresses to the application of the classification algorithms: DT, RF and LR. Accuracy results are obtained, and the best predictor is revealed. After that, the variables representing the factors impacting SAP are evaluated. Lastly, the key factor that is to be the centre of focus for SAP improvement is discovered.

2.4.1. Decision Tree
Decision tree (DT) is a supervised learning technique.
According to the previous chapter, DT is the most frequently used classifier for SAP prediction. DT is applied in this project because its reliability has been demonstrated by many researchers. The prediction target is determined by learning simple decision rules from the prediction factors. As most researchers agree, DT is known for its simplicity and is easy to handle when dealing with small or large data sets (Natek & Zwilling, 2014).

2.4.2. Random Forest
Random forest (RF) is an ensemble of trees generated from bootstrapped samples of the training set. It uses random feature selection during tree induction. RF is a machine learning method that can perform both classification and regression tasks. RF extends the DT algorithm by training multiple trees on different parts of the same training set (Mahboob et al., 2017). It improves the classification rate and helps reduce overfitting, which makes it suitable for finding the most powerful features that could influence SAP prediction.

2.4.3. Logistic Regression
Logistic regression (LR) is a more complex algorithm compared to the two algorithms above. It has been used as a model to detect struggling students who are at risk of failing a course (Xing, Guo, Petakovic, & Goggins, 2014). Besides being recognized as a classification method, LR is also a part of statistics. It fits the school data to a logistic function. LR is considered robust, easy to engineer and customize, and reliable for meaningful interpretation (Liu, Li, & Liang, 2013). It is one of the algorithms rarely discussed in EDM.

3. Discussions
Objective 2: To determine the best predictor among the chosen classification algorithms.
To uncover the best predictor, the accuracy of random forest, decision tree and logistic regression is compared. The best predictor is the algorithm with the highest accuracy. After the dataset is cleaned and imported into Jupyter Notebook, it is split into a test set and a training set.
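
The preprocessing and splitting steps described so far (missing-value check, discretization of the numeric level into High, Medium and Low, and the 70:30 split the paper reports) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the paper's figures: the rows are randomly generated stand-ins for the Kalboard360 records, the column name `Level` and all function names are hypothetical, and only the Python standard library is used so the sketch runs anywhere. In a notebook, pandas and scikit-learn's `train_test_split` would be the more typical tools.

```python
import random

# Mapping described in the text: 1 -> High, 2 -> Medium, 3 -> Low.
LEVELS = {1: "High", 2: "Medium", 3: "Low"}


def make_synthetic_rows(n, seed=42):
    """Build n made-up student rows; these are NOT the Kalboard360 records."""
    rng = random.Random(seed)
    return [
        {
            "RaisedHands": rng.randint(0, 100),
            "VisitedResources": rng.randint(0, 100),
            "Level": rng.choice([1, 2, 3]),  # hypothetical numeric level column
        }
        for _ in range(n)
    ]


def count_missing(rows):
    """Return the number of missing (None) values per column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (1 if value is None else 0)
    return counts


def discretize_levels(rows):
    """Return new rows with the numeric level replaced by its nominal label."""
    return [{**row, "Level": LEVELS[row["Level"]]} for row in rows]


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = make_synthetic_rows(366)  # 366 mirrors the sample size given in the text
missing = count_missing(rows)
nominal_rows = discretize_levels(rows)
train_rows, test_rows = split_train_test(nominal_rows)

print("missing values per column:", missing)
print("training rows:", len(train_rows), "test rows:", len(test_rows))
```

Fixing the shuffle seed keeps the split reproducible between runs, which matters when the accuracy of several algorithms is compared on the same test set.
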
The training set contains known outputs from which the model learns and generalizes. The test set is used to test the model's predictions.

Figure 4. Splitting Dataset into Test and Training Set

Figure 4 above shows the code for splitting the data into test and training sets. The ratio is set to 70:30. Meanwhile, Figure 5 below shows the code for the three algorithms.

Figure 5. Coding for Decision Tree, Random Forest and Logistic Regression

Among the three predictors, Figure 6 shows that random forest gives the highest accuracy, at 84%, making it the best predictor for the student academic performance prediction model. The frequently used decision tree gives only 74% accuracy, while logistic regression gives the lowest accuracy, at 71%.

Figure 6. Accuracy Results

4. Results
Objective 3: To evaluate the factors impacting student academic performance.
The observation does not end with the accuracy of the predictors. Instead, the best predictor should be investigated, and the key factor contributing to its accuracy should be given attention, because the key factor is where SAP success lies. The key factor should be used to trigger students' interest in study and to improve teachers' teaching skills.

To investigate the best predictor (random forest), code that ranks the factors used by random forest is run in Python, as in Figure 7. The code ranks the factors from the one that affects SAP the most at the top of the list to the one that affects SAP the least at the bottom. This tells teachers which factor should be triggered first in order to improve SAP, which factor should be given attention next, and which factors only need to be maintained and do not require changes.

Figure 7. Python coding to evaluate factors in Random Forest

The highest-ranked factor is the trigger for student academic performance (SAP) success.
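
The comparison of the three predictors and the ranking of factors described above might look roughly like the sketch below. Several things here are assumptions rather than facts from the paper: the code in Figures 5 and 7 is not available, so the use of scikit-learn, the hyperparameters, and the impurity-based `feature_importances_` attribute as the ranking measure are all guesses at a plausible setup. The data is synthetic, and its labels are deliberately derived from the `RaisedHands` column alone, so the output demonstrates the mechanics only and says nothing about the real dataset or the accuracies reported in Figure 6.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the Kalboard360 dataset): four behavioural
# counts per student, each drawn uniformly from 0..100.
rng = np.random.default_rng(0)
feature_names = ["RaisedHands", "VisitedResources", "Discussion", "AnnouncementView"]
X = rng.integers(0, 101, size=(366, len(feature_names)))

# Labels are built on purpose from the first column only:
# 0 = Low (< 34), 1 = Medium (34..66), 2 = High (>= 67).
y = np.digitize(X[:, 0], bins=[34, 67])

# 70:30 split, as reported in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Hyperparameters below are illustrative assumptions.
models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic regression": LogisticRegression(max_iter=5000),
}

# Fit each model on the training set and score it on the held-out test set.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

best_name = max(accuracies, key=accuracies.get)
print("accuracy per model:", accuracies)
print("best predictor on this synthetic data:", best_name)

# Rank the factors by the forest's impurity-based importance, highest first.
forest = models["Random forest"]
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances are normalized to sum to 1, so each value can be read as a share of the forest's total split quality. They indicate which factors the model relies on, not that changing a factor would cause performance to change.
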
The top-ranked factor acts as a catalyst for good student academic performance when teachers or instructors decide to improve it. In addition, teachers are encouraged to make sure their students understand the material, so that students can justify their conclusions, communicate and listen to opinions, and ask questions that lead to improvement (Ing, Webb, Franke, & Turrou, 2019).

Figure 8. Result of measured factors of Random Forest algorithm.

From Figure 8, it can be observed that the variable Raised Hands ranks the highest. This suggests that students who performed well academically usually raised their hands during class. Young people have the right to have a say in matters of importance to them (Griebler, Rojatz, Simovska, & Forster, 2017), so they participate in asking and answering questions with their teachers during class. This may have helped them to better understand their subjects.

5. Conclusion
In conclusion, among the three predictors, random forest is found to be the best algorithm for building the student academic performance (SAP) prediction model. With 84% accuracy, it highlights that the key to ensuring good SAP is to get students to raise their hands and participate in class. Students who raise their hands during class to ask questions, give answers and voice their opinions are found to be those who tend to excel in the subjects taken. Students' attitude towards studying also affects their performance (Mccoy, 2016).

Acknowledgments
I would like to express my gratitude to Universiti Teknologi Malaysia (UTM) for funding my master's degree and providing access to the many resources used for this research. I have gained much knowledge and understanding of my field of study through extensive reading. I would also like to thank my committed supervisor, Puan Yazriwati, for encouraging me to start writing up my research and for her guidance, advice and motivation.

6. References
Marsh, J. A., & Farrell, C. C. (2015). How leaders can support teachers with data-driven decision making: A framework for understanding capacity building. Educational Management Administration and Leadership 43: 2 (2015) 269–289.
Bloom, N., Lemos, R., Sadun, R., & Van Reenen, J. (2015). Does Management Matter in schools? Economic Journal 125: 584 (2015) 647–674.
Mccoy, L. P. (2016). Effect of Demographic and Personal Variables on Achievement in Eighth-Grade Algebra 98: 3 (2016) 131–135.
Park, Y., & Jo, I. (2015). Development of the Learning Analytics Dashboard to Support Students' Learning Performance Learning Analytics Dashboards (LADs) 21: 1 (2015) 110–133.
Farrell, C. C. (2015). Designing School Systems to Encourage Data Use and Instructional Improvement: A Comparison of School Districts and Charter Management Organizations. Educational Administration Quarterly 51: 3 (2015) 438–471.
Amrieh, E. A., Hamtini, T., & Aljarah, I. (2015). Preprocessing and analyzing educational data set using X-API for improving student's performance. 2015 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies, AEECT 2015.
Bonde, S. N., & Kirange, D. K. (2018). Survey on Evaluation of Student's Performance in Educational Data Mining. 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT) (2018) 209–213.
Li, Y., Allen, J., & Casillas, A. (2017). Relating psychological and social factors to academic performance: A longitudinal investigation of high-poverty middle school students. Journal of Adolescence 56 (2017) 179–189.
Wang, M. Te, Hill, N. E., & Hofkens, T. (2014). Parental Involvement and African American and European American Adolescents' Academic, Behavioral, and Emotional Development in Secondary School. Child Development 85: 6 (2014) 2151–2168.
Masud, H., Thurasamy, R., & Ahmad, M. S. (2015). Parenting styles and academic achievement of young adolescents: A systematic literature review. Quality and Quantity 49: 6 (2015) 2411–2433.
Algarni, A. (2016). Data Mining in Education. International Journal of Advanced Computer Science and Applications 7: 6 (2016) 456–461.
Natek, S., & Zwilling, M. (2014). Student data mining solution-knowledge management system related to higher education institutions. Expert Systems with Applications 41: 14 (2014) 6400–6407.
Mahboob, T., Irfan, S., & Karamat, A. (2017). A machine learning approach for student assessment in E-learning using Quinlan's C4.5, Naive Bayes and Random Forest algorithms. Proceedings of the 2016 19th International Multi-Topic Conference, INMIC 2016.
Xing, W., Guo, R., Petakovic, E., & Goggins, S. (2014). Participation-based student final performance prediction model through interpretable Genetic Programming: Integrating learning analytics, educational data mining and theory. Computers in Human Behavior 47 (2014) 168–181.
Liu, D., Li, T., & Liang, D. (2013). Incorporating logistic regression to decision-theoretic rough sets for classifications. International Journal of Approximate Reasoning 55 (2013) 197–210.
Ing, M., Webb, N. M., Franke, M. L., & Turrou, A. C. (2019). Student participation in elementary mathematics classrooms: the missing link between teacher practices and student achievement 90: 3 (2019) 341–356.
Griebler, U., Rojatz, D., Simovska, V., & Forster, R. (2017). Effects of student participation in school health promotion: a systematic review (2017) 195–206.