Predicting academic success in higher education: literature review and best practices

Alyahyan, Eyman; Düştegör, Dilek

doi:10.1186/s41239-020-0177-7

Review article
Open access
Published: 10 February 2020

International Journal of Educational Technology in Higher Education, volume 17, Article number: 3 (2020)

Abstract

Student success plays a vital role in educational institutions, as it is often used as a metric for the institution’s performance. Early detection of students at risk, along with preventive measures, can drastically improve their success. Lately, machine learning techniques have been extensively used for prediction purposes. While there is a plethora of success stories in the literature, these techniques are mainly accessible to educators who are literate in “computer science” or, more precisely, “artificial intelligence”. Indeed, the effective and efficient application of data mining methods entails many decisions, ranging from how to define student success, through which student attributes to focus on, to which machine learning method is most appropriate for the given problem. This study aims to provide a step-by-step set of guidelines for educators willing to apply data mining techniques to predict student success. To this end, the literature has been reviewed, and the state of the art has been compiled into a systematic process in which possible decisions and parameters are comprehensively covered and explained along with arguments. This study gives educators easier access to data mining techniques, enabling them to realize the full potential of these techniques in the field of education.

Introduction

Computers have become ubiquitous, especially in the last three decades. This has led to the collection of vast volumes of heterogeneous data, which can be utilized for discovering unknown patterns and trends (Han et al., 2011), as well as hidden relationships (Sumathi & Sivanandam, 2006), using data mining techniques and tools (Fayyad & Stolorz, 1997). The analysis methods of data mining can be roughly categorized as: 1) classical statistics methods (e.g. regression analysis, discriminant analysis, and cluster analysis) (Hand, 1998), 2) artificial intelligence (e.g. genetic algorithms, neural computing, and fuzzy logic) (Zawacki-Richter, Marín, Bond, & Gouverneur, 2019), and 3) machine learning (e.g. neural networks, symbolic learning, and swarm optimization) (Kononenko & Kukar, 2007). The last category combines advanced statistical methods with AI heuristics. These techniques can benefit various fields through different objectives, such as extracting patterns, predicting behavior, or describing trends. A standard data mining process starts by integrating raw data from different data sources, which is then cleaned to remove noisy, duplicated, or inconsistent records. After that, the cleaned data is transformed, through filtering and aggregation techniques, into a concise format that data mining tools can process. Then, the analysis step identifies interesting patterns in the data, which can be displayed for better visualization (Han et al., 2011) (Fig. 1).
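The four steps of the standard process described above (integration, cleaning, transformation, analysis) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two data sources, the record fields, the 0 to 4 grade scale, and the threshold used in the analysis step are hypothetical and are not taken from the reviewed studies.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records from two sources (illustrative values only).
registrar = [
    {"student_id": "s1", "course": "MATH101", "grade": 3.7},
    {"student_id": "s1", "course": "MATH101", "grade": 3.7},   # duplicate
    {"student_id": "s2", "course": "MATH101", "grade": None},  # missing value
    {"student_id": "s2", "course": "CS101", "grade": 2.0},
]
lms = [
    {"student_id": "s1", "course": "CS101", "grade": 3.3},
    {"student_id": "s3", "course": "CS101", "grade": 7.5},     # outside the assumed 0-4 scale
]


def integrate(*sources):
    """Step 1: merge records from several sources into one list."""
    return [row for source in sources for row in source]


def clean(rows, low=0.0, high=4.0):
    """Step 2: drop duplicates, missing grades, and out-of-range grades."""
    seen, kept = set(), []
    for row in rows:
        key = (row["student_id"], row["course"], row["grade"])
        grade = row["grade"]
        if key in seen or grade is None or not (low <= grade <= high):
            continue
        seen.add(key)
        kept.append(row)
    return kept


def transform(rows):
    """Step 3: aggregate to one concise record per student."""
    grades = defaultdict(list)
    for row in rows:
        grades[row["student_id"]].append(row["grade"])
    return {
        sid: {"n_courses": len(g), "mean_grade": round(mean(g), 2)}
        for sid, g in grades.items()
    }


def analyze(table, threshold=2.5):
    """Step 4: a trivial 'pattern': students whose mean grade is below a threshold."""
    return sorted(sid for sid, rec in table.items() if rec["mean_grade"] < threshold)


table = transform(clean(integrate(registrar, lms)))
at_risk = analyze(table)
print(table)
print(at_risk)
```

In practice the analysis step would be a statistical or machine learning model rather than a fixed threshold; the sketch only shows how the output of each step feeds the next.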

Recently, data mining has been applied to various fields such as healthcare (Kavakiotis et al., 2017), business (Massaro, Maritati, & Galiano, 2018), and education (Adekitan, 2018). Indeed, the development of educational database management systems created a large number of educational databases, which enabled the application of data mining to extract useful information from this data. This led to the emergence of Educational Data Mining (EDM) (Calvet Liñán & Juan Pérez, 2015; Dutt, Ismail, & Herawan, 2017) as an independent research field. Nowadays, EDM plays a significant role in discovering patterns of knowledge about educational phenomena and the learning process (Anoopkumar & Rahman, 2016), including understanding performance (Baker, 2009). In particular, data mining has been used for predicting a variety of crucial educational outcomes, such as performance (Xing, 2019), retention (Parker, Hogan, Eastabrook, Oke, & Wood, 2006), success (Martins, Miguéis, Fonseca, & Alves, 2019; Richard-Eaglin, 2017), satisfaction (Alqurashi, 2019), achievement (Willems, Coertjens, Tambuyzer, & Donche, 2018), and dropout rate (Pérez, Castellanos, & Correal, 2018).

The process of EDM (see Fig. 2) is an iterative knowledge discovery process that consists of hypothesis formulation, testing, and refinement (Moscoso-Zea et al., 2016; Sarala & Krishnaiah, 2015). Despite many publications on educational data mining, including case studies, it is still difficult for educators, especially those who are new to the field of data mining, to effectively apply these techniques to their specific academic problems. Every step described in Fig. 2 requires several decisions and parameter settings, which directly affect the quality of the obtained result.

This study aims to fill the described gap by providing a complete guideline that gives educators easier access to data mining techniques and enables them to realize the full potential of these techniques in the field of education. We specifically focus on the problem of predicting the academic success of students in higher education. To this end, the state of the art has been compiled into a systematic process in which all related decisions and parameters are comprehensively covered and explained along with arguments.

In the following, section 2 first clarifies what academic success is and how it has been defined and measured in various studies, with a focus on the factors that can be used for predicting it. Then, section 3 presents the methodology adopted for the literature review. Section 4 reviews data mining techniques used in predicting students’ academic success, and compares their predictive accuracy based on various case studies. Section 5 closes the review with a recapitulation of the whole process. Finally, section 6 concludes the paper and outlines future work.

Academic success definition

Student success is a crucial component of higher education institutions because it is considered an essential criterion for assessing the quality of educational institutions (National Commission for Academic Accreditation & Assessment, 2015). There are several definitions of student success in the literature. Kuh, Kinzie, Buckley, Bridges, and Hayek (2006) synthesize a definition from the literature: “Student success is defined as academic achievement, engagement in educationally purposeful activities, satisfaction, acquisition of desired knowledge, skills and competencies, persistence, attainment of educational outcomes, and post-college performance”. While this is a multi-dimensional definition, York, Gibson, and Rankin (2015) gave an amended definition concentrating on the six most important components, namely “Academic achievement, satisfaction, acquisition of skills and competencies, persistence, attainment of learning objectives, and career success” (Fig. 3).

Despite reports calling for more detailed views of the term, the bulk of published research measures academic success narrowly as academic achievement. Academic achievement itself is mainly based on Grade Point Average (GPA) or Cumulative Grade Point Average (CGPA) (Parker, Summerfeldt, Hogan, & Majeski, 2004), which are grade systems used in universities to assign an assessment scale for students’ academic performance (Choi, 2005), or on grades (Bunce & Hutchinson, 2009). Academic success has also been defined in terms of students’ persistence, also called academic resilience (Finn & Rock, 1997), which in turn is mainly measured through grades and GPA, by far the most widely available evaluation measures in institutions.
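Since GPA and CGPA recur throughout the review as the main measures of academic achievement, a minimal sketch of how they are commonly computed may be helpful. It assumes the widespread credit-weighted convention on a 4.0 scale; grading scales and weighting rules differ between institutions, and the function names and sample values below are hypothetical.

```python
def gpa(courses):
    """Credit-weighted mean of grade points for one semester.

    `courses` is a list of (grade_points, credit_hours) pairs.
    """
    total_credits = sum(credits for _, credits in courses)
    if total_credits == 0:
        raise ValueError("no credit hours attempted")
    return sum(points * credits for points, credits in courses) / total_credits


def cgpa(semesters):
    """Credit-weighted mean over all courses taken in all semesters."""
    return gpa([course for semester in semesters for course in semester])


# Illustrative values only: (grade points on a 4.0 scale, credit hours).
semester_1 = [(4.0, 3), (3.0, 3), (2.0, 2)]
semester_2 = [(3.0, 4), (4.0, 2)]

print(gpa(semester_1))                          # (12 + 9 + 4) / 8
print(round(cgpa([semester_1, semester_2]), 3))  # 45 / 14
```

Note that under this convention the CGPA is not the plain average of semester GPAs: semesters with more credit hours weigh more.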

Review methodology

Early prediction of students’ performance can help decision makers take the needed actions at the right moment and plan the appropriate training in order to improve students’ success rate. Several studies have been published on using data mining methods to predict students’ academic success. Several target levels can be observed:

Degree level: predicting students’ success at the time the degree is obtained.
Year level: predicting students’ success by the end of the year.
Course level: predicting students’ success in a specific course.
Exam level: predicting students’ success in an exam for a specific course.

In this study, the literature related to the exam level is excluded, because a poor result in a single exam does not necessarily imply a negative overall outcome.

In terms of coverage, sections 4 and 5 cover only articles published within the last 5 years. This restriction was necessary to scale down the search space, given the popularity of EDM. The literature was searched in the Science Direct, ProQuest, IEEE Xplore, Springer Link, EBSCO, JSTOR, and Google Scholar databases, using academic success, academic achievement, student success, educational data mining, data mining techniques, data mining process, and predicting students’ academic performance as keywords. While we acknowledge that there may be articles not included in this review, seventeen key articles about data mining techniques were reviewed in sections 4 and 5.

Influential factors in predicting academic success

One important decision related to the prediction of students’ academic success in higher education is to clearly define what academic success is. After that, one can think about the potential influential factors, which dictate the data that needs to be collected and mined.

While a broad variety of factors have been investigated in the literature with respect to their impact on the prediction of students’ academic success (Fig. 4), we focus here on prior academic achievement, student demographics, e-learning activity, psychological attributes, and environments, as our investigation revealed that they are the most commonly reported factors (summarized in Table 1). As a matter of fact, the top two factors, namely prior academic achievement and student demographics, were present in 69% of the research papers. This observation is aligned with the results of a previous literature review, which emphasized that the grades of internal assessment and CGPA are the most common factors used to predict student performance in EDM (Shahiri, Husain, & Rashid, 2015). Accounting for more than 40% on its own, prior academic achievement is the most important factor. This is essentially the academic history that students bring with them. It is commonly identified as the grades (or any other academic performance indicators) that students obtained in the past (pre-university data and university data). The pre-university data includes high school results, which help in understanding the consistency of students’ performance (Anuradha & Velmurugan, 2015; Asif et al., 2015; Asif et al., 2017; Garg, 2018; Mesarić & Šebalj, 2016; Mohamed & Waguih, 2017; Singh & Kaur, 2016). These results also provide insight into students’ interest in different topics (i.e., course grades (Asif et al., 2015; Asif et al., 2017; Oshodi et al., 2018; Singh & Kaur, 2016)). Additionally, pre-university data can include pre-admission data, that is, university entrance test results (Ahmad et al., 2015; Mesarić & Šebalj, 2016; Oshodi et al., 2018).
The university data consists of grades already obtained by the students since entering the university, including semester GPA or CGPA (Ahmad et al., 2015; Almarabeh, 2017; Hamoud et al., 2018; Mueen et al., 2016; Singh & Kaur, 2016), course marks (Al-barrak & Al-razgan, 2016; Almarabeh, 2017; Anuradha & Velmurugan, 2015; Asif et al., 2015; Asif et al., 2017; Hamoud et al., 2018; Mohamed & Waguih, 2017; Mueen et al., 2016; Singh & Kaur, 2016; Sivasakthi, 2017), and course assessment grades (e.g. assignments (Almarabeh, 2017; Anuradha & Velmurugan, 2015; Mueen et al., 2016; Yassein et al., 2017); quizzes (Almarabeh, 2017; Anuradha & Velmurugan, 2015; Mohamed & Waguih, 2017; Yassein et al., 2017); lab work (Almarabeh, 2017; Mueen et al., 2016; Yassein et al., 2017); and attendance (Almarabeh, 2017; Anuradha & Velmurugan, 2015; Garg, 2018; Mueen et al., 2016; Putpuek et al., 2018; Yassein et al., 2017)).
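To make the grouping of prior academic achievement into pre-university data and university data more concrete, the following sketch shows one possible way to store these attributes per student and flatten them into a single feature record for a data mining tool. It is only an illustration: the class names, field names, and sample values are hypothetical and do not come from any of the reviewed studies.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, Optional


@dataclass
class PreUniversityData:
    high_school_score: Optional[float] = None    # high school results
    entrance_test_score: Optional[float] = None  # pre-admission data


@dataclass
class UniversityData:
    cgpa: Optional[float] = None
    course_marks: Dict[str, float] = field(default_factory=dict)
    attendance_rate: Optional[float] = None


@dataclass
class PriorAchievement:
    pre_university: PreUniversityData
    university: UniversityData

    def to_features(self):
        """Flatten nested fields into one dict, skipping missing values."""
        features = {}
        for group, values in asdict(self).items():
            for name, value in values.items():
                if isinstance(value, dict):
                    # One feature per course mark, e.g. university.course_marks.CS101
                    for key, mark in value.items():
                        features[f"{group}.{name}.{key}"] = mark
                elif value is not None:
                    features[f"{group}.{name}"] = value
        return features


# Illustrative student record (values are made up).
student = PriorAchievement(
    pre_university=PreUniversityData(high_school_score=88.0),
    university=UniversityData(cgpa=3.2, course_marks={"CS101": 75.0}),
)
features = student.to_features()
print(features)
```

Keeping the two groups separate makes it easy to test, for example, how much predictive value the pre-university attributes add once university grades are available.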

Table 1 Most influential factors on the prediction of students’ academic success
