Extracting topological features to identify at-risk students using machine learning and graph convolutional network models

Technological advances have significantly affected education, leading to the creation of online learning platforms such as virtual learning environments and massive open online courses. While these platforms offer a variety of features, none of them incorporates a module that accurately predicts students’ academic performance and commitment. Consequently, it is crucial to design machine learning (ML) methods that predict student performance and identify at-risk students as early as possible. Graph representations of student data provide new insights into this area. This paper describes a simple but highly accurate technique for converting tabulated data into graphs. We employ distance measures (Euclidean and cosine) to calculate the similarities between students’ data and construct a graph. We extract graph topological features (GF) to enhance our data. This allows us to capture structural correlations among the data and gain deeper insights than isolated data analysis. The initial dataset (DS) and GF can be used alone or jointly to improve the predictive power of the ML method. The proposed method is tested on an educational dataset and returns superior results. The use of DS alone is compared with the use of DS + GF in the classification of students into three classes: “failed”, “at risk”, and “good”. The area under the receiver operating characteristic curve (AUC) reaches 0.948 using DS, compared with 0.964 for DS + GF. The accuracy in the case of DS + GF varies from 84.5 to 87.3%. Adding GF improves the performance by 2.019% in terms of AUC and 3.261% in terms of accuracy. Moreover, by incorporating graph topological features through a graph convolutional network (GCN), the prediction performance can be enhanced by 0.5% in terms of accuracy and 0.9% in terms of AUC under the cosine distance matrix. With the Euclidean distance matrix, adding the GCN improves the prediction accuracy by 3.7% and the AUC by 2.4%. By adding graph embedding features to ML models, at-risk students can be identified with 87.4% accuracy and 0.97 AUC. The proposed solution provides a tool for the early detection of at-risk students. This will benefit universities and enhance their prediction performance, improving both effectiveness and reputation.


Introduction
Recent technological developments have had significant impacts on education (Chen et al., 2020;Rodríguez-Hernández et al., 2021) resulting in the development of several online learning platforms such as Tutee, Intelligent Tutor, and Learning Partner (Hwang et al. 2020). These technology-assisted educational tools provide a new paradigm for the education field, offering the potential to monitor students' educational progress and predict their performance. The countries of the Organization for Economic Co-operation and Development have reported an alarming average dropout ratio of approximately 33% for undergraduate students (Ettorre et al. 2022). Such a considerable student dropout ratio leads to significant economic losses. Despite the recent development of many technology-assisted educational platforms, many higher education institutions are suffering from poor student performance (Barbosa Manhães et al., 2015). To address the issue of student dropouts, many automated systems have been developed to predict students' academic performance in higher education institutions (Ahadi et al., 2015;Rastrollo-Guerrero et al., 2020). Predicting student performance in the early stages of a course, particularly for at-risk students, can help the students, instructors, policymakers, and institutes in multiple ways. In particular, this enables the identification of students who will struggle to pass their courses and are at risk of dropping out (Albreiki et al., 2021b). Furthermore, predicting student performance with adequate accuracy can also help with the selection of students to receive grants and scholarships (Rodríguez-Hernández et al., 2021).
Several methodological approaches have been developed for predicting student academic performance, including correlation, regression, structural equation modeling, and analysis of variance (Albreiki et al., 2021a; Golino & Gomes, 2014). These traditional methods are based on certain assumptions, such as the independence of observations. However, actual data often fail to satisfy such assumptions. Violating them leads to Type I and Type II errors in predicting students' academic performance and therefore to biased results (Nimon, 2012).
Recently, artificial intelligence-based techniques, including ML, have achieved promising results in predictive tasks across various fields. Several methods have been developed for predicting student academic performance based upon ML techniques such as artificial neural networks and support vector machines (Ahmad & Shahzadi, 2018;Lau et al., 2019). Existing approaches, however, have been observed to lack accuracy in predicting students' academic performance due to their dependency on multiple factors, both academic and non-academic (Shahiri & Husain, 2015).
On the one hand, systems based on ML techniques have significantly improved predictions of students' academic performance. On the other hand, many critical challenges remain in developing workable ML methods for this task because of the complex nature of the enormous amounts of real-world data available from different technology-assisted learning platforms. The development of high-accuracy student academic performance prediction systems therefore remains a challenging task, requiring solutions to issues concerning data quality, quantity, and complexity.
Knowledge graph algorithms can help improve classification and prediction performance by incorporating basic knowledge about the data. Knowledge graph methods mainly focus on data correlation and can discover topological features that provide insights into interconnected data instances. Novak (1991) introduced the idea of knowledge graphs as concept maps. Later, he used this framework to organize collected knowledge and interconnect it with existing knowledge. Many researchers have used maps to assess technology-based learning platforms (Trumpower et al., 2014; Valsamidis et al., 2012). This approach has been extended to enhance the underlying structure, resulting in a knowledge graph. While a detailed discussion of the knowledge graph itself is beyond the scope of this paper, a knowledge graph represents a network of entities and demonstrates the associations between them. The association-related information is visualized as a graph structure. There are three main components of a knowledge graph: nodes, edges, and labels. A node represents a logical or physical entity. The association between nodes is represented by edges. Knowledge graph models can be applied in different domains, through generative graph models, knowledge graph construction/inference, or network embedding.
From the studies mentioned above, it is clear that most current research is primarily concerned with employing basic features and ML approaches to predict students' performance with reasonable accuracy. ML and deep learning (DL) approaches, applied to massive amounts of student data from various technology-assisted educational platforms, provide acceptable predictions of students' academic achievement and acceptable identification of at-risk students. However, these approaches currently struggle to extract additional useful features that capture complex data structures and reflect connections among students' features. Knowledge graphs can uncover feature correlations and be used with promising ML and DL algorithms to predict and enhance student academic achievement. These techniques can be combined to detect at-risk students with better prediction results.
This study proposes a hybrid approach based on knowledge graphs and ML for predicting student academic performance, particularly for at-risk students. From the above discussion, it is evident that traditional methods have many flaws and are not well suited to state-of-the-art evaluation of students' performance. Furthermore, although ML techniques provide promising results, they come with their own challenges. There is still debate about which techniques work well and what remains to be done. In light of this, this study proposes a novel method that extracts topological features and combines ML and GCN models to predict and identify poorly performing or at-risk students. The remainder of this paper is organized as follows. Section "Literature review" presents a literature review and identifies several gaps in existing research. Section "Methodology" states the research objectives and explains the methodology of the proposed approach, including descriptions of the dataset, knowledge representation, feature extraction, and research design. Section "Experimental results" presents the results given by the various experimental settings, and the "Discussion and future work" section provides a comprehensive discussion of the results and a comparative analysis with state-of-the-art methods. Finally, the "Conclusions" section concludes the paper and identifies areas for future work.

Literature review
Modern online learning platforms have generated massive amounts of educational data. The collected data can be analyzed to resolve critical issues in the educational field, such as the student dropout ratio (Mubarak et al., 2022), the improvement of learning platforms (Ferguson, 2012), and the tracking of students' academic performance. Several efforts have been made in this direction, including the development of a course recommendation system, a student behavior prediction system (Mubarak et al., 2021), a user intention analysis system (Zhang et al., 2017), an at-risk student prediction system (Mubarak et al., 2020), and knowledge tracing (Yang et al., 2020).
Recently, ML and data mining techniques have been successfully applied to the prediction of students' performance in higher education (Aleem & Gore, 2020). These techniques play a vital role in identifying trends in educational data related to student performance and the teaching-learning process. Aleem and Gore (2020) concluded that no single technique can meet all requirements of the educational system, such as predicting students' academic performance. Instead, only a limited number of technologies can be integrated with current e-learning platforms to help students, instructors, and institutions assess students' performance, particularly that of at-risk students. Yadav et al. (2012) analyzed ML-based predictive models for student retention assessment. They concluded that decision trees provide better performance and more interpretable output for understanding student retention in educational institutes. Their experimental results demonstrate that ML-based predictive models can predict student retention with adequate accuracy and identify at-risk students. Predictive models can thus help reduce the student dropout ratio in educational institutes. However, the dataset was relatively small and geographically restricted to a region in India. It comprises 432 participants and was taken from institute records from 1997 to 2012, with a gap between 2000 and 2009. Such data may lack features that are needed to predict student performance correctly, and the paper does not explain the full technique in adequate detail. Kolo and Adepoju (2015) applied decision trees and ordinal regression using the IBM Statistical Package for the Social Sciences (SPSS). Their work mainly predicts whether a participant will "pass" or "fail", considering features such as financial status, pass/fail status, motivation to learn, and gender. The work also aims to identify factors that affect students' performance. The dataset is quite small, covering three years of data from participants in a second-year data structures course. It is also geographically restricted, in this case to Nigerian colleges of education (Kolo & Adepoju, 2015).
Similarly, Dhanalakshmi et al. (2016) explored opinion mining using supervised learning algorithms, with a focus on ML and natural language processing. The authors mainly provided a comparative analysis of SVM, naive Bayes, k-nearest neighbor, and neural network classifiers for binomial classification, with the aim of finding the polarity of students' opinions from the feedback they provided. The dataset comes from Middle East College, Oman, and is not diverse in nature. Furthermore, it is not clear from the results whether students' progress can be predicted.
The study of Mesarić and Sebalj (2016) involved dividing students into two groups based on their academic performance in high school and during the first year of their current studies. Decision trees were created using various algorithms, and the most effective one was selected based on its classification rate and statistical significance. While the REPTree algorithm had the highest classification rate, it was not as successful at accurately categorizing students from both groups. The most influential factors in the study were the total points earned on a state exam, points earned in high school, and points earned on a Croatian language exam.
Current ML approaches suggest that there is a positive connection between student engagement and performance. The recent work of Moubayed et al. (2020) personalizes the learning experience and strives to engage students and keep them motivated in order to retain them. The k-means algorithm is used to cluster students based on twelve engagement metrics. Two main categories are established, based on the students' interaction and their effort. A qualitative analysis is then performed to identify students who may need help and who may otherwise be at risk of dropping out.
With the availability of massive data from online learning platforms, graph-based techniques are a promising method of analyzing data structures and identifying correlations in terms of nodes and edges. Here, the nodes define unlabeled and labeled data sets, whereas edges represent the similarity between various nodes (Zha et al., 2009).
State-of-the-art learning systems enroll massive numbers of students from different parts of the world and offer a learning experience that differs from the conventional one, so there is a need for more advanced and robust systems. Several researchers have used DL methods to predict student performance and dropout ratios. Karimi et al. (2020) highlighted the low completion rate of online courses and related it to the unconventional teacher-to-student relationship, especially in evaluation. They proposed Deep Online Performance Evaluation (DOPE), which makes use of knowledge graphs and advanced graph neural networks to predict students' performance in each course. The dataset has 32,953 participants, and each course is 35 to 40 weeks long. The system uses outcomes such as distinction, pass, fail, and withdrawn and translates them into their respective interpretations within the system. In the same direction, Fei and Yeung (2015) used a long short-term memory model to extract relevant features from student questionnaires, video lectures, and problems. They used a very reliable dataset from Coursera and edX. Their work mainly deals with dropout prediction, that is, identifying students at risk of dropping out. Dropout prediction has been approached using simple ML methods such as support vector machines and logistic regression, which use features related to student activities such as watching lecture videos and participating in forum discussions on a massive open online course (MOOC) platform. However, temporal models using recurrent neural networks with long short-term memory cells have been found to be more effective in predicting dropout, based on experiments conducted on MOOCs offered on Coursera and edX. These results outperformed both baseline methods and other proposed methods by a significant margin. Whitehill et al. (2017) also proposed a neural network system. They strongly suggest that, to obtain a fair result, the system should be trained on one dataset and tested on a different one. Training and testing the system on the same dataset may inflate the accuracy by several percentage points. Their dataset is based on 528,349 participants from HarvardX. Feng et al. (2019) proposed the Context-aware Feature Interaction Network (CFIN) to model and predict users' dropout behavior. Experiments on two large datasets show that the proposed method achieves better performance than several state-of-the-art methods. The proposed model has been deployed on a real system to help improve user retention. The dataset contains close to 700,000 enrollments.
DL representations of data samples have been widely used with knowledge graphs in recent years. A knowledge graph mainly focuses on entities and their associations, as represented in the form of a graph. There has been significant progress in the knowledge graph area, reflecting strong research interest in the subject, as highlighted in Luo and Fang (2018) and Lin et al. (2015). Knowledge graphs learn embedded information that can be used in different applications such as association extraction, similarity computation, and link prediction.
Many researchers have integrated DL techniques with knowledge graphs to improve model predictions and classification effectiveness. Knowledge graphs can be integrated with DL methods in two ways. The first is to integrate the semantic information extracted from the graph into DL and ML. In this way, a discrete knowledge graph, represented as a continuous vector of expert knowledge, is applied to the DL method.
Previous studies have reported on the application of knowledge graphs and DL to predict student academic performance, including that of at-risk students. One such example is the work of Gaur et al. (2021), which discusses how knowledge, represented in a knowledge graph, can be integrated into DL methods using a strategy called knowledge-infused learning. The use of this approach is demonstrated through a case study in the field of education. By incorporating domain knowledge and a student's historical knowledge state into the model, it becomes possible to trace academic weaknesses back to their root causes, including any underlying prerequisite concepts that may be impacting a student's performance.
Knowledge graphs have been successfully employed in MOOC platforms (Zheng et al., 2017). They have been used in different education-related domains, teaching and classroom resources, education management, and educational technologies. In terms of teaching and classroom resources, the K12EduKG system has been developed based on knowledge graphs using the K-12 educational subjects (Chen et al., 2018). This system is based on specific educational knowledge from the Chinese mathematics curriculum. The developers of K12EduKG identified knowledge concepts and associations based on probabilistic association rules and a conditional random field model. Su and Zhang (2020) suggested a knowledge graph-based method for accommodating educational big data. Their knowledge graph incorporated an online encyclopedia and subject teaching resources. Similarly, Zhao et al. (2019) used knowledge graphs to build a system called mathGraph. This was initially developed through crowdsourcing to consider dissimilar mathematical objects and their operations. It can be concluded that the combination of ML and DL methods, along with access to large amounts of student data from educational technology platforms, enables accurate predictions of academic performance and identification of at-risk students. However, current approaches have difficulty extracting relevant features that can understand complex data structures and represent correlations between them. Knowledge graphs, on the other hand, are able to effectively extract these correlations and can be integrated with ML and DL methods to improve performance prediction. Although there has been limited research on using advanced techniques such as ML, DL, and knowledge graphs to identify and predict the academic performance of at-risk students, the potential benefits of integrating these approaches make it a promising avenue for further study.

Research objectives
This study aims to integrate graph theory with a prediction system to improve the accuracy of students' performance predictions and help identify hidden structures and similarities between different student behaviors. It is anticipated that the proposed solution will be of benefit to universities, as it will enable them to accurately predict performance and subsequently implement remedial plans to address the factors associated with low performance and dropout, thereby maintaining their reputation for delivering academic excellence. To address the main goal, we formulate the following tasks:
• Incorporate structural correlations between students and extract additional features by converting the tabulated data to graph data. This will allow graph features to be extracted and combined with the original features to improve the ML classification results.
• Combine the extracted graph features with a GCN to identify at-risk students.
• Implement graph embedding methods to represent entities, and develop ML models to identify at-risk students with better accuracy.

Research design
The proposed method is designed to identify at-risk students using knowledge graphs and conventional ML methods, as illustrated in Fig. 1.
For the first objective of our framework, we explore possibilities for converting tabulated data into a graph as it is suggested by . We start by building an adjacency matrix for a graph in which students are the nodes. First, we explore the distance norms that can be used to represent the link between data points (students). Different distance metrics or norms, such as Euclidean, Manhattan, cosine, correlation, and Chebyshev, are used to calculate the similarity between the data points and generate the graph's edge weights. We extract the new graph topological features (GF) from the preprocessed dataset (see Table 4) and add them to the original dataset (DS). From DS, we use all the features collected prior to the midterm exam (MT). Next, we apply advanced ML techniques to the combined DS and GF features to identify at-risk students as early as possible. Students are classified as Good, At-Risk, or Failed based on their total grades in the course. We employ multiclass prediction models using state-of-the-art ML methods such as the XGB classifier, LightGBM, linear SVM, ExtraTrees, Random Forest, and multilayer perceptron. Five-fold cross-validation is used to estimate the true error rate at the population level. Feature selection methods are applied to find the most informative features based on the model output. Finally, the performance of the proposed models is compared using different sets of input features (historical data only, and mixed data including historical data and performance in the course), with and without the newly calculated topological features.
For the second objective, we combine the extracted graph features with the original features and use a GCN to identify at-risk students. The GCN is a type of convolutional neural network that can work directly on graphs and take advantage of their structural information. The core of the GCN model is the graph convolution layer. This layer is similar to a conventional dense layer, augmented by the graph adjacency matrix A to use node information, such as the topological information created in the first step. In conjunction with the GCN, graph topological features are expected to provide more meaningful information, leading to highly accurate node classification.
For the third objective, we construct a knowledge graph from the dataset and extract the graph embedded features to further improve the performance of the classification models. First, we propose (subject, predicate, object) tuples to create a semantic network of the students. From the knowledge graph, we extract the complex relations among features and efficiently store the entities and their relationships. We utilize the Neo4j database to store and process the created knowledge graph. To improve the classification accuracy, we then generate embedded vectors from the graph nodes using the node2vec algorithm. Finally, we combine the extracted embedded vectors with the original and topological features extracted earlier, and feed them to the ML classification models.
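The following minimal sketch illustrates the first part of this step, assuming hypothetical predicate names (`enrolled_in`, `has_class`) and student identifiers that do not come from the dataset. It turns tabular records into (subject, predicate, object) tuples, builds an undirected neighbor map, and samples a uniform random walk. Uniform walks correspond to node2vec with both of its walk-bias parameters set to 1. The skip-gram training that turns walks into embedded vectors, and the storage in Neo4j, are not shown.

```python
import random

def record_to_triples(student_id, record):
    """Turn one tabular record into (subject, predicate, object) triples."""
    return [(student_id, predicate, str(value)) for predicate, value in record.items()]

def build_neighbors(triples):
    """Undirected neighbor map: subjects and objects become graph nodes."""
    neighbors = {}
    for subject, _, obj in triples:
        neighbors.setdefault(subject, set()).add(obj)
        neighbors.setdefault(obj, set()).add(subject)
    return neighbors

def random_walk(neighbors, start, length, rng):
    """Uniform random walk of `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        # sorted() makes the choice reproducible for a seeded generator
        walk.append(rng.choice(sorted(neighbors[walk[-1]])))
    return walk

# Hypothetical records; predicate names are illustrative only.
records = {
    "s1": {"enrolled_in": "Object-oriented programming", "has_class": "At-Risk"},
    "s2": {"enrolled_in": "Object-oriented programming", "has_class": "Good"},
}
triples = [t for sid, rec in records.items() for t in record_to_triples(sid, rec)]
neighbors = build_neighbors(triples)
walk = random_walk(neighbors, "s1", 5, random.Random(0))
```

Walks that pass through shared objects, such as a common course, place the corresponding students close together in the embedding space, which is the structural signal the embedded vectors are meant to carry.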

Data collection and data preprocessing
For this study, educational data were collected from a variety of sources, including the Banner system, which contains information about students, instructors who taught programming courses, and documents manually extracted from the Ministry of Education's portal. The main data used in this study are related to programming courses offered at the College of Information Technology (CIT) at United Arab Emirates University (UAEU). These courses are a graduation requirement for CIT students and may be taken as electives by students from other colleges. The data presented here represent student performance in programming courses (fall and spring) from academic year 2016/2017 until 2020/2021. We added demographics, course registration, and campus information to the data. Before data analysis and classification, there were 730 records and 44 features in the original dataset. After removing inconsistent rows and features using univariate feature processing, the final dataset consisted of 649 samples and 38 features. The courses were not directed or specially designed for the experiments described in this paper. We constructed different non-overlapping datasets based on the features of the data. One dataset used in this study includes records of 230 students enrolled in "Object-oriented programming." We collected data on students' prior performance and demographics, as well as their enrollment. Figure 2 shows the features of the dataset.
The data preprocessing included six phases. First, the course assessment files, student data (from Banner), and manually extracted documents were synthesized. Second, the compiled data were cleaned to remove any unnecessary entries. Third, the data were homogenized (structure unification) to remove inconsistencies in file structures caused by different instructors teaching the courses. Fourth, missing data values were treated using an imputation technique in which the average values of coursework components were assigned to missing entries. Fifth, following data aggregation, standardization was applied to convert the categorical data to numerical values, integrate the data into the same CSV file, and normalize the data by min-max normalization (rescaling to [0, 1]). As a final step, we added a column based on rules and significant milestones in student performance. Using their total grade (TG), we divided the students into three main categories: Good (TG ≥ 70%), At-Risk (60% < TG < 70%), and Failed (TG ≤ 60%). To ensure the generality of our model, we used other datasets to validate our findings. Dataset details can be found in Table 1.
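The imputation, normalization, and labeling steps can be sketched as follows. This is a minimal illustration in plain Python rather than the code used in the study; the function names and the sample values are hypothetical, and only the rules stated above (mean imputation, rescaling to [0, 1], and the three TG thresholds) are taken from the text.

```python
from statistics import mean

def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def min_max(column):
    """Rescale values to [0, 1]; a constant column is mapped to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def label_from_total_grade(tg):
    """Map a total grade in percent to the three classes used in the paper."""
    if tg >= 70:
        return "Good"
    if tg > 60:
        return "At-Risk"
    return "Failed"

quiz = [8.0, None, 6.0, 10.0]        # hypothetical coursework component
quiz_filled = impute_mean(quiz)      # the missing entry becomes 8.0
quiz_scaled = min_max(quiz_filled)
labels = [label_from_total_grade(tg) for tg in (85, 65, 60, 70)]
```

Note that the boundary cases follow the stated inequalities: a total grade of exactly 70% is Good, and exactly 60% is Failed.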

Knowledge and graph representation
Knowledge graphs can be developed automatically using ML and graph mining techniques, eliciting new insights into a particular area. Knowledge graphs reveal information in structures and eliminate data abstraction, making it easier to understand a given field. This research presents a simple, but extremely accurate, way of converting tabulated data into graphs and graph features, allowing significant improvements in ML classification. The graph features can help identify structures linking different student behaviors that are hidden to standard classification algorithms. Let us consider a dataset $D$ that consists of $n$ students $s_i$, $i = 1, \dots, n$. Every student $s_i$ is represented by $m$ attributes $F_j \in$ {checkpoints, historical features}, $j = 1, \dots, m$ (see Fig. 2 for more details). All attributes are defined and can be represented as continuous or categorical values, as listed in Table 2. Students are graded according to the course or program requirements. We use the total grade in the course to categorize students as Good, At-Risk, or Failed. Therefore, each student $s_i$ has a class $g_i \in C$, $C = \{\text{Good}, \text{At-Risk}, \text{Failed}\}$, $i = 1, \dots, n$.
The data representation (features) input to ML algorithms has a significant impact on the classification performance. Additional features may improve the model's outcome. We propose to capture the topological relations between students by using a graph representation of the data. We consider our data as a set of points in $m$-dimensional space. A measure of proximity for any two data points $s_i$ and $s_j$ is the distance $d_{i,j}$ between them, calculated by one of the formulae listed in Table 3. Using these formulae, we can convert the dataset into a graph by creating an adjacency matrix. In this instance, the adjacency matrix $A = \{a_{i,j} : a_{i,j} = d_{i,j} \in \mathbb{R},\ i, j = 1, \dots, n\}$ represents a weighted graph $G$ with $n$ nodes. To create an unweighted graph, we clip the adjacency matrix cells with the threshold $\text{threshold}_i = \frac{1}{n} \sum_{j=1}^{n} a_{i,j}$.
We propose an approach that uses topological features extracted from a graph representation of the dataset. These topological features can be integrated with the original features to add useful graph information and find correlations between instances. Table 4 presents definitions of the topological features used in this research. These features are employed with the ML classification models to improve the accuracy of the classifier.
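The conversion can be sketched as follows, in plain Python and under stated assumptions. The text does not specify the direction of the clipping or how the row-wise threshold is made symmetric, so this sketch assumes that two students are linked when their distance is below the mean distance of either row. Only two topological features are computed (degree and a closeness centrality restricted to reachable nodes, without further scaling), and the sample feature vectors are hypothetical.

```python
import math
from collections import deque

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def distance_matrix(rows, dist):
    n = len(rows)
    return [[dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]

def unweighted_adjacency(d):
    """Clip each row at its mean distance (threshold_i).
    Assumption: i and j are linked when d[i][j] is below the threshold of
    row i or of row j, which keeps the graph undirected."""
    n = len(d)
    thr = [sum(row) / n for row in d]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if d[i][j] < thr[i] or d[i][j] < thr[j]:
                adj[i][j] = adj[j][i] = 1
    return adj

def degree(adj, v):
    return sum(adj[v])

def closeness(adj, v):
    """(reachable nodes) / (sum of shortest-path lengths), found by BFS."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w, linked in enumerate(adj[u]):
            if linked and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

students = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]  # hypothetical DS rows
adj = unweighted_adjacency(distance_matrix(students, euclidean))
# DS + GF: append the graph features to each original feature vector
augmented = [row + [degree(adj, i), closeness(adj, i)] for i, row in enumerate(students)]
```

In this toy example the two pairs of nearby students form two separate components. Passing `cosine_distance` instead of `euclidean` to `distance_matrix` yields the cosine variant of the graph.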

Employing graph convolutional network
Recently, graph-based DL models, particularly GCNs, have achieved excellent performance compared with several ML-based methodologies in solving complex problems (Zhang et al., 2019). Moreover, GCNs have attained a novel ability to understand graph-based representations and provide robust performance in many complex and unsolved problems (Zhang et al., 2019). This study aims to utilize the power of GCNs in combination with graph-based topological features to conduct node classification. According to the training idea of Kipf and Welling (2016), the objective of a GCN is to learn the features of a graph $G = (V, E)$ from a description of the graph structure as an adjacency matrix $A$ and a feature descriptor $f_v$ for each node $v$, as summarized in a feature matrix $S$.
Table 4 (continued)
No. | Feature | Definition
… | … (Newman, 2001) | Summation of edges connected to the node
12 | Eccentricity (Harary & Norman, 1953) | Maximum distance from node v to all other nodes in G
13 | Hub (Kleinberg et al., 2011) | Number of highly authoritative nodes a node v is pointing to
14 | Authority (Kleinberg et al., 2011) | Amount of valuable information that a node v carries
15 | PageRank (Page et al., 1999) | Importance of a node v in the graph G
16 | Closeness centrality (Sabidussi, 1966) | Time it takes to move from node v to other nodes in the graph G
17 | Betweenness centrality (Brandes, 2001) | Sum of the fraction of all-pairs shortest paths that pass through a node v
18 | Information centrality (Brandes & Fleischer, 2005) | Current-flow closeness centrality based on effective resistance between nodes in a network
19 | Harmonic centrality (Marchiori & Latora, 2000) | Sum of the reciprocal of the shortest path distances from node v to all other nodes in G
20 | Eigenvector centrality (Bonacich, 1987) | Connectivity or transitive influence of a node v
Using the above set of inputs, the GCN produces a unique output $Y$, which is an $l \times o$ feature matrix in which $o$ is the number of output features per node, $Y = \{y_{a,b} : y_{a,b} \in \mathbb{R},\ a = 1, \dots, l,\ b = 1, \dots, o\}$.
The graph-level outputs can be enhanced by adding pooling layers (Duvenaud et al., 2015). Each neural network layer can be described as a nonlinear function of the form

$$P^{(q+1)} = f\left(P^{(q)}, A\right),$$

where $P^{(0)} = S$, $P^{(Q)} = Y$, and $Q$ is the number of layers. Models therefore differ only in the choice of $f(\cdot, \cdot)$ and its parameters. Here, the GCN is used to model the binary classifiers. The (processed) graph and the relevant features are used to assemble the adjacency matrix $A$, the feature matrix $S$, and the degree matrix (Kipf & Welling, 2016). An identity matrix $I$ is added to $A$ to establish self-connections, $\tilde{A} = A + I$. The resulting matrix is then normalized using the degree matrix $\Phi = \{\varphi_{i,i} = \sum_{j=1}^{n} \tilde{a}_{i,j}\}$, following the symmetric normalization of Kipf and Welling (2016):

$$\hat{A} = \Phi^{-1/2}\, \tilde{A}\, \Phi^{-1/2}.$$

Thus, the input to the GCN model consists of the feature matrix $S$ and the normalized adjacency matrix $\hat{A}$. In this research, we use a four-layer GCN model with weight matrices $W_1, W_2, W_3, W_4$. At the start of training, the weight matrices are initialized with random values between 0 and 1. During training, they are optimized by backpropagation using the Adam optimizer. Hence, the output of our proposed GCN model can be described as follows:

$$Y = \sigma\left(\hat{A}\, \phi\left(\hat{A}\, \phi\left(\hat{A}\, \phi\left(\hat{A} S W_1\right) W_2\right) W_3\right) W_4\right),$$

where $\phi(\cdot)$ is the ReLU activation function and $\sigma(\cdot)$ is the softmax activation function. The final layer provides the prediction for each node. Dropout with a rate of 0.3 is applied at the end of every GCN layer.
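The forward pass described above can be illustrated with a minimal sketch in plain Python. It assumes the symmetric normalization of Kipf and Welling (2016), omits dropout and training (so it corresponds to inference only), and uses a hypothetical four-node path graph with made-up layer sizes; none of these values come from the experiments in this paper.

```python
import math
import random


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def normalize_adjacency(A):
    """Add self-loops, then apply symmetric degree normalization (assumed form)."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tilde]  # diagonal entries of the degree matrix
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def relu(M):
    return [[max(0.0, v) for v in row] for row in M]


def softmax_rows(M):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_forward(A, S, weights):
    """ReLU on every layer except the last, which uses a row-wise softmax."""
    A_hat = normalize_adjacency(A)
    P = S
    for W in weights[:-1]:
        P = relu(matmul(matmul(A_hat, P), W))
    return softmax_rows(matmul(matmul(A_hat, P), weights[-1]))


def random_matrix(rows, cols, rng):
    """Random values in [0, 1), mirroring the initialization described in the text."""
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
# Hypothetical path graph 0-1-2-3 with 5 input features per node.
A = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
S = random_matrix(4, 5, rng)
layer_sizes = [5, 8, 8, 8, 3]  # illustrative sizes: four layers, three output classes
weights = [random_matrix(i, o, rng) for i, o in zip(layer_sizes, layer_sizes[1:])]
Y = gcn_forward(A, S, weights)  # one row of class probabilities per node
```

Each row of `Y` is a probability distribution over the output classes for one node. In practice a framework with automatic differentiation would be used so that the weights can be trained with Adam.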

Evaluation measures
To assess the quality of the outcomes given by the classification methods, we calculated the sensitivity, specificity, area under the receiver operating characteristic (ROC) curve (AUC), accuracy, and balanced accuracy metrics. A confusion (error) matrix was constructed for each predictive model to show how well it distinguished between classes. The ROC curve and its AUC were used to evaluate the performance of the classifiers and summarize the trade-off between the true positive rate (TPR) and false positive rate (FPR) at different probability thresholds. We define

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}.$$

The overall accuracy of the model is defined as

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{5}$$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively, taken from the confusion matrix of the classification model. All models were trained using k-fold cross-validation. The metrics were calculated for each fold separately, and the averaged values were used as the final measure.
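As a small worked example of these measures, the sketch below computes them from confusion-matrix counts and averages them over folds. The counts are invented for illustration, and the one-vs-rest treatment of a single class is an assumption; specificity and balanced accuracy follow their standard definitions.

```python
def binary_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics for one class treated as positive."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpr": fp / (fp + tn),  # false positive rate, equal to 1 - specificity
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def mean_over_folds(fold_metrics):
    """Average every metric across the cross-validation folds."""
    keys = fold_metrics[0].keys()
    return {k: sum(m[k] for m in fold_metrics) / len(fold_metrics) for k in keys}


# Hypothetical counts for two folds (not taken from the paper's experiments).
folds = [binary_metrics(40, 45, 5, 10), binary_metrics(42, 43, 7, 8)]
summary = mean_over_folds(folds)
```

With these counts, both folds have an accuracy of 0.85, while the mean sensitivity is 0.82 and the mean specificity is 0.88.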

Experimental results
This section describes experimental work undertaken to improve the classification accuracy by converting tabulated data structures into graph data structures. The use of graphs is expected to capture more significant correlations between instances, which are typically ignored in classification. We evaluate the benefits of adding graph-related features to the original features using conventional ML methods, a GCN model, and knowledge graph embeddings. We conducted experiments with the initial dataset features (DS) only, and then with both DS and the graph topological features GF. These combined features were ensembled with the GCN to achieve superior results. Finally, the graph embeddings were exploited to achieve state-of-the-art performance. The results are described in detail below. In summary, the proposed method of adding graph-related features to the original features produces better results than the current state-of-the-art ML classification using only the original features in the dataset.

ML classification model that integrates graph features with original features to improve performance
We used five different norms to calculate adjacency matrices for the graphs and extract graph topological features (see Table 3). The extracted features were combined with features from the original dataset (DS) and fed to the classification models. The best performance was obtained using the Euclidean and cosine norms, so these two metrics were used for further analysis. Table 5 presents the results given by the multi-class ML methods fed with the original dataset features only and with the graph topological features (GF) combined with DS. We applied six different classifiers to the features from datasets D1, D2, and D3. The mean accuracy for dataset D1 increased from 73.0% to 76.7% under the Euclidean metric and from 73.0% to 74.8% under the cosine metric. Similarly, the mean AUC for D1 increased from 0.894 to 0.913 and to 0.898 for the Euclidean and cosine metrics, respectively. For dataset D2, the mean accuracy increased from 75.8% to 79.5% and to 82.5% under the Euclidean and cosine metrics, respectively, while the mean AUC increased from 0.907 to 0.925 and to 0.942, respectively. The accuracy for dataset D3 remained the same at 88.8% under the Euclidean norm, but increased to 91.0% when using the cosine distance metric. Finally, the mean AUC for D3 decreased from 0.962 to 0.960 with the Euclidean norm and increased from 0.962 to 0.966 with the cosine norm. Hence, the prediction performance typically improves with the addition of GF to the dataset; the exception is dataset D3 with the Euclidean distance metric. For the next experiments, we focused on dataset D2, as this contained historical and checkpoint features. When sufficient information about the students is included, the prediction results are expected to improve. Accordingly, Table 6 describes the classification results when historical features are added to D2.
That is, D2 now consists of checkpoints, historical features, and graph topological features (see Table 4).
From Table 6, the mean accuracy improves overall, even in the case where only DS is used, i.e., 84.5% accuracy. Moreover, the accuracy of DS + GF increases from 84.5% to 87.3% for the cosine metric and from 84.5% to 84.7% for the Euclidean metric. When DS is combined with GF, the mean AUC increases from 0.948 to 0.957 for the cosine metric and from 0.948 to 0.967 for the Euclidean metric. Thus, the overall result of adding GF to DS with historical checkpoints is a significant improvement in classification accuracy.
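The idea of converting tabulated data into a graph and appending topological features to each row can be sketched as follows. This is only an illustration under stated assumptions: the rule for creating edges (connecting two students when their distance falls below a fixed `threshold`) is hypothetical, the toy rows are invented, and only two of the topological features (degree and harmonic centrality) are computed.

```python
import math
from collections import deque


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def build_graph(rows, distance, threshold):
    """Connect two rows (students) when their distance is below the threshold."""
    n = len(rows)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if distance(rows[i], rows[j]) < threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def shortest_path_lengths(adj, source):
    """Breadth-first search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def topological_features(adj):
    """Degree and harmonic centrality (sum of reciprocal distances) per node."""
    feats = {}
    for v in adj:
        dist = shortest_path_lengths(adj, v)
        harmonic = sum(1.0 / d for u, d in dist.items() if u != v)
        feats[v] = {"degree": len(adj[v]), "harmonic": harmonic}
    return feats


# Hypothetical two-feature records for four students.
rows = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
adj = build_graph(rows, euclidean, threshold=1.5)
gf = topological_features(adj)
# DS + GF: the original features followed by the graph features of the same row.
augmented = [row + [gf[i]["degree"], gf[i]["harmonic"]] for i, row in enumerate(rows)]
```

Passing `cosine_distance` instead of `euclidean` (with a suitable threshold) gives the cosine variant of the same construction. A graph library such as NetworkX would normally be used to obtain the full set of centrality measures.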

Improving the classification model performance using a GCN
The next set of experiments used the D2 dataset with historical features included and GF added. We employed a GCN to evaluate the classification performance. The overall prediction results improve significantly. Figure 3 shows that, with the GCN model, the accuracy increases from 87.3% to 88.2% for the cosine metric and from 84.7% to 85% for the Euclidean metric. Table 7 provides an overall comparison of the models. The GCN model trained on a combination of DS and GF features provides a strong classification model for predicting at-risk students at the early stages of a course.

Entity representations and graph embedded vectors
By extracting knowledge from the acquired data, a student-related knowledge graph can be constructed. The main task is to construct an abstract ontology. For this, we use student dataset D2, as it contains both historical and checkpoint features. In dataset D2, we define 14 entities and 17 relationships. Table 8 defines each of these 14 types. The relationships between the entities are shown in Fig. 4.
The knowledge graph uses a graph-structured data model, or topology, to integrate data. Its general structure consists of a network of entities, their semantic types, properties, and relationships. In this research, the relationships among these entities are defined explicitly (e.g., a student is enrolled on a course, that student is taught by a particular instructor, has a particular high school GPA, and the total grade is linked to the checkpoints). The knowledge graph thus captures the complex relations among features. To store the entities and their relationships efficiently, we use the Neo4j database. Neo4j is a high-performance NoSQL graph database with an embedded, disk-based Java persistence engine that supports massive data storage and rapid graph queries. Neo4j stores entities and relations using a key-value pair structure, and the stored graph can then be visualized using different queries. To improve the classification accuracy, we generate embedded vectors from the graph nodes using the node2vec algorithm. These graph embedding (GE) features are then integrated with the graph features and the results are analyzed. Adding the GE features improves the performance of the employed models compared with the results for the basic DS and GF features (see Table 7). With the cosine metric, there is a 3.43% boost in accuracy and a 2.3% boost in AUC. Similarly, the Euclidean norm produces a 2.13% improvement in accuracy and a 2.83% boost in AUC. Therefore, our proposed method of combining the initial dataset features with graph topological features and graph embedding features substantially boosts the overall performance, indicating that these features are important for accurate predictions of student performance and should not be neglected. The performance may be further enhanced by employing dimensionality reduction or feature selection techniques.
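To give an intuition for how node2vec turns graph nodes into training data for embeddings, the following sketch generates second-order biased random walks in the style of node2vec on an unweighted graph. The toy knowledge graph, the node names, and the values of the return parameter `p` and in-out parameter `q` are all hypothetical. The walks would subsequently be fed to a skip-gram model to obtain the embedded vectors, a step that is not shown here.

```python
import random


def node2vec_walk(adj, start, length, p, q, rng):
    """Generate one biased random walk of up to `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])  # sorted for reproducibility
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # move further away from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy knowledge graph stored as undirected adjacency sets.
adj = {
    "student_1": {"course_A", "instructor_X"},
    "student_2": {"course_A"},
    "course_A": {"student_1", "student_2", "instructor_X"},
    "instructor_X": {"student_1", "course_A"},
}

rng = random.Random(42)
walks = [
    node2vec_walk(adj, node, length=6, p=1.0, q=0.5, rng=rng)
    for node in sorted(adj)
    for _ in range(3)  # three walks per start node
]
```

Choosing `q` below 1 encourages the walk to explore outward, while a small `p` keeps it close to the starting node; these settings determine which notion of node similarity the resulting embeddings capture.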

Discussion and future work
Traditional feature engineering techniques in ML can be replaced by powerful representation learning methods. In recent years, representation learning on graphs has steadily improved on tasks such as node classification and link prediction for graph-structured data (Hamilton et al., 2017). Current studies suggest that topology is a promising approach for improving classification performance (Bhatti et al., 2018; Dey et al., 2017; Hofer et al., 2017). Our work supports this statement, as topological features and graph embeddings significantly improved the performance of the classification models (see Table 7). The conversion of tabulated data to graph data as a means of extracting valuable features and improving the performance of a classification model has also been explored in earlier work, whose authors used the scalar product to extract the relations between nodes. Following their approach, we tested various distance norms. We hypothesized that educational datasets contain information about student performance, and so a distance metric may adequately capture hidden links between two instances (students). The proposed method was tested on three educational datasets and produced superior results. Furthermore, a comparison was made between various settings of input features, namely DS alone, DS + GF, and DS + GF + GE. The resultant model was able to classify the students into the classes Failed, At-Risk, and Good. With inputs of DS and DS + GF, the model reached AUC scores of 0.948 and 0.964, respectively. Similarly, the accuracy increased from 84.5% to 87.3% with the addition of GF to DS. Adding GF improved the performance by 2.019% in terms of AUC and 3.261% in terms of accuracy.

Academic_Standing (entity definition): A student's academic standing refers to their current status. They either have a good academic standing or they are on probation.
Moreover, the GCN model incorporating graph topological features enhanced the prediction performance by 0.5% in terms of accuracy and 0.9% in terms of AUC for the cosine metric, and by 3.7% in terms of accuracy and 2.4% in terms of AUC for the Euclidean norm.
The proposed model identified at-risk students with 87.4% accuracy and 0.970 AUC under the cosine metric when graph embedding features were added. Similarly, adding graph embedding features resulted in 86.3% accuracy and 0.975 AUC under the Euclidean norm. The proposed solution may serve as a tool for the early detection of at-risk students. This will benefit universities by allowing them to make better predictions of performance, thus improving their effectiveness and reputations. Overall, the proposed model can be applied to track students' performance. This may provide decision-makers and instructors with feedback about students at risk of failing a course, allowing stakeholders to decide on responses that may improve the final outcomes of the course.
We compared the method proposed in this paper with different baseline methods reported in the literature. Our method outperforms these state-of-the-art approaches. The random forest model developed by Mubarak et al. (2022) has an accuracy of 79.00%, and the SVM offers a slight improvement to 79.10%. Mubarak et al. (2022) also developed a graph neural network that achieved 84.00% accuracy. In comparison, our proposed method reaches an accuracy of 84.50% using only the initial dataset features. When the initial dataset features are combined with graph topological features, the accuracy increases to 87.30%, which is a significant boost. Finally, with our proposed ensemble method of GCN with DS and GF, the accuracy is further improved to 88.20%. We have also achieved a significant boost in terms of AUC; however, as the baseline methods only reported accuracy, we simply claim that our proposed method achieves state-of-the-art accuracy that outperforms current methods.