  • Research article
  • Open access

Students’ perceptions of using ChatGPT in a physics class as a virtual tutor


The latest development of Generative Artificial Intelligence (GenAI), particularly ChatGPT, has drawn the attention of educational researchers and practitioners. We have witnessed many innovative uses of ChatGPT in STEM classrooms. However, studies regarding students’ perceptions of ChatGPT as a virtual tutoring tool in STEM education are rare. The current study investigated undergraduate students’ perceptions of using ChatGPT in a physics class as an assistant tool for addressing physics questions. Specifically, the study examined the accuracy of ChatGPT in answering physics questions, the relationship between students’ ChatGPT trust levels and answer accuracy, and the influence of trust on students’ perceptions of ChatGPT. Our finding indicates that despite the inaccuracy of GenAI in question answering, most students trust its ability to provide correct answers. Trust in GenAI is also associated with students’ perceptions of GenAI. In addition, this study sheds light on students’ misconceptions toward GenAI and provides suggestions for future considerations in AI literacy teaching and research.


Generative Artificial Intelligence (GenAI) has overhauled the landscape of educational practices. GenAI is a subclass of machine learning (ML) algorithms that can learn from text, images, audio, and video to produce new content based on their training data (Kasneci et al., 2023). Unlike other supervised algorithms, known as conditional models, GenAI produces artifacts with a wide variety and complexity. GenAI rose to worldwide prominence in November 2022, when OpenAI released the third major version of its chatbot, ChatGPT (GPT-3). The release shocked the world with its capability to produce human-like text and conversations (Hu, 2023); in just two months the platform gained 100 million users and generated a plethora of headlines worldwide. GPT (Generative Pre-trained Transformer) models are trained using a large amount of publicly available online digital content. The data used to train the GPT-3 model came from various sources: the Common Crawl dataset, the expanded WebText dataset, two internet-based books corpora, and English Wikipedia (Brown et al., 2020). Because ChatGPT models were trained on a large corpus of text data with a complicated language model (more than 175 billion parameters), ChatGPT can comprehend human language and respond to complex and varied prompts while maintaining contextual coherence in conversations (OpenAI, 2022).

Due to its ability to perform a wide range of tasks, educators have suggested that ChatGPT can be used as a tool to support teaching across a wide range of subjects, including programming (Sun et al., 2022), engineering (Qadir, 2023), journalism and media (Pavlik, 2023), nursing education (O’Connor & ChatGPT, 2023), and business (Alshater, 2022). Beyond subject-specific applications, ChatGPT has been proposed to be able to support teachers in creating syllabi and curricula, be used for flipped classrooms (Lo, 2023), and support adaptive learning, automated essay grading, and personalized tutoring (Baidoo-Anu & Ansah, 2023). Despite its competence in supporting teaching and learning, a literature review has revealed varying performance levels of ChatGPT across different subjects. Notably, it has demonstrated outstanding performance in economics, and satisfactory performance in programming, but falls short of expectations in mathematics (Lo, 2023).

Current knowledge about how students perceive GenAI and how it can be used for teaching and learning remains limited. It is imperative to examine GenAI from the student’s perspective to understand how and what pedagogical solutions are needed to minimize the challenges that GenAI introduces while maximizing its potential for teaching and learning. In this study, we particularly tested one GenAI—ChatGPT—in an authentic physics class to understand student perceptions toward GenAI. We implemented ChatGPT in the classroom by utilizing its tutoring assistant potential as suggested by Baidoo-Anu and Ansah (2023). In STEM education, an instructor usually needs to teach a substantial number of students with various levels of proficiency and understanding (Karabenick, 2003). Consequently, students’ questions are more likely to be left unresolved leading to increased confusion. In this study, we are particularly interested in investigating how students perceive ChatGPT as a virtual tutor. To address this inquiry, we aim to answer the following research questions:

  1. What is the accuracy of ChatGPT in addressing physics problems?

  2. Does students’ trust in ChatGPT’s answers differ by the accuracy of the answers?

  3. To what extent does students’ trust level influence their perception of ChatGPT?


AI in education

AI has been widely used in education for various purposes prior to the emergence of ChatGPT. Many intelligent tutor systems have been developed to monitor students’ learning processes, analyze their performance, and provide immediate personalized instructions and feedback. Some dialogue-based Intelligent tutor systems, such as AutoTutor, not only track students’ knowledge status and engage them in adaptive conversations but also detect and respond to students’ emotional states, such as confusion or frustration (Graesser, 2016). In addition to intelligent tutor systems, AI has played other roles in helping students learn. For instance, following the learning-by-teaching paradigm, AI has been used to simulate virtual teachable agents or tutees, enabling students to act as “tutors” to enhance their learning by teaching the AI agent, as exemplified by the SimStudent chatbot (Matsuda et al., 2013). AI has also been employed to assist teachers in grading and assessing students’ homework. For instance, automated writing evaluation (AWE) systems, such as the Writing Pal (W-Pal), were developed to alleviate teachers’ workload by helping them evaluate essays and generate feedback (McNamara et al., 2013).

The latest breakthrough in AI, particularly ChatGPT, represents a critical advancement of GenAI. Unlike previous conversational AI chatbots, ChatGPT is built on deep learning and can learn and generate human-like responses (Sahoo et al., 2023). What makes it more distinguishable from other GenAI is its unique ability to not only provide a response but also generate related content based on subsequent questions and prompts derived from its initial responses (Sun et al., 2022). It has been used as a writing assisting tool to aid students, especially English as a second language (ESL) students, in receiving feedback on their writing (Su et al., 2023). ESL instructors also use ChatGPT as a proofreading tool to help students improve the grammatical accuracy of their writing (Schmidt-Fajlik, 2023). Beyond language support, ChatGPT has served as a pedagogical tool to foster students’ critical thinking skills in a physics class (Bitzenbauer, 2023). Accordingly, many discussions have been held on its potential to transform traditional teaching practices (e.g., Adiguzel et al., 2023; Baidoo-Anu & Owusu Ansah, 2023).

Despite the widespread integration of AI in education, studies have shown that students hold mixed attitudes toward AI. On one hand, students acknowledge the benefits of AI and believe it can be used as a learning tool to provide personalized and immediate learning support (Chan & Hu, 2023). They also recognize the potential impact of AI on their disciplines and future careers (Bisdas et al., 2021; Buabbas et al., 2023). On the other hand, research has revealed students’ concerns about AI, including its accuracy and transparency (Chan & Hu, 2023), ethical considerations (Gillissen et al., 2022), and the potential for job displacement (Gong et al., 2019). Understanding students’ perceptions of AI in the educational context is essential for the effective integration of AI tools and technologies in STEM education.

Accuracy and accountability of AI

GenAI models, such as ChatGPT, are trained on existing datasets and rely heavily on the information they receive (Chatterjee & Dethlefs, 2023). It is crucial to recognize that the objectivity and accuracy of artificial intelligence are contingent on the quality of the training data (Mhlanga, 2023). If the training data contain any biases, factual distortions, or misleading information, AI could perpetuate those biases and misinformation by generating discriminatory and false results. Even worse, due to the linguistic proficiency of GenAI, it can disguise the misinformation as truth or facts and make it appear scientifically reasonable (Sallam, 2023). Another factor that worsens this situation is the lack of transparency on how AI selects and analyzes data. ChatGPT has been reported as being unable to correctly choose references to support the information and answers it provides (Lim et al., 2023). This lack of clarity makes it challenging for users to assess the trustworthiness and objectivity of the information provided.

Moreover, since most of the AI models are based on algorithms to select and process data, there is a risk that these algorithms may prioritize data from specific demographic groups or cultures, leading to skewed data and potential biases. A notable example is highlighted by Buolamwini and Gebru (2018), who found that facial recognition algorithms demonstrated imbalanced recognition capabilities across genders and skin tones, with darker-skinned females being the most misclassified group. AI biases extend beyond demographics and cultures. Ferrara (2023) reviewed previous literature on AI biases and identified four additional types of biases: linguistic biases, temporal biases, confirmation biases, and ideological and political biases. Linguistic biases occur when the AI algorithm focuses only on data in English or a few other dominant languages, potentially resulting in decreased proficiency in detecting and analyzing non-native English speakers’ writing. Temporal AI biases arise when the algorithm only focuses on data from limited periods. Confirmation AI biases refer to the tendency of AI algorithms to provide information that aligns with users’ own beliefs or perspectives while excluding alternative perspectives. Lastly, ideological and political biases occur when AI algorithms favor only certain political perspectives or ideologies. Understanding and addressing these biases are critical for ensuring fair and unbiased AI systems.


Despite the potential weaknesses, flaws, and biases of AI, students may encounter challenges in recognizing these issues and could overly rely on or blindly trust AI for important decisions or interactions. Even worse, students may hold misconceptions about AI, such as the belief that “AI is infallible and can be 100% objective” (Bewersdorff et al., 2023, p.9), or the notion that “AI is a human mind but in a machine” (Mertala et al., 2022, p.6).

One factor contributing to these misconceptions is anthropomorphism, which involves attributing human-like characteristics to AI, such as feelings, mental states, or behavioral characteristics (Airenti, 2015; Epley et al., 2007). On one hand, anthropomorphism increases students’ perceived social connection with AI and their willingness to adopt AI technology (Cao et al., 2022). On the other hand, anthropomorphism may mislead students into believing that AI systems are both trustworthy and capable of performing any task.

Particularly, “Warmth” and “Competence” are two perceived anthropomorphic features of AI that may cause this misconception. “Warmth” refers to the “perceived friendliness, helpfulness, sincerity, trustworthiness, and morality” (Pizzi et al, 2023, p.1375) of AI systems. A kind and caring AI system can increase emotional trust in AI (Aaker et al., 2012). Due to perceived “Warmth”, people are more likely to establish and maintain effective connections with the AI system (Gonzalez-Jiminez, 2018). “Competence” refers to the perceived problem-solving abilities of AI. A highly intelligent or competent AI system increases people’s rational trust and makes them believe that AI can truly help them achieve their goals. It is worth noting that both “Warmth” and “Competence” influence people’s trust in AI as well as their skepticism toward AI (Pizzi et al., 2023). These anthropomorphic features may lead to the misconception of a “super AI”—an AI that possesses human consciousness and can automatically solve problems in any area (Kaplan & Haenlein, 2019).

In this study, we explore students’ perceptions of ChatGPT used as a virtual tutor in a physics class and particularly their perception and trust in using ChatGPT for answering their questions. Understanding these dynamics can better assist researchers and educators in developing strategies and educational interventions for preparing students to face the prevalent use of GenAI.



The study took place in an introductory college-level physics class at a public university located in the south of the United States. A total of 40 students enrolled in the class agreed to participate in this study, of which 36 were self-identified as female (90%), 3 were male (8%), and 1 was non-binary (3%). The majority of participants were Caucasian/White (n = 29, 73%) followed by African American/Black (n = 9, 23%) with 1 Asian, and 1 Native American. The average age was 21, ranging from 19 to 38. Nearly all participants had no prior experience using ChatGPT except three participants who had used it for testing or occasional use.


The Institutional Review Board (IRB) of the university where the study was carried out granted the study an exemption before its implementation. The class was taught in a 16-week semester covering three physics concepts, with each concept assessed by an exam. This study was implemented for the second exam, which consisted of 50 multiple-choice questions measuring participants’ understanding of light, radioactivity, and related information. The participants took the exam in an in-person class session, and the responses were graded as correct or incorrect. The participants then had a week and a half to complete a makeup exam assignment to regain lost credits from the exam by “chatting” with ChatGPT. After they completed the assignment, the participants were instructed to complete an end-of-study survey asking about their experience in learning with ChatGPT for this assignment. To prevent any misconceptions about AI or physics that might be introduced by the use of ChatGPT in this study, at the end of the study the instructor reviewed the questions that were commonly answered incorrectly and emphasized that AI, particularly ChatGPT, can be error-prone.


Make-up exam assignment

The makeup exam assignment was designed as a fillable form, and participants were allowed to make up a maximum of 10 exam questions. Appendix A shows a fillable form for one question. The first page of the assignment provided a 4-step instruction on how to complete the assignment, including how to create a ChatGPT account and how to complete the assignment for each question that they answered incorrectly in their exam (Fig. 1). The participants were also informed that ChatGPT is an Artificial Intelligence system and can therefore make mistakes, and they were specifically asked to think critically about the answers provided by ChatGPT. The exam key was released prior to the assignment of the make-up exam.

Fig. 1 Instruction on how to complete the makeup exam assignment

For each question, the participants needed to follow a specific order when “chatting” with ChatGPT; however, they also had a certain level of flexibility. As shown in Fig. 2, first, the participants were asked to ask the original question to ChatGPT and get an answer from it; once ChatGPT provided an answer, the participants needed to check if their original answers (i.e., incorrect answers) were the same as ChatGPT’s answers. If the answers were consistent, the participants were asked to decide if they agreed or disagreed with ChatGPT’s answers and provide the rationales for their decisions; if the answers were inconsistent, the participants would need to tell ChatGPT their original answers to the questions (i.e., incorrect answers) and then would need to again decide if they agree or disagree with ChatGPT’s responses, once again providing the rationales for their decisions. Aside from the provided prompts, students were allowed to ask any questions that they had for ChatGPT during the entire conversation.

Fig. 2 Flowchart of how students interact with ChatGPT

Survey

The end-of-study survey was administered online via Qualtrics. The survey consisted of 13 Likert-scale items for 3 subscales, with responses ranging from 1 (strongly disagree) to 5 (strongly agree). Demographic information was also collected through the survey. At the end of the survey, the participants were required to draw a visual representation of their perception of ChatGPT. Drawing as a research method has been extensively employed to comprehend students’ perceptions of scientists and has proven to be an effective approach that surpasses the limitations of verbal communication and provides in-depth insights (Finson, 2002).

Table 1 shows the measured concepts of the Likert-scale items, the number of items for each subscale, a sample item from each scale, and Cronbach’s α of each subscale. The subscales of perceived usefulness and perceived ease of use were from Davis’s (1989) scales and were slightly modified to fit the current study. The items measuring participants’ continuous intention to use ChatGPT in the future were modified from items developed by Davis (1989) and published in Falode’s (2018) study. The reliabilities for the subscales used in this study were all at an acceptable level (Hair, 2009) and ranged from 0.851 to 0.931.

Table 1 Subscales of the survey and the corresponding reliabilities

Analyses and results

The 40 participants posed a total of 362 questions (46 of them unique) to ChatGPT. The overall accuracy of the answers provided by ChatGPT was 85%, with 307 questions correctly answered and 55 questions (15%) incorrectly answered. The participants agreed with a total of 325 answers, consisting of both incorrect and correct answers. Among the incorrectly answered questions, the participants agreed with 30 answers without questioning their accuracy. The participants were encouraged to ask additional questions beyond the default prompts, resulting in a total of 209 additional questions. Interestingly, ChatGPT occasionally modified its original answers when faced with further inquiries. The participants reported 41 events in which ChatGPT changed its answers, including changing incorrect to correct answers (n = 7) and changing correct to incorrect answers (n = 34).

Each participant’s trust in ChatGPT was calculated as the percentage of ChatGPT’s answers that they agreed with out of the total number of questions that they asked. To differentiate the participants’ levels of trust in ChatGPT, a K-Means clustering analysis was performed in Python 3.11, and the elbow method was adopted to determine the optimal number of clusters. The elbow method is one of the most widely used methods to determine the number of clusters that should be set in K-Means (Hancer & Karaboga, 2017). It plots the sum of squared errors (SSE) for different values of k; the SSE declines as k increases. The value of k beyond which the improvement in SSE drops off most sharply is called the elbow point, and the number of clusters is set to this k. As shown in Fig. 3, the elbow was found at k = 3 in this study. Therefore, participants’ trust in ChatGPT was categorized into 3 levels.
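The trust score and the SSE-versus-k curve described above can be sketched as follows. This is a minimal illustration, not the authors' script: the `(agreed, asked)` records are hypothetical, the function names are invented for the example, and the iterative K-Means run is swapped for an exact dynamic program, which is possible because trust is one-dimensional and the K-Means objective is then minimized by cutting the sorted scores into k contiguous groups.

```python
def sse_curve(values, max_k):
    """Minimal SSE of the 1-D k-means objective for k = 1..max_k.

    Assumption for this sketch: scores are one-dimensional, so the best
    clustering is a split of the sorted scores into contiguous groups.
    """
    data = sorted(values)
    n = len(data)
    # Prefix sums let us compute the SSE of any contiguous group in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for x in data:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def group_sse(i, j):
        """SSE of data[i:j] around its own mean (clamped against rounding)."""
        s = prefix[j] - prefix[i]
        sq = prefix_sq[j] - prefix_sq[i]
        return max(0.0, sq - s * s / (j - i))

    inf = float("inf")
    # best[k][j]: minimal SSE when the first j sorted scores form k groups.
    best = [[inf] * (n + 1) for _ in range(max_k + 1)]
    best[0][0] = 0.0
    for k in range(1, max_k + 1):
        for j in range(k, n + 1):
            best[k][j] = min(
                best[k - 1][i] + group_sse(i, j) for i in range(k - 1, j)
            )
    return {k: best[k][n] for k in range(1, max_k + 1)}


# Hypothetical (answers agreed with, questions asked) per participant.
records = [(10, 10), (9, 9), (8, 8), (7, 7), (9, 10), (7, 8), (8, 10),
           (6, 7), (4, 10), (3, 8), (5, 9)]
scores = [agreed / asked for agreed, asked in records]  # trust per participant

curve = sse_curve(scores, max_k=5)
for k, sse in curve.items():
    print(f"k={k}: SSE={sse:.4f}")
```

Plotting `curve` against k gives the elbow plot; where the elbow lies is still read off the curve by eye or by a chosen heuristic, so the sketch deliberately stops at the SSE values.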

Fig. 3 The elbow point for the clustering analysis of the participants’ trust levels

The clustering analysis revealed three patterns among the participants: some agreed with ChatGPT’s answers entirely (Trust group), some partially agreed with ChatGPT’s answers (Partial Trust group), and a few substantially disagreed with ChatGPT’s answers (Distrust group). To investigate whether their trust levels in ChatGPT’s answers depended on the accuracy of the answers, a one-way ANOVA (analysis of variance) was carried out in SPSS 27. Prior to the test, the accuracy of the answers was normalized as the percentage of questions correctly answered by ChatGPT out of the total number of questions asked. All assumptions were checked and met. The ANOVA was significant, F(2, 37) = 6.383, p = 0.004, η² = 0.257. A post hoc Tukey HSD test indicated that the accuracy of ChatGPT’s answers in the Distrust group was significantly lower than that in the Partial Trust group (p = 0.010) and the Trust group (p = 0.003). However, there was no significant difference in answer accuracy between the Partial Trust group and the Trust group (p = 0.671). Table 2 shows descriptive statistics for the participants’ trust level and the accuracy of ChatGPT’s answers. Even though the accuracy of ChatGPT’s answers was only 82%, the participants in the Trust group agreed with the answers 100% of the time. On the other hand, the accuracy of ChatGPT’s answers was slightly higher than the agreement level in both the Partial Trust group and the Distrust group.
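As a consistency check on the reported ANOVA, the effect size can be recovered from the F statistic and its degrees of freedom. The identity below holds for a one-way design, where eta squared is the between-groups share of the total sum of squares; the symbols df_1 and df_2 are notation introduced here for the between-groups and within-groups degrees of freedom.

```latex
\eta^2
  = \frac{SS_{\text{between}}}{SS_{\text{total}}}
  = \frac{F \cdot df_1}{F \cdot df_1 + df_2}
  = \frac{6.383 \times 2}{6.383 \times 2 + 37}
  = \frac{12.766}{49.766}
  \approx 0.257
```

The result matches the reported value of 0.257.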

Table 2 Descriptive statistics for the accuracy of ChatGPT’s answers at trust levels

To assess whether the three trust groups differed in their perception of ChatGPT, a one-way MANOVA (multivariate analysis of variance) was performed on perceived usefulness, perceived ease of use, and intention to use in the future. Assumptions of MANOVA were checked and met. A significant difference was found in the perception of ChatGPT based on trust levels, F(6, 70) = 2.478, p = 0.031; Wilks’ lambda = 0.680, η² = 0.175. A post hoc Tukey HSD test revealed that the Trust group perceived ChatGPT as significantly easier to use than the Partial Trust group did (p = 0.026). A borderline significant difference was found among the three groups in continuous intention to use in the future (p = 0.059), with the Trust group showing a higher intention than the Partial Trust group. Table 3 shows descriptive statistics for the participants’ perception of ChatGPT in terms of trust levels.
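The multivariate effect size can be checked against Wilks’ lambda in the same way. The sketch below assumes the reported value is the partial eta squared that statistical packages such as SPSS derive from Wilks’ lambda, with s taken as the smaller of the number of dependent variables (3) and the hypothesis degrees of freedom (3 groups minus 1); the source does not state which formula was used.

```latex
\eta_p^2 = 1 - \Lambda^{1/s},
\qquad s = \min(3,\; 3 - 1) = 2,
\qquad \eta_p^2 = 1 - \sqrt{0.680} \approx 1 - 0.8246 \approx 0.175
```

Under that assumption the computed value agrees with the reported 0.175.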

Table 3 Descriptive statistics for the perception of ChatGPT’s answers at trust levels

Unfortunately, only 28 participants submitted their drawings, and one of the files was unreadable and could not be opened. Therefore, 27 drawings were included in the analysis. A deductive analysis approach was adopted (Bingham & Witkowsky, 2022). That is, drawing from the literature on the anthropomorphism of AI and viewing the participants’ drawings repeatedly, the first author of this study developed a coding scheme that focused mainly on two aspects of the drawings: perceived humanness and perceived warmth (Belanche et al., 2021). Perceived humanness was coded as Human or Robot/Machine; perceived warmth was coded as Positive, Negative, or Neutral/No Expression. Each category in the coding scheme was given a detailed description, and a second researcher used this coding scheme to code the participants’ drawings. An interrater reliability analysis using the Kappa statistic was performed to determine consistency between the two researchers. The reliability was 0.723 (p < 0.001) for perceived humanness and 0.875 (p < 0.001) for perceived warmth. These values correspond to substantial and almost perfect agreement, respectively (Landis & Koch, 1977). The two researchers then met to address any inconsistencies in the codes until an agreement was reached.
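The Kappa statistic used for interrater reliability compares the observed agreement between two coders with the agreement expected by chance. A minimal sketch of Cohen's kappa follows; the two lists of codes are hypothetical and do not reproduce the study's drawings or results.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty item list.")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical humanness codes for ten drawings (illustration only).
coder_1 = ["Robot"] * 6 + ["Human"] * 4
coder_2 = ["Robot"] * 5 + ["Human"] * 4 + ["Robot"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.3f}")
```

In this toy example the coders agree on 8 of 10 drawings, and chance agreement is 0.52, so kappa is (0.80 - 0.52) / (1 - 0.52).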

Overall, as shown in Table 4, the majority of the participants perceived ChatGPT to be a Robot/Machine and to be either positive or neutral. Two Chi-square analyses were carried out to assess whether trust levels were associated with perceived humanness and with perceived warmth. Because the assumptions of the Chi-square test were violated, we adopted the Likelihood Ratio instead of the Pearson Chi-square statistic, as recommended by Field (2009). Trust levels were significantly associated with perceived humanness: compared with the other two groups, the majority of the participants in the Trust group perceived ChatGPT as a Robot/Machine rather than a Human, χ²(2, N = 27) = 7.37, p = 0.025.
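The likelihood-ratio statistic can be sketched as follows. The contingency table is hypothetical (the study's cell counts appear in Table 4, which is not reproduced here), and the helper names are invented for this example. The p-value helper relies on the fact that a chi-square distribution with exactly 2 degrees of freedom has upper-tail probability exp(-x/2), which applies to a 3 × 2 table.

```python
import math


def likelihood_ratio_g(table):
    """Likelihood-ratio (G) statistic for a two-way contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to G
                expected = row_totals[i] * col_totals[j] / n
                g += 2 * observed * math.log(observed / expected)
    return g


def chi2_upper_tail_df2(x):
    """P(X > x) for a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-x / 2)


# Hypothetical counts: rows are trust groups, columns are Human, Robot/Machine.
table = [[2, 11], [5, 4], [3, 2]]
g = likelihood_ratio_g(table)
print(f"G = {g:.2f}, p = {chi2_upper_tail_df2(g):.3f}")

# The reported statistic of 7.37 with 2 degrees of freedom gives p of about 0.025.
reported_p = chi2_upper_tail_df2(7.37)
print(f"p for 7.37 on 2 df = {reported_p:.3f}")
```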

Table 4 Descriptive statistics for the drawing analysis

Other than perceiving ChatGPT as a human or robot, several other misconceptions found in people’s understanding of other types of AI systems also appeared in the participants’ drawings. First, some participants directly or indirectly indicated in their drawings that they believed ChatGPT to be a know-it-all supercomputer or a magic entity. This misconception appeared mostly among participants in the Trust group who perceived ChatGPT as a robot or a machine. Figure 4a shows a drawing by a participant who directly indicated that they believed ChatGPT was a know-it-all machine, and Fig. 4b is a drawing of a magic eight ball, which usually represents a fortune-telling sphere. Although the drawing in Fig. 4c shows a robot, the wizard hat that the robot is wearing implies a magic power. Another misconception that surfaced in our data is “AI has a brain.” For example, a participant submitted a picture of a laptop containing a human brain (Fig. 4d), which could indicate a belief that ChatGPT is a human intelligence possessed by machines. Finally, some participants drew pictures of a search engine or a Siri device (Fig. 4e), which can imply a misconception that “all AI systems are made the same way.”

Fig. 4 Students’ drawings/pictures of their perceived ChatGPT


This study explores students’ perceptions toward using ChatGPT as a virtual tutor in a physics class. Our findings carry similarities and differences relative to previous research on other AI-based systems. Consistent with previous research on the accuracy issues of other AI systems (Mhlanga, 2023), in our study ChatGPT achieves only 85% accuracy in solving physics questions, particularly for light and radioactivity-related topics. Although students are specifically informed that AI can make mistakes and the exam keys are even provided, almost half of the students still agree with all the answers provided by ChatGPT regardless of whether the answers are correct or incorrect. Those students are also found to perceive ChatGPT as easy to use and to be more likely to use it in the future. Previous research has found that people’s perceived ease of use of an AI-based technology enhances their trust in the technology (Qin et al., 2020); our findings suggest that the reverse relationship also exists. Nevertheless, caution is warranted when it comes to employing GenAI as a virtual tutor. Any incorrect information offered by GenAI can lead to misconceptions in future learning. Misconceptions are notoriously challenging to correct in learning; once a misconception has been formed, it is very hard to change (Smith et al., 1993). Given the language power of GenAI (particularly ChatGPT) and the lack of transparency regarding how GenAI works, people without AI literacy training are more likely to blindly trust incorrect information provided by AI systems (Lockey et al., 2021). When AI systems are used to teach, blind trust in those systems layered on top of science misconceptions would make the misconceptions even more robust and more challenging to address.

In addition, our findings reveal that students hold several misconceptions about ChatGPT that are consistent with those observed toward other AI systems. The first prevalent misconception is the anthropomorphic conception. Many students draw ChatGPT in the form of a human or a machine that has a brain. The ability of ChatGPT to engage in human-like conversations may mislead students into perceiving ChatGPT as human/human-like or possessing human-like cognitive abilities. This misconception has persisted throughout the history of computer advancement and is not only limited to AI (Mertala et al., 2022; Rücker & Pinkwart, 2016).

The second most prominent misconception shown in our data is that students confuse ChatGPT with robots. This misconception could link back to the anthropomorphizing of ChatGPT and AI systems in general as both embody an abstract concept of GenAI as a concrete representation of a human or human-like object. Mass media may also have played a critical role in forming the misconception about ChatGPT as a robot given the portrayal of general AI characters in Western movies such as the Star Wars series (e.g., C-3PO, R2-D2, and BB-8), and Marvel Studios productions (e.g., J.A.R.V.I.S., F.R.I.D.A.Y., and Ultron).

Another misconception that appears in our study is that students seem to believe ChatGPT has super-intelligence and can perform tasks in a “magic” way. This misconception mostly emerges from the Trust group in which students blindly trust ChatGPT’s answers without questioning their accuracy. Previous research conducted with K-12 teachers found that their participants believed that AI-enhanced educational technologies should be perfect and make no errors (Nazaretsky et al., 2021). This result indirectly aligns with our finding that if people believe AI should be perfect then they would trust the AI’s output (i.e., ChatGPT’s answers) uncritically.

Contrary to the findings reported in previous studies, we find that the students who perceive ChatGPT as being human or human-like are primarily from the Partial Trust and Distrust groups, whereas the students who perceive ChatGPT as being a machine or robot are mostly from the Trust group. Previous research has shown that individuals were more likely to trust AI-enhanced technologies when they perceived them as possessing human-like characteristics and exhibiting positive attributes such as warmth and friendliness (Cheng et al., 2022; Glikson & Woolley, 2020). One possible reason for the difference between our findings and those of previous studies is that those studies were conducted in industry environments and mainly focused on using AI-based chatbots for performing customer service-related functions. In those studies, the ultimate objective for designing the chatbot was to convince customers to purchase products, and therefore the ability of the AI chatbots to establish and maintain effective emotional connections is crucial in building trust (Gonzalez-Jiminez, 2018). Additionally, Sundar and Kim (2019) found that people are more likely to trust machines when they have the machine heuristic mindset that “machines are more secure and trustworthy than Humans” (p. 538). In our study, students’ naive belief that ChatGPT is infallible, together with their lack of critical judgment, may have contributed to their trust in the answers generated by ChatGPT. Furthermore, ChatGPT was used as a virtual tutor; therefore, its perceived authority may also have influenced their trust in it. Internet users typically employ the authority heuristic by adopting the viewpoint of a recognized expert in the given field. The authority heuristic can be activated by a communication cue such as showing a certification or references (Liao & Sundar, 2022).


Our findings suggest both pedagogical and design implications. With the development and increased popularity of AI, our findings support the call for teaching AI literacy to students to maximize the potential AI could bring to education, especially GenAI.

First, we found that many students blindly trust ChatGPT’s answers. Therefore, we suggest that AI literacy education should first and foremost improve students’ critical thinking skills in assessing the information they encounter. This applies not only to text information they receive from GenAI such as ChatGPT and Google Bard, but also to images, audio, and videos (e.g., deep fakes). Students need to be able to critically analyze the information they receive from GenAI and know how to find reliable resources to verify its validity. It is also important for them to be able to recognize artifacts created by GenAI (Michaeli et al., 2023). Second, an intriguing finding from our study is that among the 206 additional questions asked by the students, ChatGPT changed its original answers 41 times. One reason for the variation in ChatGPT’s answers could be how the students phrased their questions. ChatGPT has the unique ability not only to provide a response but also to generate related content based on subsequent questions and prompts (Sun et al., 2022). Therefore, prompt engineering is needed in AI literacy education to help students understand how to ask questions that yield the desired information. This skill is particularly important when tackling open-ended science problems that do not have definitive answers.

Third, anthropomorphism and, by extension, equating AI with robots appear to be the predominant misconceptions that persist across different types of AI, including GenAI in our study. When designing materials to teach AI literacy, pedagogical strategies previously used to address misconceptions in science education, such as conceptual change (Daniel & Carrascosa, 1990), can be adapted to address this misconception about AI. Fourth, some students in our study manifested the mindset that ChatGPT has super-intelligence and is infallible. Therefore, in AI literacy education, students need to develop warranted judgment by understanding how AI works. They need to be aware that AI, including GenAI, relies largely on the data used to train its algorithms, and that AI can be error-prone and make critical mistakes (Buolamwini & Gebru, 2018). In relation to this finding (i.e., super-intelligence), it is also imperative to help students differentiate between the concepts of general AI and narrow AI. The super-intelligence mentality implies the concept of general AI. However, the majority of existing AI systems are tailored to specific domains and have clear boundaries within which they operate effectively, such as utilizing natural language processing (NLP) for answering questions or employing computer vision (CV) to identify individuals by their facial features (Kim et al., 2023). Last but not least, some of the students in our study equated ChatGPT with a search engine or Siri, suggesting that AI literacy education needs to emphasize that not all AI works the same way.

Our results also have implications for designing GenAI for educational purposes. The most used large language model (LLM) is GPT-3 (Kasneci et al., 2023), which is trained on online data rather than scientifically validated data. Consequently, GPT-3 is error-prone and is not entirely reliable when it comes to addressing concepts in the sciences. In our study, ChatGPT reached an accuracy of only 85% and sometimes changed its correct answers to incorrect answers when additional questions were asked.

Furthermore, due to a lack of AI knowledge, many students relied heavily on the answers provided by the model. This suggests that caution should be exercised when using it directly for teaching sciences. Therefore, in the future design and deployment of GenAI for teaching and learning, techniques such as explainable AI or interpretable AI can be considered to “white box” how models work and allow students to make warranted judgments. Our findings also suggest that when designing GenAI to facilitate teaching and learning, especially as a virtual tutor, designers may want to avoid making it too anthropomorphic. The anthropomorphic characteristics of ChatGPT may have contributed to the students’ misconception of perceiving it as a human or human-like entity in our study. Once this misconception is formed, it will be challenging to correct. Given that a majority of students from the Trust group perceived ChatGPT as a know-it-all or a magic machine or robot, designers of GenAI for teaching and learning should also avoid persuasive communication cues that could trigger authority heuristics in students even when the generated information is incorrect.


This study has several limitations that could be addressed in future studies. First, the majority of the students enrolled in the study were female, and the study was conducted within one physics class. It has been found that gender can play a critical role in individuals’ perceptions of AI. Females are more likely than males to have less knowledge about AI and to believe in anthropomorphism (Ding et al., 2023). This may have limited the generalizability of the results of this study. Future studies can benefit from investigating a more balanced sample to verify the results reported here. Second, more participants with greater experience with AI systems could have been recruited to test whether trust in ChatGPT’s answers differs between a more experienced group and a less experienced group. Third, we could not interview students to ask additional questions about their drawings of ChatGPT. Students might have intended to convey specific details or nuances in their drawings that are not immediately apparent through coding alone. By relying solely on coding of students’ drawings, some information may have been lost, or a drawing could have been wrongly interpreted by the researchers. Follow-up interviews could help confirm the accuracy of our interpretation of students’ drawings and verify the findings of this study. For instance, we interpreted the wizard hat on the robot as indicating a misconception that AI has magic power. Interviews could clarify whether the hat actually suggests magic power or is simply a cosmetic element. Finally, students in this study worked individually. Considering the potential value of group discussions, conducting focus groups in the future could offer a broader perspective. Exploring the impact of potential group dynamics might shed light on whether a collaborative setting could make a significant difference in the results, enriching the overall understanding of students’ perceptions of and experiences with ChatGPT.


Large language models (LLMs) offer many opportunities for assisting teaching and learning, and there is considerable potential for researchers to develop and enhance the models to fulfill future educational needs. In this study, we tested ChatGPT’s performance in answering physics questions and students’ perceptions of ChatGPT. ChatGPT was used in an undergraduate-level introductory physics class as a virtual tutor to address exam questions that students had answered incorrectly. ChatGPT achieved 85% accuracy; however, it would occasionally change its answers from correct to incorrect, and vice versa, when additional questions were asked. Students held several misconceptions about ChatGPT that were similar to those found in studies conducted with other forms of AI (e.g., anthropomorphism, AI thinks the same as humans, AI has super-intelligence). Almost half of the students trusted ChatGPT’s answers regardless of their accuracy, and the majority of them believed ChatGPT was a know-it-all machine or robot. Those students also found ChatGPT easy to use and were more likely to use it in the future than the Partial Trust and Distrust groups.

Availability of data and materials

The anonymous datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.


  • Aaker, J. L., Garbinsky, E. N., & Vohs, K. D. (2012). Cultivating admiration in brands: Warmth, competence, and landing in the “golden quadrant.” Journal of Consumer Psychology, 22(2), 191–194.

  • Adiguzel, T., Kaya, M. H., & Cansu, F. K. (2023). Revolutionizing education with AI: Exploring the transformative potential of ChatGPT. Contemporary Educational Technology, 15(3), ep429.

  • Airenti, G. (2015). The cognitive bases of anthropomorphism: From relatedness to empathy. International Journal of Social Robotics, 7(1), 117–127.

  • Alshater, M. (2022). Exploring the role of artificial intelligence in enhancing academic performance: A case study of ChatGPT. SSRN Electronic Journal.

  • Baidoo-Anu, D., & Owusu Ansah, L. (2023). Education in the era of generative artificial intelligence (AI): Understanding the potential benefits of ChatGPT in promoting teaching and learning. SSRN.

  • Belanche, D., Casaló, L. V., Schepers, J., & Flavián, C. (2021). Examining the effects of robots’ physical appearance, warmth, and competence in frontline services: The Humanness-Value-Loyalty model. Psychology and Marketing, 38(12), 2357–2376.

  • Bewersdorff, A., Zhai, X., Roberts, J., & Nerdel, C. (2023). Myths, mis-and preconceptions of artificial intelligence: A review of the literature. Computers and Education Artificial Intelligence, 100143.

  • Bingham, A. J., & Witkowsky, P. (2022). Deductive and inductive approaches to qualitative data analysis. In C. Vanover, P. Mihas, & J. Saldaña (Eds.), Analyzing and interpreting qualitative data: After the interview (pp. 133–146). SAGE Publications.

  • Bisdas, S., Topriceanu, C. C., Zakrzewska, Z., Irimia, A. V., Shakallis, L., Subhash, J., … Ebrahim, E. H. (2021). Artificial intelligence in medicine: A multinational multi-center survey on the medical and dental students’ perception. Frontiers in Public Health, 9, 795284.

  • Bitzenbauer, P. (2023). ChatGPT in physics education: A pilot study on easy-to-implement activities. Contemporary Educational Technology, 15(3), ep430.

  • Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.

  • Buabbas, A. J., Miskin, B., Alnaqi, A. A., Ayed, A. K., Shehab, A. A., Syed-Abdul, S., & Uddin, M. (2023). Investigating Students’ Perceptions towards Artificial Intelligence in Medical Education. Healthcare, 11, 1298.

  • Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. In S. A. Friedler & C. Wilson (Eds.), Conference on fairness, accountability and transparency (pp. 77–91). PMLR.

  • Chan, C. K. Y., & Hu, W. (2023). Students' Voices on Generative AI: Perceptions, Benefits, and Challenges in Higher Education. arXiv preprint arXiv:2305.00290

  • Chatterjee, J., & Dethlefs, N. (2023). This new conversational AI model can be your friend, philosopher, and guide... and even your worst enemy. Patterns, 4(1).

  • Cheng, X., Zhang, X., Cohen, J., & Mou, J. (2022). Human vs. AI: Understanding the impact of anthropomorphism on consumer response to chatbots from the perspective of trust and relationship norms. Information Processing and Management.

  • Daniel, G.-P., & Carrascosa, J. (1990). What to do about science “misconceptions.” Science Education, 74(5), 531–540.

  • Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 13(3), 319–340.

  • Ding, L., Li, T., & Turkson, A. (2023). (Mis)conceptions and perceptions of artificial intelligence: A scoping review. Manuscript Submitted for Publication.

  • Epley, N., Waytz, A., & Cacioppo, J. T. (2007). On seeing human: A three-factor theory of anthropomorphism. Psychological Review, 114(4), 864–886.

  • Falode, O. (2018). Pre-service teachers’ perceived ease of use, perceived usefulness, attitude, and intentions towards virtual laboratory package utilization in teaching and learning of physics. Malaysian Online Journal of Educational Technology, 6(3), 63–72.

  • Ferrara, E. (2023). Should ChatGPT be biased? challenges and risks of bias in large language models. arXiv preprint arXiv:2304.03738.

  • Field, A. (2009). Discovering statistics using SPSS. Sage publications.

  • Finson, K. D. (2002). Drawing a scientist: What we do and do not know after fifty years of drawings. School Science and Mathematics, 102(7), 335–345.

  • Gillissen, A., Kochanek, T., Zupanic, M., & Ehlers, J. (2022). Medical students’ perceptions towards digitalization and artificial intelligence: A mixed-methods study. Healthcare, 10(4), 723.

  • Glikson, E., & Woolley, A. W. (2020). Human trust in artificial intelligence: Review of empirical research. Academy of Management Annals, 14(2), 627–660.

  • Gong, B., Nugent, J. P., Guest, W., Parker, W., Chang, P. J., Khosa, F., & Nicolaou, S. (2019). Influence of artificial intelligence on Canadian medical students’ preference for radiology specialty: A national survey study. Academic Radiology, 26(4), 566–577.

  • Gonzalez-Jiminez, H. (2018). Taking the fiction out of science fiction: (Self-aware) robots and what they mean for society, retailers and marketers. Futures, 98, 49–56.

  • Graesser, A. C. (2016). Conversations with AutoTutor help students learn. International Journal of Artificial Intelligence in Education, 26, 124–132.

  • Hair, J. F. (2009). Multivariate data analysis (7th ed.). Prentice Hall.

  • Hancer, E., & Karaboga, D. (2017). A comprehensive survey of traditional, merge-split and evolutionary approaches proposed for determination of cluster number. Swarm and Evolutionary Computation, 32, 49–67.

  • Hu, K. (2023). ChatGPT sets record for fastest-growing user base. Reuters.

  • Kaplan, A. M., & Haenlein, M. (2019). Siri, siri, in my hand: Who’s the fairest in the land? On the interpretations, illustrations and implications of artificial intelligence. Business Horizons, 62(1), 15–25.

  • Karabenick, S. A. (2003). Seeking help in large college classes: A person-centered approach. Contemporary Educational Psychology, 28(1), 37–58.

  • Kasneci, E., Sessler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G., Günnemann, S., Hüllermeier, E., Krusche, S., Kutyniok, G., Michaeli, T., Nerdel, C., Pfeffer, J., Poquet, O., Sailer, M., Schmidt, A., Seidel, T., … Kasneci, G. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103, 102274.

  • Kim, K., Kwon, K., Ottenbreit-Leftwich, A., Bae, H., & Glazewski, K. (2023). Exploring middle school students’ common naive conceptions of Artificial Intelligence concepts, and the evolution of these ideas. Education and Information Technologies.

  • Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.

  • Liao, Q. V., & Sundar, S. S. (2022). Designing for responsible trust in AI systems: A communication perspective. ACM International Conference Proceeding Series.

  • Lim, W. M., Gunasekara, A., Pallant, J. L., Pallant, J. I., & Pechenkina, E. (2023). Generative AI and the future of education: Ragnarök or reformation? A paradoxical perspective from management educators. The International Journal of Management Education, 21(2), 100790.

  • Lo, C. K. (2023). What is the impact of ChatGPT on education? A rapid review of the literature. Education Sciences.

  • Lockey, S., Gillespie, N., Holm, D., & Someh, I. A. (2021). A review of trust in artificial intelligence: Challenges, vulnerabilities and future directions. Proceedings of the 54th Hawaii International Conference on System Sciences, 5463–5472.

  • Matsuda, N., Yarzebinski, E., Keiser, V., Raizada, R., Cohen, W. W., Stylianides, G. J., & Koedinger, K. R. (2013). Cognitive anatomy of tutor learning: Lessons learned with SimStudent. Journal of Educational Psychology, 105(4), 1152.

  • McNamara, D. S., Crossley, S. A., & Roscoe, R. (2013). Natural language processing in an intelligent writing strategy tutoring system. Behavior Research Methods, 45, 499–515.

  • Mertala, P., Fagerlund, J., & Calderon, O. (2022). Finnish 5th and 6th grade students’ pre-instructional conceptions of artificial intelligence (AI) and their implications for AI literacy education. Computers and Education Artificial Intelligence.

  • Mhlanga, D. (2023). Open AI in education, the responsible and ethical use of ChatGPT towards lifelong learning. Social Science Research Network.

  • Michaeli, T., Romeike, R., & Seegerer, S. (2023). What students can learn about artificial intelligence-recommendations for K-12 computing education. IFIP WCCE 2022: World Conference on Computers in Education.

  • Nazaretsky, T., Cukurova, M., Ariely, M., & Alexandron, G. (2021). Confirmation bias and trust: Human factors that influence teachers’ attitudes towards AI-based educational technology.

  • O’Connor, S., ChatGPT. (2023). Open artificial intelligence platforms in nursing education: Tools for academic progress or abuse? Nurse Education in Practice, 66, 103537.

  • Pavlik, J. V. (2023). Collaborating with ChatGPT: Considering the implications of generative artificial intelligence for journalism and media education. Journalism & Mass Communication Educator, 78(1), 84–93.

  • Pizzi, G., Vannucci, V., Mazzoli, V., & Donvito, R. (2023). I, chatbot! The impact of anthropomorphism and gaze direction on willingness to disclose personal information and behavioral intentions. Psychology & Marketing, 40(7), 1372–1387.

  • Qadir, J. (2023). Engineering Education in the Era of ChatGPT: Promise and Pitfalls of Generative AI for Education. IEEE Global Engineering Education Conference (EDUCON), 2023, 1–9.

  • Qin, F., Li, K., & Yan, J. (2020). Understanding user trust in artificial intelligence-based educational systems: Evidence from China. British Journal of Educational Technology, 51(5), 1693–1710.

  • Removed for blinded review.

  • Rücker, M. T., & Pinkwart, N. (2016). Review and discussion of children’s conceptions of computers. Journal of Science Education and Technology, 25(2), 274–283.

  • Sahoo, S., Kumar, S., Abedin, M. Z., Lim, W. M., & Jakhar, S. K. (2023). Deep learning applications in manufacturing operations: A review of trends and ways forward. Journal of Enterprise Information Management, 36(1), 221–251.

  • Sallam, M. (2023). ChatGPT utility in health care education, research, and practice: Systematic review on the promising perspectives and valid concerns. Healthcare, 11(6), 887.

  • Schmidt-Fajlik, R. (2023). ChatGPT as a Grammar Checker for Japanese English Language Learners: A Comparison with Grammarly and ProWritingAid. AsiaCALL Online Journal, 14(1), 105–119.

  • Smith, J. P., diSessa, A. A., & Roschelle, J. (1993). Misconceptions reconceived: A constructivist analysis of knowledge in transition. The Journal of the Learning Sciences, 3(2), 115–163.

  • Su, Y., Lin, Y., & Lai, C. (2023). Collaborating with ChatGPT in argumentative writing classrooms. Assessing Writing, 57, 100752.

  • Sun, J., Liao, Q. V., Muller, M., Agarwal, M., Houde, S., Talamadupula, K., & Weisz, J. D. (2022). Investigating Explainability of Generative AI for Code through Scenario-based Design. International Conference on Intelligent User Interfaces, Proceedings IUI.

  • Sundar, S. S., & Kim, J. (2019). Machine heuristic: When we trust computers more than humans with our personal information. Conference on Human Factors in Computing Systems Proceedings.



We thank all the students who took their valuable time to participate in this study and Removed for Blinded Review for their edits to the manuscript. In addition, we would like to express our sincere appreciation to all the reviewers for their comments and feedback to improve this paper.


All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Author information

Authors and Affiliations



LD: Conceptualization, Methodology, Formal analysis, Project administration, Writing – Original Draft, Writing – Reviewing & Editing; TL: Writing – Original Draft, Writing – Reviewing & Editing; SJ: Writing – Reviewing & Editing; AG: Investigation.

Corresponding author

Correspondence to Lu Ding.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

figure a

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Ding, L., Li, T., Jiang, S. et al. Students’ perceptions of using ChatGPT in a physics class as a virtual tutor. Int J Educ Technol High Educ 20, 63 (2023).
