VR-based health and safety training in various high-risk engineering industries: a literature review

This article provides a critical review of current studies on VR-based health and safety training, assessment techniques, and training evaluation, and of the potential of VR to improve training evaluation outcomes in various high-risk engineering industries. The results of this analysis show the breadth of VR-based applications for training users on a combination of topics, including risk assessment and machinery and/or process operation, across various industries. The data showed that the use of fully immersive VR increased significantly owing to improvements in hardware, display resolution, and affordability. Most of the articles used external assessment to measure changes in trainees' satisfaction and declarative knowledge, as this is easier to implement, while some articles have started to implement internal assessment, which provides automated assessment capable of measuring complex skills. The results also suggest that VR-based training has the potential to improve training evaluation outcomes compared with traditional training methods. The findings of this study help practitioners and safety managers by providing a training design framework that may be adopted to optimise the conditions of VR-based training.

widely integrated into H&S training for safety-critical industries. This technology can create safe yet complex learning and training environments, as well as promote knowledge acquisition through active involvement (Gao et al., 2017; Isleyen & Duzgun, 2019). Considering the promising learning improvements that VR can provide, it is understandable that the number of publications focusing on the application of VR to specific areas such as education and training is increasing (Checa & Bustillo, 2019; Jensen & Konradsen, 2018). However, little research has analysed the different assessment methods and instruments used to evaluate the effectiveness of VR training, especially in engineering.
To ensure that new VR training is effective and aligned with the needs of the organisation, it is important for stakeholders, especially trainers and administrators, to understand why and how learning occurs when VR tools are used, so that they can choose appropriate assessment methods and use them as indicators of the effectiveness of the VR training. However, a recent review of the effectiveness of both conventional and computer-aided technologies for health and safety training in the construction sector by Gao et al. (2019) stated that the empirical evidence supporting the effectiveness of computer-aided technologies (CAT) is still limited. This claim was based on the finding that, of the 34 CAT articles considered, only one study evaluated the effectiveness of knowledge acquisition during training.
Professional training is considered effective when the required attributes, such as problem-solving and analytical skills, are transferred and applied successfully to the trainees' daily jobs. Since stakeholders are responsible for choosing and implementing effective health and safety training, it is beneficial for them to understand the various assessment methods used to evaluate training effectiveness. The purpose of this study is to conduct a systematic examination of the literature to review the assessment methods used to evaluate the outcomes of different VR-based health and safety programmes in various high-risk engineering industries. The following research questions guided this review:
1. What topics have researchers investigated for VR-based health and safety training in various high-risk engineering industries?
2. What types of VR were used to deliver health and safety training in various high-risk engineering industries?
3. What were the outcome(s) measured for establishing the effectiveness of VR-based health and safety training in various high-risk engineering industries?
4. What assessment techniques were used to evaluate the outcome(s) of VR-based health and safety training in various high-risk engineering industries?
5. Does VR-based health and safety training in various high-risk engineering industries have the potential to improve the training evaluation outcome(s) compared with traditional and/or other VR-based training methods?

Selection criteria
A detailed review of research studies published within an 11-year time frame (Jan. 2011 to Nov. 2021) was undertaken following the procedure proposed by Kitchenham (2004). The date criterion was chosen to provide an up-to-date picture of recent VR developments in health and safety programmes for various high-risk engineering industries. A literature search was conducted on Scopus (www.scopus.com), as this is the largest abstract and citation database of peer-reviewed research literature (Jin et al., 2019). The following keywords were used for the literature search: "Immersive virtual environment" or "virtual reality" or "virtual environment" or "VR"; and "assessment" or "performance assessment" or "evaluation"; and "health and safety training" or "safety training" or "industrial safety" or "plant operators" or "high risk industry". The literature search and publication selection process are shown in Fig. 1. The literature search yielded 1381 records. These records were then screened based on language, year of publication, document type, and whether they addressed health and safety training in high-risk industries. After the title and abstract of each publication were examined, 116 articles were identified as eligible. These publications were then subjected to full-text screening, after which forty-five (45) articles were selected for detailed analysis. The screening criteria included whether the article applied a specific assessment method rather than merely describing the framework of a project.
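The keyword list above combines three groups of terms, and the prose leaves the grouping implicit. The sketch below makes one plausible reading explicit: OR within each group, AND across groups. It is an illustration only; the `TITLE-ABS-KEY` field and the `PUBYEAR` bounds are assumptions about how such a search could be written in Scopus advanced search, not the query the review actually ran, and the January to November 2021 cut-off cannot be expressed with year bounds alone.

```python
# Keyword groups as listed in the Selection criteria. The AND/OR grouping is
# one reading of the prose, not a confirmed copy of the original query.
VR_TERMS = ["Immersive virtual environment", "virtual reality",
            "virtual environment", "VR"]
ASSESSMENT_TERMS = ["assessment", "performance assessment", "evaluation"]
SAFETY_TERMS = ["health and safety training", "safety training",
                "industrial safety", "plant operators", "high risk industry"]


def or_group(terms):
    """Join terms into a parenthesised OR group of quoted phrases."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(groups, first_year=2011, last_year=2021):
    """AND the OR groups inside TITLE-ABS-KEY and restrict publication years.

    The TITLE-ABS-KEY field and the PUBYEAR bounds are assumptions; the
    review does not state which Scopus fields were searched.
    """
    body = " AND ".join(or_group(g) for g in groups)
    return (f"TITLE-ABS-KEY({body}) "
            f"AND PUBYEAR > {first_year - 1} AND PUBYEAR < {last_year + 1}")


query = build_query([VR_TERMS, ASSESSMENT_TERMS, SAFETY_TERMS])
print(query)
```

Keeping the groups as plain lists makes the search easy to document and rerun if the keywords change.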

Data analysis
In line with the prompts introduced by the research questions, the information listed as column headings in Additional file 1: Appendix A was extracted from each article.

What topics have researchers investigated for VR-based health and safety training in various high-risk engineering industries?
Of the 45 papers analysed, VR-based technologies were used for H&S training in the following industrial sectors involving high-risk engineering activities: construction (n = 25), manufacturing and assembly (n = 8), chemical process/laboratory (n = 7), mining (n = 2), electric power and electronics (n = 2), and agriculture (n = 1). This range of industries reflects the potential of VR technologies to create digital analogues of real-life scenarios that can be used for training, including both normal and abnormal operating conditions, in which stress drivers can be incorporated while keeping the training setting safe (Bissonnette et al., 2019; Dholakiya et al., 2019). The use of VR-based technologies was most prevalent in construction compared with other industries (Fig. 2). One possible reason why the majority of researchers applied VR-based technologies to the construction industry is the high accident and fatality rates in this sector (Pedro et al., 2020). Another possible reason is that construction-related activities may be easier to simulate and develop in VR than activities in other high-risk engineering industries.
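As a quick consistency check, the sector counts reported above sum to the 45 reviewed articles, and converting them to shares makes the dominance of construction explicit. A minimal sketch using only the counts stated in the text:

```python
# Number of reviewed articles per industrial sector, as reported in the text.
SECTOR_COUNTS = {
    "construction": 25,
    "manufacturing and assembly": 8,
    "chemical process/laboratory": 7,
    "mining": 2,
    "electric power and electronics": 2,
    "agriculture": 1,
}

total = sum(SECTOR_COUNTS.values())
assert total == 45, "counts should cover all 45 reviewed articles"

# Share of the reviewed corpus per sector, largest first.
for sector, n in sorted(SECTOR_COUNTS.items(), key=lambda kv: -kv[1]):
    print(f"{sector:32s} {n:3d}  {100 * n / total:5.1f}%")
```

Construction accounts for 25 of 45 articles (about 55.6%), more than the other five sectors combined.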
Toyoda et al. Int J Educ Technol High Educ (2022) 19:42
In terms of the specific H&S topics taught in VR-based training, 18 of the 45 articles reviewed (Fig. 2) used VR technologies to upskill trainees in assessing risk(s) in various high-risk engineering industries. Risk assessment is the systematic process of assessing the nature and likelihood of undesirable effects that may occur following exposure to hazards (e.g., biological, chemical, or physical) (Brecher, 1997). Most manuals and instructions for different types of safety training (e.g., construction, chemical, mining) list hazard identification, risk analysis and evaluation, risk control, and risk assessment documentation and review as the important steps (Health & Safety Executive, 2014). The first step, hazard identification, requires learners to investigate and determine how and when a hazardous situation can lead to an accident. The second step, risk analysis and evaluation, requires learners to understand the nature of the identified hazard(s) and determine the impact of the corresponding risk(s). The third step, risk control, requires learners to implement appropriate action(s) for the identified risk(s), and the last step, risk assessment documentation and review, requires learners to keep a formal record of the risk assessment. Despite the consistent adoption of these steps, the safety performance of trainees will remain low if the training programme is unengaging and passive. As the literature indicates, numerous hazards remain unrecognised and poorly managed in several high-risk industries, such as construction-related workplaces, owing to sub-standard practices and delivery methods in training programmes (Jeelani et al., 2020).
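The four steps can be pictured as a simple record that a training scenario might fill in. The sketch below is hypothetical: the 1 to 5 likelihood and severity scales, the likelihood × severity score, the control threshold of 10, and the example hazards are common conventions or illustrative values assumed here, not taken from the reviewed studies or from the HSE guidance.

```python
from dataclasses import dataclass, field


@dataclass
class Hazard:
    """A hazard found in step 1 (identification), with assumed 1-5 ratings."""
    description: str
    likelihood: int  # 1 (rare) to 5 (almost certain): assumed scale
    severity: int    # 1 (minor) to 5 (fatal): assumed scale
    controls: list = field(default_factory=list)

    @property
    def risk_score(self) -> int:
        # Step 2 (analysis and evaluation): assumed likelihood x severity score.
        return self.likelihood * self.severity


def assess(hazards, control_threshold=10):
    """Run steps 2-4 for hazards already identified in step 1."""
    record = []
    for h in hazards:
        record.append({
            "hazard": h.description,
            "score": h.risk_score,
            # Step 3 (risk control): flag hazards whose score calls for action.
            "needs_control": h.risk_score >= control_threshold,
            "controls": list(h.controls),
        })
    # Step 4 (documentation and review): return the record, highest risk first.
    return sorted(record, key=lambda r: r["score"], reverse=True)


# Hypothetical hazards for a lathe workstation, for illustration only.
hazards = [
    Hazard("Trailing power cable", likelihood=2, severity=2),
    Hazard("Unguarded lathe chuck", likelihood=3, severity=4,
           controls=["Fit interlocked guard"]),
]
for row in assess(hazards):
    print(row)
```

Sorting the record by score mirrors the review step: the highest risks are revisited first.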
To bridge these gaps, several researchers adapted VR-based methods for risk assessment training, covering steps such as those listed above, in various high-risk engineering industries: construction (Ahn et al., 2020; Albert et al., 2014; Han et al., 2021; Joshi et al., 2021; Kazar & Comu, 2021; Lin et al., 2011, 2018; Pedro et al., 2020; Perlman et al., 2014; Pham et al., 2019; Xu & Zheng, 2021), chemical process/laboratory (Kwegyir-Afful et al., 2021; Nazir et al., 2015; Nicoletti & Padovano, 2019; Stransky et al., 2021; Tawadrous et al., 2017), manufacturing and assembly (Diego-Mas et al., 2020), and mining (Isleyen & Duzgun, 2019). These technologies can recreate realistic but safe 3D environments of hazardous workplace scenarios, in which trainees can improve their risk assessment skills through a learning-by-doing approach (Fig. 3). For instance, Han et al. (2021) used a VR wearable device (HTC Vive) to train users to locate, analyse, and mitigate hazards such as structural collapse, injuries from heavy equipment, and injuries from manual handling or lifting at construction sites. Similarly, Kwegyir-Afful et al. (2021) used an HTC Vive to train users to recognise, evaluate, and control fire hazards at a gas power plant.
Another 18 of the 45 papers (Fig. 2) used VR-based technologies to train learners to control and manipulate machinery and process operations. Working with different equipment and the corresponding processes in an industrial plant is a practical skill that usually requires sustained on-site experience to develop, so it is important for new employees to have hands-on practice with the equipment and/or processes involved so that they can fully appreciate and reduce the corresponding risks (Serpa et al., 2020).
However, it is often impractical to carry out machinery and process operation safety training on actual plant equipment, as this interrupts on-site operations. As an alternative, employees are usually given a set of guidelines in the form of two-dimensional (2D) pictures and text, covering topics from terminology to the operation and maintenance of equipment and/or processes.
Unfortunately, safety training that delivers information in this way usually offers a low level of engagement, presence, and realism, since it is difficult for new employees to fully visualise and understand information provided as 2D pictures and text (Numfu et al., 2020). To bridge these gaps, several researchers developed VR-based machinery and/or process operation safety training in construction (Beh et al., 2021; Choi et al., 2020; Guo et al., 2012; Li et al., 2012; Osti et al., 2021; Shi et al., 2020; Song et al., 2021; Vahdatikhaki et al., 2019; Wang et al., 2020), manufacturing and assembly (Dado et al., 2018; Gallegos-Nieto et al., 2017; Grandi et al., 2021; Hernández-Chávez et al., 2021; Numfu et al., 2020; Serpa et al., 2020), electric power and electronics (Ayala García et al., 2016; Ogbuanya & Onele, 2018), and agriculture (Ojados Gonzalez et al., 2017), as shown in Fig. 4. For instance, Dado et al. (2018) used a VR wearable device (HTC Vive) to allow trainees to use and become familiar with the operation of an industrial lathe. Similarly, Song et al. (2021) used an HTC Vive to train users to operate different types of crane (e.g., overhead, tower, and container cranes) by providing a virtual experience of operating them. As these studies show, adopting VR-based technologies allows trainees to safely study and practise the operating procedures of a given machine or piece of equipment in an environment that closely resembles the one they will encounter on-site.
Lastly, the remaining nine articles (Fig. 2) used VR technologies to train users in both risk assessment and machinery/process operation in construction (Adami et al., 2021; Dhalmahapatra et al., 2021; Le et al., 2015; Nykänen et al., 2020; Sacks et al., 2013), chemical process/laboratory (Makransky et al., 2019; Poyade et al., 2021), manufacturing and assembly (Leder et al., 2019), and mining (Liang et al., 2019), as shown in Fig. 5. For instance, Dhalmahapatra et al. (2021) used an Oculus Rift to help learners grasp the sequence of overhead crane operations as well as the process of managing possible hazards while working.
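The citation lists in the three paragraphs above can be cross-tabulated to check that they account for all 45 reviewed articles. The counts below are taken directly from those lists (Lin et al., 2011 and 2018 are counted as two articles); summing rows and columns should reproduce the 18/18 split for the two single-topic groups and the sector totals reported earlier.

```python
from collections import Counter

# Articles per (training topic, industry), counted from the citation lists.
TOPIC_INDUSTRY = {
    ("risk assessment", "construction"): 11,
    ("risk assessment", "chemical process/laboratory"): 5,
    ("risk assessment", "manufacturing and assembly"): 1,
    ("risk assessment", "mining"): 1,
    ("machinery/process operation", "construction"): 9,
    ("machinery/process operation", "manufacturing and assembly"): 6,
    ("machinery/process operation", "electric power and electronics"): 2,
    ("machinery/process operation", "agriculture"): 1,
    ("both", "construction"): 5,
    ("both", "chemical process/laboratory"): 2,
    ("both", "manufacturing and assembly"): 1,
    ("both", "mining"): 1,
}

by_topic = Counter()     # row totals
by_industry = Counter()  # column totals
for (topic, industry), n in TOPIC_INDUSTRY.items():
    by_topic[topic] += n
    by_industry[industry] += n

print(dict(by_topic))
print(dict(by_industry))
```

A mismatch between these totals and the figures quoted in the text would point to a miscounted or misfiled citation.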

What types of VR were used to deliver health and safety training in various high-risk engineering industries?
Depending on the level of immersion, type of interaction, and display device used, VR can be classified as non-immersive (i.e., desktop), semi-immersive, or fully immersive (van Wyk & de Villiers, 2019). Non-immersive or desktop VR uses a conventional PC monitor, speakers, and a mouse to provide the virtual reality environment (VRE), sound, and interaction, respectively (van Wyk & de Villiers, 2019). Semi-immersive or projected VR uses a system of multiple projectors and projection screens, speakers, and controllers for these same three functions, while fully immersive VR uses a head-mounted display (HMD), earphones, and a motion tracking device (van Wyk & de Villiers, 2019). To explore the different VR-based technologies used for H&S training in various high-risk engineering industries, Fig. 6 shows the distribution of VR technology types reported in the reviewed publications from 2011 to 2021.
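When coding studies for a review like this one, the three-way classification can be applied mechanically from the display device each study reports. The sketch below is a hypothetical helper: the keyword lists are assumptions chosen to cover the devices named in this review (HTC Vive, Oculus Rift, CAVE, desktop PCs), not a standard taxonomy.

```python
from enum import Enum


class Immersion(Enum):
    NON_IMMERSIVE = "non-immersive (desktop)"
    SEMI_IMMERSIVE = "semi-immersive (projected)"
    FULLY_IMMERSIVE = "fully immersive"


# Assumed keywords for the display each category relies on: an HMD for fully
# immersive, projectors or a CAVE for semi-immersive, a monitor for desktop VR.
DISPLAY_KEYWORDS = {
    Immersion.FULLY_IMMERSIVE: ("head-mounted", "hmd", "htc vive", "oculus"),
    Immersion.SEMI_IMMERSIVE: ("cave", "projector", "projection"),
    Immersion.NON_IMMERSIVE: ("desktop", "monitor"),
}


def classify_vr(display: str) -> Immersion:
    """Return the immersion category for a free-text display description."""
    text = display.lower()
    # Categories are checked from most to least immersive, so a study that
    # mentions both an HMD and a desktop is coded as fully immersive.
    for level, keywords in DISPLAY_KEYWORDS.items():
        if any(k in text for k in keywords):
            return level
    raise ValueError(f"unrecognised display device: {display!r}")


for device in ("HTC Vive", "CAVE", "desktop PC monitor"):
    print(device, "->", classify_vr(device).value)
```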
As shown in Fig. 6, the number of studies using non-immersive VR technologies for H&S training in high-risk engineering industries, as a fraction of the total number of studies each year, has declined over the past 11 years. For instance, Ayala García et al. (2016), Lin et al. (2011), Nicoletti and Padovano (2019), and Serpa et al. (2020) used non-immersive VR technologies, such as desktop computers, as a tool for H&S training in electric power, construction, chemical process, and manufacturing-related industries, respectively. As Freina and Canessa (2015) noted, non-immersive VR technologies lack the feeling of presence (i.e., the subjective feeling of "being" in the task environment) that immersive VR provides, which leads to lower engagement and transfer of learning. Possible reasons why some researchers still use this type of VR technology are the limited resources (e.g., financial) of the governing bodies (e.g., institutions, funding agencies) and the limited accessibility of immersive VR-based technologies.
Fig. 6 Distribution of VR technology types that were reported in the reviewed publications
Compared with non-immersive VR, semi-immersive VR gives a greater sense of presence (An & Park, 2018). However, only four studies, namely Sacks et al. (2013), Perlman et al. (2014), Nazir et al. (2015), and Leder et al. (2019), used semi-immersive VR technology such as the cave automatic virtual environment (CAVE) for H&S training in high-risk engineering industries over the past 11 years (Fig. 6). The low uptake of semi-immersive VR compared with the other two types is likely due to financial and management considerations. Since constructing and installing a new CAVE facility consisting of multiple high-resolution projectors and projection screens is frequently complex and costly, and its maintenance is laborious, the small number of articles on this topic plausibly reflects the limited number of institutions with access to such facilities (Havig et al., 2011). Nevertheless, some researchers still prefer to use CAVE for H&S training in various high-risk engineering industries, as this technology allows multiple participants to interact and share ideas and experiences with each other at the same time (Muhanna, 2015).
By contrast, the number of publications on the use of fully immersive VR for health and safety training in various high-risk industries increased significantly from 2019 (Fig. 6), and such studies now comprise the vast majority of those published. One reason for this shift is the continuous improvement of fully immersive VR. Since the release of the first Oculus Rift head-mounted display in 2013, hardware and display resolution have improved significantly (Jensen & Konradsen, 2018). For instance, the typical field of view (FOV) of older HMDs was between 25 and 60 degrees, whereas newer HMDs have FOVs above 100 degrees (Riva et al., 2016). Another reason is the potential of fully immersive technologies to provide a high degree of presence and immersion. Fully immersive VR isolates users from the real world, letting them focus entirely on the VRE, spend more time on the learning tasks, and gain better skills (Jensen & Konradsen, 2018). Besides offering a better user experience, the new generation of HMDs is considerably cheaper, which has made them an attractive choice for many companies and research institutions. For instance, the recently released Oculus Quest 2 (a cordless HMD), which costs around 299 USD, is much cheaper than previous Oculus headsets. Given these benefits, the number of publications on this type of VR for H&S training in various high-risk industries is expected to increase progressively over the next few years.

What are the outcome(s) measured for establishing the effectiveness of the VR-based health and safety training in various high-risk engineering industries?
The adoption of VR-based technologies for H&S training in high-risk engineering industries has increased over the past 11 years. However, since the success of a given training method depends on the degree to which it prepares trainees for real-world situations (i.e., transfer of training), it is important to understand how to analyse and evaluate the outcome(s) of VR-based health and safety training in high-risk engineering industries. Before the impact of the training can be investigated objectively, the outcome(s) of the training programme must be categorised.
According to Kirkpatrick (2006), the outcome(s) of training can be evaluated at four levels. The first is the reaction level, at which trainees' reactions (e.g., their thoughts on the training) are identified and their satisfaction is measured. The second, the learning level, measures the increase in knowledge or intellectual capability resulting from the training. The third, the behaviour level, measures the change in behaviour that transfers to actual job performance as a result of the training. The fourth, the results level, assesses the impact of the training in terms of organisational outcomes (Kirkpatrick, 2006). Although the framework is usually applied step by step to evaluate the success of a given training programme, some programmes do not require this step-by-step process. For instance, information security training requires trainees not only to retain the information but also to apply it at work, so assessment designers should focus on levels 2 and 3. However, if an institution develops a new information security training method (e.g., VR-based training), assessment designers should apply the Kirkpatrick evaluation model step by step, as they need to assess the overall impact of the new method and make a practical judgement on whether to adopt it in place of the existing training method.
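The rule of thumb in the paragraph above (an established programme is evaluated at the levels tied to its goals, while a newly developed method is evaluated at every level in turn) can be written as a small helper. This is a sketch of that reasoning only; the function name and the `goal_levels` parameter are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    REACTION = 1   # trainees' reactions and satisfaction
    LEARNING = 2   # knowledge or skills gained
    BEHAVIOUR = 3  # learning transferred to on-the-job behaviour
    RESULTS = 4    # organisational outcomes


def levels_to_evaluate(new_training_method,
                       goal_levels=(Level.LEARNING, Level.BEHAVIOUR)):
    """Return the Kirkpatrick levels to evaluate, in order.

    A newly developed method is evaluated at every level, step by step, so
    its overall impact can be judged before it replaces existing training.
    An established programme is evaluated only at the levels tied to its
    goals (levels 2 and 3 in the information security example).
    """
    if new_training_method:
        return list(Level)
    return sorted(goal_levels)


print([lvl.name for lvl in levels_to_evaluate(False)])  # established programme
print([lvl.name for lvl in levels_to_evaluate(True)])   # new VR-based method
```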
As implementing VR-based technologies is relatively expensive compared with conventional training methods such as lectures and PowerPoint presentations, it is important for stakeholders to apply Kirkpatrick's four levels of training evaluation, so that they can use the resulting evidence to decide whether to invest in VR-based training. Figure 7 shows the distribution of the outcome(s) measured for VR-based health and safety training, based on Kirkpatrick's training evaluation model, over 2011-2021.
As shown in Fig. 7, several authors, such as Lin et al. (2011), Isleyen and Duzgun (2019), and Numfu et al. (2020), evaluated reaction-level (Level 1) outcomes for different types of VR-based H&S training. For instance, Vahdatikhaki et al. (2019) confirmed, based on feedback scores collected from the trainers, that fully immersive VR was a feasible tool for construction equipment training, as it offers an effective way for students to learn about operation safety from their mistakes in the VR environment. Although VR-based training has gained significant attention in several high-risk fields such as medicine, aviation, and engineering, most authors over the past 11 years still conducted reaction-level evaluations. A likely reason is that researchers want to verify the potential of these new technologies (regardless of type) for specific H&S training topics in high-risk engineering industries.
However, a positive outcome at the reaction level does not guarantee that knowledge or skills are acquired when VR-based technologies are adopted, because reaction-level (Level 1) data only reflect the overall reaction to, or experience of, the training (e.g., satisfaction, enjoyment). Accordingly, authors such as Nazir et al. (2015), Tawadrous et al. (2017), and Choi et al. (2020) evaluated learning-level (Level 2) outcomes for different types of VR-based H&S training (Fig. 7). For instance, Dado et al. (2018) found that learners who used fully immersive VR to identify hazards related to lathe operation achieved higher scores (average number of hazards correctly identified = 8.1) than learners in the control setting (e.g., a PowerPoint presentation) (average = 7.7). One reason many of the reviewed studies conducted learning-level evaluation is to measure the degree to which trainees acquired the intended knowledge and skills upon completing the VR-based training. Another is to carry out preliminary research on whether VR training is comparable to the control setting (e.g., traditional training) in developing the necessary H&S skills. Fewer authors used the third level of Kirkpatrick's model to evaluate the long-term effect of VR-based training on trainees' behaviour (Fig. 7). For instance, Nykänen et al. (2020) found a greater increase in participants' self-reported safety performance (e.g., identifying factors affecting safety) one month after their VR-based construction safety training. Moreover, none of the studies reviewed from the past 11 years measured the fourth level of Kirkpatrick's model, which evaluates organisational results and the cost and return on investment of the training.
The small number of articles addressing the third and fourth levels of Kirkpatrick's model is likely because measuring the amount of learning transferred to job behaviour (Level 3) or the overall success of the training (Level 4) requires longitudinal studies (e.g., reviewing pre-defined performance metrics at pre-set intervals through observation). As most projects have limited funding periods, and additional funding is difficult to obtain if unforeseen circumstances cause delays, fewer studies address behaviour- and results-level outcomes (Caruana et al., 2015).

What assessment techniques were used to evaluate the outcome(s) of VR-based health and safety training in various high-risk engineering industries?
Since every training programme should measure the specific performance criteria essential to developing skilled personnel, understanding appropriate assessment methods plays a vital role. Broadly speaking, assessment is a systematic process of recording and presenting information (e.g., knowledge, skills) about learner accomplishment and instructional processes (Brookhart, 1999). Assessment can generally be categorised into two groups: summative (assessment of learning) and formative (assessment for learning) (Loh, 2012). Summative assessment analyses understanding and mastery of the topic after an activity is completed, while formative assessment uses regular interactive measurements that identify points of improvement for better learning outcomes (Loh, 2013; Sadler, 1989). The information for both summative and formative assessment can be collected in several ways, such as paper-and-pencil tests (e.g., multiple choice, matching) and self-, peer-, or supervisor reports (e.g., feedback, observation).
According to Loh (2011), many researchers have claimed that formative assessment can have a positive effect on learning, as it provides continuous and timely assessment information that can identify and help address the specific areas in which trainees are having difficulty. Unfortunately, implementing formative assessment is not easy for most trainers and lecturers, especially those teaching large classes, as they cannot afford the extra time and effort needed to provide valuable feedback that addresses the gap between trainees' present and projected performance (Bennett, 2011). However, as online learning has evolved considerably over the past few years, digital simulation- or game-based assessment (i.e., the automated collection, organisation, documentation, and presentation of scores and corresponding feedback on individual learner performance through digital devices such as computers and VR HMDs) has become broadly recognised as a solution to this implementation problem (Bulut et al., 2019).
According to Eseryel et al. (2011), digital simulation- or game-based assessment may be categorised as either external or internal assessment. Examples of external assessment, which resemble traditional assessment methods, include interviews, multiple-choice questionnaires (MCQs), knowledge tests, and practical tests. By contrast, data provided by simulation and game files that log the player's actions and game variables are examples of internal assessment. Both methods can be used summatively or formatively, depending on when they are implemented (e.g., before, during, and/or after playing the game/simulation). The main difference between the two is that external assessment is not normally part of the game/simulation and therefore interrupts it, whereas internal assessment is embedded in the game/simulation without interfering with it (Eseryel et al., 2011). Figure 8 shows the distribution of the assessment methods used to measure the different training evaluation outcomes based on Kirkpatrick's model.
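The two dimensions, where the data come from (external or internal) and when they are collected, are independent, so any assessment in the review can be placed on both. The sketch below assumes a deliberately simplified timing rule (measurements taken during play are treated as formative, those taken before or after as summative); in practice the distinction depends on whether the results feed back into learning, so the rule and all names here are illustrative.

```python
from dataclasses import dataclass

TIMINGS = ("before", "during", "after")


@dataclass(frozen=True)
class Assessment:
    name: str
    in_simulation: bool  # True if the data come from the simulation's own logs
    timing: str          # one of TIMINGS, relative to the VR session


def categorise_assessment(a):
    """Return (source, purpose), e.g. ("internal", "formative")."""
    if a.timing not in TIMINGS:
        raise ValueError(f"unknown timing: {a.timing!r}")
    # External instruments sit outside the simulation; internal ones are its logs.
    source = "internal" if a.in_simulation else "external"
    # Simplifying assumption: data gathered during play can feed back into
    # learning (formative); data gathered before or after are summative.
    purpose = "formative" if a.timing == "during" else "summative"
    return source, purpose


examples = [
    Assessment("post-training multiple-choice test", in_simulation=False, timing="after"),
    Assessment("valve-operation event log", in_simulation=True, timing="during"),
]
for a in examples:
    print(a.name, "->", categorise_assessment(a))
```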
Most of the reviewed articles used external assessment, such as self-assessment questionnaires (e.g., on intrinsic motivation, perceived enjoyment, presence, self-efficacy, effectiveness, and satisfaction) or interviews, to evaluate the first level of Kirkpatrick's model (reaction criteria), which measures trainees' satisfaction with the VR-based H&S training (Fig. 8). For instance, Wang et al. (2020) used a Likert-scale questionnaire to measure changes in trainees' confidence (e.g., satisfaction) after either fully immersive VR-based or traditional lecture-based scaffolding erection training in the construction industry. Their paired-sample test showed that participants reported higher satisfaction with VR-based training than with lecture-based training (6.81 vs. 5.81). Participants also rated the VR-based approach as more helpful (mean = 6.56) than lecture-based training (mean = 5.75). One reason external assessment is used more often than internal assessment to evaluate the first level of Kirkpatrick's model is that questionnaires are relatively easy to administer. Another is that using internal assessment to measure satisfaction with, or usability of, VR-based training requires additional work, from integrating appropriate tools (e.g., emotion sensors) to extracting the desired variable(s) from large amounts of data corresponding to certain emotions (e.g., positive, negative, neutral) (Dzedzickis et al., 2020).
As shown in Fig. 8, both external and internal assessment methods can be used to evaluate how much knowledge and skill trainees gained in VR-based training programmes (Level 2 of Kirkpatrick's model). For instance, Ogbuanya and Onele (2018) used a knowledge test to assess the fundamental knowledge of electrical/electronic technology (e.g., electronic circuits and power supply design) gained in non-immersive VR-based training compared with conventional classroom training. Using analysis of covariance (ANCOVA), they found that virtual reality positively affected learners' academic performance, as there was a significant difference between the knowledge test scores of participants who used non-immersive VR-based training (mean = 71.7) and those taught with the traditional method (mean = 60.1) (Ogbuanya & Onele, 2018). Although it is easier to evaluate the effectiveness of VR training by checking trainees' learning with conventional assessment methods, external assessment such as the paper-and-pencil format is only efficient for measuring simple outcomes such as declarative knowledge, and may not be effective for measuring the development of complex skills (e.g., problem-solving, teamwork and collaboration) (Garcia Fracaro et al., 2021).
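The Level 1 and Level 2 comparisons quoted above (Dado et al., 2018; Wang et al., 2020; Ogbuanya & Onele, 2018) report only group means. A short calculation puts them side by side as absolute and relative differences. This is descriptive arithmetic on the reported figures only: the scales differ between studies, and without standard deviations or sample sizes the differences are not effect sizes and say nothing about significance (Ogbuanya and Onele established that with ANCOVA).

```python
# (label, VR-based mean, comparison-group mean) as reported in the studies.
REPORTED = [
    ("Dado et al. (2018), hazards identified", 8.1, 7.7),
    ("Wang et al. (2020), satisfaction", 6.81, 5.81),
    ("Wang et al. (2020), helpfulness", 6.56, 5.75),
    ("Ogbuanya & Onele (2018), knowledge test", 71.7, 60.1),
]


def gains(rows):
    """Absolute and relative (%) difference of the VR mean over the comparison mean."""
    out = []
    for label, vr, control in rows:
        diff = vr - control
        out.append((label, round(diff, 2), round(100 * diff / control, 1)))
    return out


for label, diff, pct in gains(REPORTED):
    print(f"{label:42s} +{diff} ({pct}%)")
```

The relative figures (roughly 5% to 19%) are only comparable in direction; each study's own statistics remain the basis for any claim of effectiveness.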
To maximise the potential of VR-based technologies, several authors used internal assessments such as log data to trace and capture learner-generated data (e.g., correct actions, tasks completed). For instance, Nazir et al. (2015) used log data to capture the actual performance of operators in locating the correct valves, opening or closing a valve, and/or identifying leakages during safety training for a chemical plant. Their results showed that participants trained in a VR environment identified more leakages (67%) and operated more manual valves (83%) than those trained with conventional methods such as a PowerPoint presentation (42% and 50%, respectively) (Nazir et al., 2015). By integrating internal assessment, it is possible to create an automated assessment capable of measuring complex skills, such as problem solving, that translate into better performance in the real world (Shute & Wang, 2016). Loh (2012) argued, however, that creating game/simulation-based analytics requires a lot of work, from discovering useful metrics for measuring human performance, to verifying the equivalence of digital and actual actions, to identifying strong predictors among the thousands of information points available in each data set. Nevertheless, the use of internal assessment for H&S training in various high-risk industries is expected to increase progressively over the next few years, given the rapid advancement of data mining and machine learning, which will facilitate log data analysis. Figure 8 further shows that even fewer authors evaluated the third level of Kirkpatrick's model than the previous two levels, as this level requires a significant amount of time and money.
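The log-data approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring scheme of Nazir et al. (2015): the event schema (`LogEvent` with `trainee`, `action`, `target`), the action names `"operate_valve"` and `"report_leak"`, and the function `score_trainees` are all hypothetical. The sketch turns raw scenario logs into per-trainee proportions of required valves operated and true leaks identified, the kind of automated, learner-generated measure that internal assessment relies on.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """One hypothetical log entry emitted by a VR training scenario."""
    trainee: str
    action: str  # hypothetical action names, e.g. "operate_valve", "report_leak"
    target: str  # identifier of the valve or leak the action refers to


def score_trainees(events, required_valves, true_leaks):
    """Return {trainee: {"valves": ..., "leaks": ...}} as fractions in [0, 1].

    Only distinct correct targets are counted, so repeated actions do not
    inflate a score and reports of non-existent leaks are simply ignored.
    """
    operated = defaultdict(set)
    reported = defaultdict(set)
    for e in events:
        if e.action == "operate_valve" and e.target in required_valves:
            operated[e.trainee].add(e.target)
        elif e.action == "report_leak" and e.target in true_leaks:
            reported[e.trainee].add(e.target)

    trainees = sorted({e.trainee for e in events})
    return {
        t: {
            "valves": len(operated[t]) / len(required_valves),
            "leaks": len(reported[t]) / len(true_leaks),
        }
        for t in trainees
    }


# Hypothetical log from two trainees.
events = [
    LogEvent("T1", "operate_valve", "V1"),
    LogEvent("T1", "operate_valve", "V2"),
    LogEvent("T1", "report_leak", "L1"),
    LogEvent("T2", "operate_valve", "V1"),
    LogEvent("T2", "operate_valve", "V1"),  # repeated action, counted once
    LogEvent("T2", "report_leak", "L9"),    # false report, ignored
]
print(score_trainees(events, {"V1", "V2"}, {"L1", "L2"}))
```

Ignoring false reports is a deliberate simplification; a real scheme might penalise them, which is exactly the kind of metric-design decision that makes such analytics laborious to build.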
Moreover, the figure shows that most authors prefer external assessment methods (e.g., questionnaires and practical exams) to internal methods (e.g., log data) for evaluating the amount of learning transferred to job behaviour. For instance, Makransky et al. (2019) used practical exams (e.g., a situational judgement scenario) to assess the amount of learning transferred to job behaviour in a chemical laboratory setting after training on a fully immersive VR platform. Their results showed that students who received the fully immersive VR-based safety training showed a greater increase in their ability to demonstrate appropriate laboratory skills and behaviour in the practical tests than those who received desktop VR or conventional safety manual training (Makransky et al., 2019). On the other hand, Albert et al. (2014) used longitudinal collection and analysis of log data to measure the behavioural criteria of trainees in VR-based construction safety training. Their study showed that participants increased their hazard recognition from 46% to 77% in the post-intervention phase and maintained this score until the end of the 16th working period. They also stated that support from funding agencies and partnerships with a wide range of industry professionals with varied skills and experience are important for capturing the needed variables from the log data and accurately measuring the patterns of change used to determine behavioural criteria (Albert et al., 2014).
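A longitudinal measure of this kind can be approximated by aggregating log data per working period and comparing the periods before and after an intervention. The sketch below is a minimal illustration, assuming each log record is a hypothetical `(period, hazards_present, hazards_recognised)` tuple; the function names `recognition_rate_by_period` and `pre_post_means` and all numbers are invented and do not reproduce the procedure or data of Albert et al. (2014).

```python
from collections import defaultdict


def recognition_rate_by_period(records):
    """Aggregate (period, hazards_present, hazards_recognised) tuples
    into a hazard recognition rate in [0, 1] for each working period."""
    present = defaultdict(int)
    recognised = defaultdict(int)
    for period, n_present, n_recognised in records:
        present[period] += n_present
        recognised[period] += n_recognised
    # Periods with no hazards present are skipped to avoid division by zero.
    return {p: recognised[p] / present[p] for p in sorted(present) if present[p] > 0}


def pre_post_means(rates, intervention_period):
    """Mean recognition rate before the intervention and from it onwards."""
    pre = [r for p, r in rates.items() if p < intervention_period]
    post = [r for p, r in rates.items() if p >= intervention_period]
    if not pre or not post:
        raise ValueError("need at least one period on each side of the intervention")
    return sum(pre) / len(pre), sum(post) / len(post)


# Hypothetical records: (working period, hazards present, hazards recognised).
records = [(1, 10, 4), (1, 10, 5), (2, 10, 5), (3, 10, 8), (4, 20, 15)]
rates = recognition_rate_by_period(records)
pre, post = pre_post_means(rates, intervention_period=3)
print(rates, f"pre = {pre:.1%}, post = {post:.1%}")
```

Keeping the per-period rates, rather than only the pre/post means, is what allows a study to check whether an improvement is maintained over later working periods.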

Does VR-based health and safety training in various high-risk engineering industries have the potential to improve the training evaluation outcome(s) compared to traditional and/or other VR-based training methods?
VR technology creates a representation of real-life scenarios that allows trainees to be exposed to, and trained in dealing with, hazardous situations within a safe 3D setting. In this context, several authors have explored the potential of VR-based H&S training to improve outcome(s), measured using Kirkpatrick's training evaluation model, compared with traditional (e.g., lecture, PowerPoint presentation, audio-visual presentation) and/or other VR-based training methods (Dhalmahapatra et al., 2021;Makransky et al., 2019;Osti et al., 2021). Figure 9 shows the distribution of the studies that compare the outcome(s), measured using Kirkpatrick's training evaluation model, between different types of VR-based training or between VR-based training vis-a-vis traditional training.
As shown in Fig. 9, 16 out of 36 papers compared the reaction level between different types of VR-based training or between VR-based training vis-a-vis traditional training in various high-risk engineering industries. Among these 16 papers, the statistical tests (e.g., t-test, ANOVA) of 13 studies showed that VR-based training yielded a more favourable reaction than the traditional setting (Ahn et al., 2020;Beh et al., 2021;Diego-Mas et al., 2020;Guo et al., 2012;Joshi et al., 2021;Leder et al., 2019;Li et al., 2012;Makransky et al., 2019;Nykänen et al., 2020;Pham et al., 2019;Poyade et al., 2021;Sacks et al., 2013;Xu & Zheng, 2021). For instance, Leder et al. (2019) stated that, compared with the traditional training method (PowerPoint presentation), the semi-immersive VR (CAVE) condition provided a better degree of immersion and presence. Moreover, two of the papers compared the reaction level of different types of VR-based training systems: Hernández-Chávez et al. (2021) and Dhalmahapatra et al. (2021) compared fully immersive VR with non-immersive VR, and both found that fully immersive VR performed better than desktop (non-immersive) VR on several reaction level criteria such as ease of operation, ease of learning, realism, immersion, and/or graphics quality. On the other hand, Osti et al. (2021) found no statistically significant difference in usability scores between fully immersive VR training and traditional video training, although the usability score of fully immersive VR was higher than that of traditional video training (Osti et al., 2021).
Fig. 9 Distribution of the studies which compare the outcome(s) between different types of VR-based training or between VR-based training vis-a-vis traditional training
Given that most of the reaction-level scores favoured the VR-based setting, VR-based training in various high-risk engineering industries may have greater potential than the traditional setting to elicit favourable trainee reactions.
Of the 36 papers considered in this study, 25 compared the learning level between different types of VR-based training or between VR-based training vis-a-vis traditional training in various high-risk engineering industries (Fig. 9). Of these 25 papers, 17 showed that VR-based training provided higher learning and/or performance scores than the traditional setting across several H&S topics, such as risk assessment and/or machinery and process operation (Adami et al., 2021;Ahn et al., 2020;Ayala García et al., 2016;Dado et al., 2018;Diego-Mas et al., 2020;Gallegos-Nieto et al., 2017;Kazar & Comu, 2021;Li et al., 2012;Liang et al., 2019;Nazir et al., 2015;Nykänen et al., 2020;Ogbuanya & Onele, 2018;Perlman et al., 2014;Pham et al., 2019;Sacks et al., 2013;Shi et al., 2020;Stransky et al., 2021). For instance, Pham et al. (2019) stated that users of non-immersive VR obtained higher hazard investigation scores (mean = 80.1%) than users of the traditional lecture-based platform (mean = 76.3%). Moreover, Dhalmahapatra et al. (2021) compared the learning level between fully immersive VR and non-immersive VR, and their t-test results showed that the safety performance of users trained in fully immersive VR was better than that of users trained in non-immersive VR. On the other hand, researchers such as Beh et al., Osti et al. (2021), and Poyade et al. (2021) found no statistically significant difference in performance scores between VR-based and traditional training methods. Nevertheless, of the six studies that found no statistically significant difference, four reported that the performance achieved with VR-based training was higher than with the traditional training method.
The majority of the studies therefore favour the use of VR-based training, as it provided higher learning and performance scores than traditional training methods. This indicates that VR-based training may have greater potential to improve users' learning and/or performance.
In terms of the behaviour level, five out of six papers compared VR-based training with traditional training in various high-risk engineering industries (Fig. 9). Of these five papers, Ayala García et al. (2016), Makransky et al. (2019), and Nykänen et al. (2020) found a significant difference in users' ability to demonstrate appropriate skills and behaviour after VR-based training compared with traditional training methods. In contrast, Leder et al. (2019) and Diego-Mas et al. (2020) found no significant difference in the behaviour level in their respective studies, although in both studies the behaviour performance scores for VR-based training were higher than those for traditional training. This suggests that VR-based training in various high-risk engineering industries may have greater potential than the traditional setting to improve trainees' behaviour.

Implications
Although this paper focuses on the use of VR-based H&S training in various high-risk engineering industries, researchers and stakeholders may use its findings as the basis for a training design framework that aligns VR-based training with the desired training outcome and assessment method. Figure 10 shows the proposed training design framework.
As training outcomes focus on what trainees should achieve upon completing a training programme, practitioners must define these training outcome(s) clearly and accurately. For instance, if the training is newly developed, it might be beneficial for stakeholders to first assess its usability and satisfaction (Level 1) as well as the immediate knowledge/skill gain (Level 2). After analysing those outcomes, a decision can be made on whether to continue investing in and developing the training programme by evaluating the behavioural change among trainees (Level 3). However, evaluating this outcome is only possible with additional support from the institution/organisation in terms of human resources, time, and funds.
After defining the desired outcome(s), it is important to choose the appropriate digital-based assessment method(s) for evaluating them. For instance, for evaluating training satisfaction (Level 1), external assessment methods such as questionnaires or interviews are preferable, as they are easier to implement. On the other hand, both Level 2 and Level 3 outcomes can be evaluated with internal and external assessment methods. For instance, if the institution wants to develop an automated assessment, it might be beneficial to consider an internal rather than an external assessment method. However, practitioners must weigh the advantages and disadvantages of each assessment type, as this will affect the resources (e.g., human, financial, and time) required by the institution/organisation. Once the desired training outcome(s) and assessment method(s) are aligned, practitioners can select a suitable training method that will boost trainee engagement. For instance, if the institution/organisation aims to create an affordable, realistic, but safe replica of a specific dangerous training activity, it might be appropriate to consider fully immersive VR as a training tool.
Fig. 10 Proposed training design framework
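The three steps of the proposed framework (define the outcome, choose the assessment method, then select the training method) can be expressed as a small decision function. This is a hypothetical sketch that encodes only the rules stated in this section; the names `Level`, `TrainingPlan`, `plan_training`, and the flags `want_automated`, `has_extra_resources`, and `needs_safe_realistic_replica` are illustrative assumptions, and the instrument lists are examples drawn from the assessment methods discussed in this review rather than an exhaustive catalogue.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    """Kirkpatrick evaluation levels considered in the framework."""
    REACTION = 1
    LEARNING = 2
    BEHAVIOUR = 3


# Example instruments per level, taken from those named in this review.
EXTERNAL = {
    Level.REACTION: ["questionnaire", "interview"],
    Level.LEARNING: ["knowledge test"],
    Level.BEHAVIOUR: ["practical exam", "questionnaire"],
}
INTERNAL = ["log data"]


@dataclass
class TrainingPlan:
    level: Level
    assessment_type: str
    instruments: list[str]
    training_method: Optional[str]


def plan_training(level, want_automated=False, has_extra_resources=False,
                  needs_safe_realistic_replica=False):
    """Walk the framework steps: outcome -> assessment -> training method."""
    # Step 1: a Level 3 outcome is only feasible with extra people, time and funds.
    if level == Level.BEHAVIOUR and not has_extra_resources:
        raise ValueError("Level 3 evaluation needs additional resources")

    # Step 2: choose the assessment method for the outcome.
    if level == Level.REACTION:
        # Level 1 always uses external methods; want_automated is ignored here.
        assessment_type, instruments = "external", EXTERNAL[level]
    elif want_automated:
        assessment_type, instruments = "internal", INTERNAL
    else:
        assessment_type, instruments = "external or internal", EXTERNAL[level] + INTERNAL

    # Step 3: only the training-method rule stated in the text is encoded;
    # None means the rule does not determine a method.
    method = "fully immersive VR" if needs_safe_realistic_replica else None
    return TrainingPlan(level, assessment_type, list(instruments), method)


print(plan_training(Level.LEARNING, want_automated=True))
```

Raising an error for an unsupported Level 3 evaluation, rather than silently downgrading it, mirrors the point that this outcome should only be pursued when the institution/organisation has committed the necessary resources.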

Limitations and future research
Due to the focus of this review and the selection and filtering processes, some limitations should be acknowledged. Firstly, this study only explored VR-based H&S training applications in high-risk engineering industries. Various other industries (e.g., medical, military, and aviation) have also used VR-based technologies for H&S training and may be of interest to readers. The validity of the conclusions of this study is therefore limited to the aforementioned research boundaries. Future studies would benefit from expanding the range of industrial sectors considered, to give a fuller picture of VR-based H&S training and to establish the wider applicability of the results. Secondly, as this study only considered peer-reviewed articles from the largest digital source available (Scopus) during the literature search, it may be beneficial to consider other types of articles (e.g., professional reports, research project deliverables, trade publications) from other databases, as these may contain interesting results.

Conclusion
This study presents a review of existing articles on the use of VR-based H&S training in various high-risk engineering industries. It also provides insights into the types of VR, the topics of H&S training, and the types of assessment techniques and training evaluation. In addition, this study explored the potential of VR-based H&S training to improve training evaluation outcome(s) compared with traditional and/or other VR-based training methods. A total of 45 articles reporting specific assessment techniques were considered and analysed. The results indicated that most industries used VR-based technologies to train users on risk assessment, machinery, or process operation. Moreover, the use of fully immersive VR increased rapidly owing to recent improvements in hardware, display resolution, and price. In terms of the outcomes measured to establish the effectiveness of VR-based H&S training, trainers focused mainly on measuring the change in trainees' satisfaction and/or learning achievement within a short span of time. For instance, most researchers used external assessments such as questionnaires and interviews in training satisfaction studies, as these are easier to implement. External assessments such as knowledge tests and multiple-choice questions (MCQs) were also used to evaluate the amount of declarative knowledge gained by trainees in VR-based training. On the other hand, some researchers used internal assessment methods such as log data to create automated assessments capable of measuring complex skills such as problem solving and teamwork. Lastly, VR-based H&S training was also found to have the potential to improve the reaction, learning, and behaviour levels compared with traditional training methods.
In conclusion, the findings of this study can support practitioners and safety managers by providing a training design framework that may be adopted to align VR-based training with the desired training outcome and assessment method. This study can also be used as a basis to suggest that researchers should