Ethereal

Step into Greek Myth: AI-Powered VR Adventure

Ruslan Bekniyazov and Shaan Singh Chattrath

University of Washington

Abstract

Ethereal is an educational VR game designed to bridge the gap between learning and immersive technology by utilizing a Greek mythology-themed environment. Users engage with representations of Greek Gods and various mythological objects, providing a dynamic platform for historical education. This project, built for the Oculus platform, combines Raycasting, custom shaders, and audio recording to achieve efficient rendering and immersive interaction. Using the Whisper API to transcribe player inquiries and ChatGPT to generate contextual responses, we provide an enriching learning experience. Additionally, the ElevenLabs Text-to-Speech API enables the response to be played back as speech, enhancing user immersion. Our results show an improvement in user engagement when users are given an interactive option for learning about Greek mythology.


Introduction

As digital technology evolves, new mediums have emerged, transforming traditional forms of learning into more engaging and immersive experiences. One such medium is virtual reality (VR), which provides a unique platform for interactive and experiential learning. Our project, Ethereal, falls into this innovative domain, providing an interactive VR experience based on Greek mythology.

Prior works have utilized VR for education, but Ethereal stands out by offering an immersive exploration of Greek mythology and employing OpenAI's Whisper and ChatGPT APIs, and the ElevenLabs Text-to-Speech API. The game capitalizes on user interactions and inquiries to provide customized educational content.

Ethereal is set in a beautiful Greek-themed garden, populated by statues of Greek Gods. As players navigate this space, Raycasts are used to identify points of interest based on where the player is looking. Once an object is focused on, it is highlighted, and the player can ask questions about it, which are transcribed and processed to generate contextual responses. This immersive interaction is achieved using the Whisper API for transcription, the ChatGPT API for response generation, and the ElevenLabs API for vocal response playback.

Our hypothesis that interactive and contextual learning enhances user engagement and learning outcomes was substantiated by the enthusiastic user feedback when interacting with our immersive world. As future work, we aim to expand the thematic content of the game and explore the application of our approach in other educational contexts, putting Ethereal at the forefront of the emerging field of immersive learning.

Contributions
  • We introduced a novel approach to interactive education by blending VR and AI, offering an immersive Greek mythology learning experience.
  • We successfully integrated multiple technologies in a very coherent manner: OpenAI's Whisper and ChatGPT APIs, and the ElevenLabs Text-to-Speech API, creating an effective method for processing player inquiries and generating contextual responses.
  • We implemented efficient graphics rendering techniques, such as Raycasting and custom shaders, ensuring smooth game operation on Oculus Quest 2.
  • Through analysis of user feedback and knowledge uptake, we established the effectiveness of interactive and contextual learning, providing a model for future developments in immersive learning.

Method

Our method focused on the integration of VR and AI technologies to provide a unique, interactive educational experience. The theoretical approach can be broadly divided into three main aspects: VR environment creation, user interaction handling, and AI integration.

Firstly, the VR environment was created within Unity, themed around a Greek mythology garden. The space is populated with 3D objects representing Greek Gods and other mythological elements. An efficient rendering method (suited for mobile devices) using custom shaders was adopted to ensure smooth performance on Oculus Quest 2. Raycasting was employed to identify player gaze direction, which is then used to highlight objects of interest. Careful creation of themed assets as well as selection of assets ensured that the environment is coherent and immersive.

Secondly, user interaction handling was structured around the concept of inquisitive learning. Players can ask questions about the objects they are focused on. These questions are captured as audio input through the controller's trigger button, creating a dynamic query-response educational system. Our number one goal was to make the interaction as intuitive as possible, so that it feels like a natural conversation. To counteract users' tendency to re-submit inquiries due to the delay in responses, we added a status cue to our UI. It lets the user know when the system is listening, when it is transcribing the user's question, when it is thinking (waiting for ChatGPT to respond), and when it is waiting to speak (waiting for the ElevenLabs API to generate a response).
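The status cue described above can be modeled as a small state machine. The following is a minimal sketch rather than the project's actual code: the `InteractionStatus` enum, the `StatusCue` class, the label strings, and the rule that recording is only accepted while idle are all assumptions made for illustration.

```csharp
// Hypothetical states mirroring the four status cues described in the text.
public enum InteractionStatus
{
    Idle,
    Listening,      // recording the player's question
    Transcribing,   // waiting for the Whisper transcription
    Thinking,       // waiting for ChatGPT to respond
    WaitingToSpeak  // waiting for ElevenLabs to generate audio
}

public static class StatusCue
{
    // Text shown in the UI for each state (labels are assumptions).
    public static string Label(InteractionStatus status)
    {
        switch (status)
        {
            case InteractionStatus.Listening: return "Listening...";
            case InteractionStatus.Transcribing: return "Transcribing...";
            case InteractionStatus.Thinking: return "Thinking...";
            case InteractionStatus.WaitingToSpeak: return "Waiting to speak...";
            default: return string.Empty;
        }
    }

    // Assumed policy: a new recording is only accepted while idle, so a
    // question that is still in flight cannot be submitted a second time.
    public static bool CanStartRecording(InteractionStatus status)
    {
        return status == InteractionStatus.Idle;
    }
}
```

Keeping the states in a single enum means the UI label and the input gating are driven by the same value, so the two cannot drift apart.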

Finally, the AI integration plays a crucial role in processing user inquiries and generating responses. We utilized OpenAI's Whisper API to transcribe the user's spoken question, then passed this transcription to ChatGPT, which generates a contextual response. We ensured that a follow-up conversation is possible by including the user's previously asked question as context in the prompt. The response from ChatGPT is then converted into spoken language using the ElevenLabs Text-to-Speech API, providing an immersive, conversational learning experience.
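One way to carry the previous question forward is sketched below. It assumes the earlier question is sent as its own message ahead of the current prompt; the text does not say whether Ethereal does this or folds it into a single prompt string. The `ChatMessage` class here is a minimal stand-in, and `ConversationContext` is a hypothetical helper.

```csharp
using System.Collections.Generic;

// Minimal stand-in for a chat message with a role and a text body.
public class ChatMessage
{
    public string Role { get; set; }
    public string Content { get; set; }
}

public static class ConversationContext
{
    // Builds the message list for one request. The previous question,
    // if there is one, goes first so the model can resolve follow-ups
    // such as "and who was his father?".
    public static List<ChatMessage> Build(string previousQuestion, string currentPrompt)
    {
        var messages = new List<ChatMessage>();
        if (!string.IsNullOrEmpty(previousQuestion))
        {
            messages.Add(new ChatMessage { Role = "user", Content = previousQuestion });
        }
        messages.Add(new ChatMessage { Role = "user", Content = currentPrompt });
        return messages;
    }
}
```

Sending only the single most recent question keeps the request small, at the cost of losing context from earlier turns.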

This method forms the core of Ethereal's gameplay, providing a framework for an engaging, interactive, and educational VR experience. Its broad concepts and techniques can be adapted and applied to various thematic areas and educational contexts, making it a versatile approach for immersive learning.

Implementation Details

The implementation of Ethereal was achieved by integrating several hardware and software components. The hardware used was primarily the Oculus Quest 2 VR headset, chosen for its popularity, standalone capabilities, and great performance.

The software was built using the Unity game engine due to its comprehensive support for VR development and compatibility with the Oculus SDK. The 3D environment was designed within Unity, incorporating imported as well as custom-designed assets representing Greek Gods and mythological elements. The Oculus Integration package was heavily utilized in the development of the game, as it offers great functionality out of the box.

The implementation of gaze-based interaction was done through Raycasting, a built-in Unity function. This involved projecting a Ray from the user's gaze point and identifying if it intersects with any object in the scene. Upon detection, the object was highlighted using a custom shader to signal its active status to the user.
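The gaze check described above might look roughly like the following Unity component. This is a sketch under assumptions, not the project's code: the `GazeSelector` class, its `maxDistance` value, and the `CurrentSelection` property are hypothetical, and the highlight shader step is left out. It needs the Unity runtime, and only objects that carry a Collider can be hit.

```csharp
using UnityEngine;

// Hypothetical gaze selector; attach to the camera that follows the headset.
public class GazeSelector : MonoBehaviour
{
    // Assumed maximum selection distance, in metres.
    public float maxDistance = 10f;

    // Name of the object currently under the player's gaze, or null.
    public string CurrentSelection { get; private set; }

    void Update()
    {
        // Cast a ray from the head position along the view direction.
        var gazeRay = new Ray(transform.position, transform.forward);

        if (Physics.Raycast(gazeRay, out RaycastHit hit, maxDistance))
        {
            // The ray hit a collider: remember which object it belongs to.
            CurrentSelection = hit.collider.gameObject.name;
        }
        else
        {
            // Nothing within range is being looked at.
            CurrentSelection = null;
        }
    }
}
```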

The AI integration involved a flow of several services. When a user's gaze triggers a Raycast on an object, pressing the trigger button activates an audio recording. This recording was then sent to OpenAI's Whisper API, an automatic speech recognition (ASR) system, to transcribe the audio into text.
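The transcription step can be sketched as a multipart POST to OpenAI's audio transcription endpoint. The `WhisperRequest` helper, the WAV encoding, and the `question.wav` file name are assumptions; the text does not say how Ethereal encodes or uploads its recordings.

```csharp
using System.Net.Http;
using System.Net.Http.Headers;

public static class WhisperRequest
{
    // Public OpenAI transcription endpoint.
    public const string Endpoint = "https://api.openai.com/v1/audio/transcriptions";

    // Builds (but does not send) a transcription request for one recording.
    public static HttpRequestMessage Build(string apiKey, byte[] wavBytes)
    {
        var form = new MultipartFormDataContent();

        // The audio goes in the "file" field; WAV is an assumed encoding.
        var audio = new ByteArrayContent(wavBytes);
        audio.Headers.ContentType = new MediaTypeHeaderValue("audio/wav");
        form.Add(audio, "file", "question.wav");

        // "whisper-1" is the model name exposed by the hosted API.
        form.Add(new StringContent("whisper-1"), "model");

        var request = new HttpRequestMessage(HttpMethod.Post, Endpoint);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        request.Content = form;
        return request;
    }
}
```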

The transcribed text was then wrapped in a carefully crafted prompt that names the selected object:

    // Wrap the transcribed question (message) in a prompt that also
    // names the object the player is currently looking at.
    var newMessage = new ChatMessage()
    {
        Role = "user",
        Content = "You are guiding students through a VR abandoned " +
            "garden for a research project dubbed \" Ethereal \"." +
            " This following text is the object that the user is " +
            "inquiring about: " + SelectionManager.selectionName +
            ".  this is the users question/prompt:\" " + message +
            "\". Please keep your response very short (maximum of " +
            "1 sentence)." + "Do not mention that you are an AI model " +
            "or that you were developed by OpenAI, and pretend that " +
            "you are Ether, a pool of " + "infinite knowledge." +
            "say jokes in case you do not understand the question " +
            "or do not have the answer to the question"
    };

This message was then sent to the ChatGPT API, which provided a contextual response based on the user's question. The response was in turn sent to the ElevenLabs Text-to-Speech API to convert it into spoken language. This text-to-speech service offered a natural-sounding voice that greatly enhanced the immersive experience.

The implementation of the Whisper and ChatGPT APIs was done using the HTTP client library, which allowed sending POST requests with the corresponding API key and payload. Both Whisper and ChatGPT provided a fairly straightforward integration, with the main challenge being handling async operations correctly to ensure a smooth and consistent user experience.
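A POST request of the kind described above could be sketched with .NET's `HttpClient` as follows. This assumes `System.Net.Http` is the client library in question, which the text does not confirm; the `ChatRequest` helper is hypothetical, and the caller supplies the JSON body (including the model name).

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class ChatRequest
{
    // Public OpenAI chat completions endpoint.
    public const string Endpoint = "https://api.openai.com/v1/chat/completions";

    // Builds (but does not send) the POST request with key and payload.
    public static HttpRequestMessage Build(string apiKey, string jsonBody)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, Endpoint);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        request.Content = new StringContent(jsonBody, Encoding.UTF8, "application/json");
        return request;
    }

    // Sends the request and returns the raw JSON response. Awaiting
    // does not block the calling thread while the API responds.
    public static async Task<string> SendAsync(HttpClient client, string apiKey, string jsonBody)
    {
        using (var request = Build(apiKey, jsonBody))
        using (var response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Separating `Build` from `SendAsync` keeps the request construction checkable without network access, while the awaited send is where the async handling mentioned above matters.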

Evaluation of Results

Our evaluation of Ethereal focuses on three primary dimensions: system performance, AI response time, and user feedback.

System Performance

In terms of computational efficiency, the game performed well on the Oculus Quest 2, averaging around 51 frames per second (FPS). There were occasional drops in frame rate, but these were rare and did not significantly impact the user experience. This demonstrates the efficiency of our custom shaders and other optimization techniques used in the game's design.

AI Response Time

We also evaluated the response time of the integrated AI components, namely the Whisper, ChatGPT, and ElevenLabs APIs. Whisper and ChatGPT together took an average of 3.1 seconds to process and return a response. The ElevenLabs API, which converted the ChatGPT text responses to speech, added approximately 6.4 seconds, making the total response time around 9.5 seconds. While this is a relatively short time for processing, transcription, and response generation, it does present a slight delay in the seamless VR experience and indicates an area for potential optimization.
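The timing breakdown above can be restated as a small calculation that shows how much of the wait comes from text-to-speech. The `LatencyBudget` helper is hypothetical; the only inputs are the two averages reported in this section.

```csharp
using System;

public static class LatencyBudget
{
    // Sum of the per-stage averages, in seconds.
    public static double Total(double transcribeAndChat, double textToSpeech)
    {
        return transcribeAndChat + textToSpeech;
    }

    // Fraction of the total wait spent in text-to-speech.
    public static double TextToSpeechShare(double transcribeAndChat, double textToSpeech)
    {
        return textToSpeech / Total(transcribeAndChat, textToSpeech);
    }

    public static void Main()
    {
        // Reported averages: Whisper + ChatGPT, then ElevenLabs.
        double total = Total(3.1, 6.4);
        double share = TextToSpeechShare(3.1, 6.4);
        Console.WriteLine("Total seconds: " + Math.Round(total, 1));
        Console.WriteLine("Text-to-speech percent: " + Math.Round(share * 100));
    }
}
```

With the reported figures, text-to-speech accounts for about two thirds of the total wait, which is why it is the most promising stage to optimize.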

User Feedback

Our initial evaluation involved eight individuals who tried the Ethereal demo. The feedback was overwhelmingly positive, with all participants expressing enjoyment of the VR experience and interest in seeing the project expanded. While only three individuals said they would personally use it for educational purposes if it were fully developed, seven out of eight expressed a desire to see this type of technology utilized on a larger scale in educational institutions.

The feedback indicates a high level of acceptance and enthusiasm for the concept of Ethereal. The desire for expansion and use in educational settings also reflects the potential for Ethereal as a valuable learning tool.

Criteria                    Results
System Performance (FPS)    Average ~51 FPS with occasional drops
AI Response Time            Whisper and ChatGPT: ~3.1 seconds
                            ElevenLabs API: ~6.4 seconds
                            Total: ~9.5 seconds
User Feedback               100% enjoyed the experience
                            100% were interested in seeing this expanded
                            37.5% would personally use it for educational purposes if expanded
                            87.5% want to see this approach taken to a higher scale in educational institutions

Discussion of Benefits and Limitations

The Ethereal project provides several key benefits. First and foremost, it offers an immersive learning environment, allowing users to explore and understand Greek mythology in an engaging, interactive way. The integration of AI services, such as the Whisper and ChatGPT APIs, adds a layer of dynamism and personalization to the educational content. This interaction-based education promotes active learning and encourages user curiosity. Furthermore, the use of custom shaders and Raycasting techniques ensures smooth and responsive VR performance, enhancing the user experience.

However, there are certain limitations to the current implementation of Ethereal. One of the main limitations stems from the complexity and depth of Greek mythology, which can lead to situations where the AI response does not fully address a user's question. Additionally, there may be occasional inaccuracies or misunderstandings in the Whisper API's transcriptions or ChatGPT's responses, leading to confusion or miscommunication. Moreover, the focus on one cultural context (Greek mythology) might limit the appeal to users interested in other themes or topics. Finally, and most importantly, the delay in responses is quite noticeable, mostly due to the Text-to-Speech API's latency. This can be frustrating for the user and might lead to re-submission of the same question, which can further exacerbate the delay.

Future Work

Looking ahead, there are several exciting avenues for future development. Expanding the thematic content of the game is a priority, incorporating other cultural, historical, or scientific contexts to broaden the appeal and educational scope of Ethereal. Further enhancement of the AI systems to improve the quality and accuracy of the responses could also be achieved through iterative feedback and machine learning techniques.

We also envision potential improvements in user interaction design, such as gesture-based control or voice command, providing more natural and intuitive ways for users to navigate and interact with the VR environment. Additionally, developing a multiplayer mode where users can learn and explore together might greatly enhance the social and collaborative aspects of the learning experience.

Conclusion

In conclusion, Ethereal merges the power of VR and AI to deliver a highly immersive and interactive educational experience. The potential of this approach extends far beyond Greek mythology, setting the stage for a new paradigm in digital learning. As the field of AR/VR continues to evolve, projects like Ethereal underscore the immense potential of these technologies in reshaping education. The integration of immersive learning environments with advanced AI systems offers a promising direction for the future of education. As more researchers and developers explore this intersection, we can look forward to a new era of experiential learning that bridges the gap between knowledge and exploration.

Acknowledgments

Special shoutout to our professor Douglas Lanman for all the support along the way.


Links to other resources used: