A mechanism for directing the virtual agent's gaze based on sound and visual input
Student: Slaven Bakula
This work focuses on enhancing the multimodal interaction capabilities of the PLEA virtual agent through the integration of real-time visual and auditory perception.
The developed system enables more natural nonverbal communication by incorporating gaze control and sound-source localization. A face detection and head-pose estimation module was implemented in Python, allowing the system to determine a user's position and head orientation in real time. In parallel, sound localization was achieved using the LyraTD-MSC module, enabling the agent to respond to the direction of incoming audio signals.
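The abstract does not name the libraries used for the vision module; a minimal sketch of the head-pose estimation step, assuming OpenCV's solvePnP with a generic 3D face model and six 2D facial landmarks supplied by some detector (dlib, MediaPipe, or similar; not specified in the thesis), might look like this:

```python
import numpy as np
import cv2

# Generic 3D face model points (nose tip, chin, eye corners, mouth corners)
# in an arbitrary model frame, millimetres. These are a common approximation,
# not values taken from the thesis.
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0),           # nose tip
    (0.0, -330.0, -65.0),      # chin
    (-225.0, 170.0, -135.0),   # left eye outer corner
    (225.0, 170.0, -135.0),    # right eye outer corner
    (-150.0, -150.0, -125.0),  # left mouth corner
    (150.0, -150.0, -125.0),   # right mouth corner
], dtype=np.float64)

def estimate_head_pose(image_points, frame_width, frame_height):
    """Return (yaw, pitch, roll) in degrees from six 2D facial landmarks.

    `image_points` is a 6x2 array of pixel coordinates matching MODEL_POINTS;
    how these landmarks are detected is an assumption, not stated in the abstract.
    """
    # Approximate pinhole intrinsics from the frame size (no calibration).
    focal = frame_width
    center = (frame_width / 2, frame_height / 2)
    camera_matrix = np.array([
        [focal, 0, center[0]],
        [0, focal, center[1]],
        [0, 0, 1],
    ], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))  # assume negligible lens distortion

    ok, rvec, tvec = cv2.solvePnP(
        MODEL_POINTS, np.asarray(image_points, dtype=np.float64),
        camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return None

    # Convert the rotation vector to Euler angles (degrees).
    rot_mat, _ = cv2.Rodrigues(rvec)
    pose_mat = cv2.hconcat([rot_mat, tvec])
    _, _, _, _, _, _, euler = cv2.decomposeProjectionMatrix(pose_mat)
    pitch, yaw, roll = euler.flatten()
    return yaw, pitch, roll
```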
Both visual and auditory data are transmitted via socket communication to the virtual agent, implemented in Unreal Engine, which adapts its behaviour accordingly: establishing eye contact and orienting toward the user.
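The abstract states only that socket communication is used; the transport, port, and message schema below are illustrative assumptions, not the thesis protocol. A sketch of the sending side, fusing the head-pose angles with the sound-source direction into one datagram, could be:

```python
import json
import socket
import time

# Host, port, and JSON schema are hypothetical; the actual protocol is
# not described in the abstract.
UNREAL_HOST = "127.0.0.1"
UNREAL_PORT = 9999

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_perception(face_yaw, face_pitch, sound_azimuth):
    """Send one perception update to the Unreal Engine agent.

    Angles are in degrees: face_* from head-pose estimation,
    sound_azimuth from the sound-source localization module.
    """
    message = {
        "timestamp": time.time(),
        "face": {"yaw": face_yaw, "pitch": face_pitch},
        "sound": {"azimuth": sound_azimuth},
    }
    sock.sendto(json.dumps(message).encode("utf-8"),
                (UNREAL_HOST, UNREAL_PORT))

# Example: user slightly to the agent's left, voice from the same direction.
send_perception(face_yaw=-12.0, face_pitch=3.5, sound_azimuth=-15.0)
```

UDP is chosen here only because it suits low-latency, loss-tolerant perception updates; whether the thesis uses UDP or TCP, and how the Unreal Engine side maps the received angles onto the agent's head and eye movement, is not stated in the abstract.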
This work contributes to the development of more socially aware and responsive virtual agents, supporting advanced human–AI interaction through multimodal sensing and real-time behavioural adaptation.
Repository: https://repozitorij.fsb.unizg.hr/object/fsb:8196