Tutorial date/time
September 10, 13:30–16:30
Room E14-240
Tutorial presenter(s)
Dr. Siyuan Chen, University of New South Wales (Siyuan.chen@unsw.edu.au)
Dr. Ting Dang, Nokia Bell Labs / University of Cambridge (ting.dang@nokia-bell-labs.com)
Prof. Julien Epps, University of New South Wales (j.epps@unsw.edu.au)
Tutorial description
Multimodal processing for affect analysis is instrumental in enabling natural human-computer interaction, facilitating health and wellbeing, and enhancing overall quality of life. While many modalities, including facial expression, brain waves, speech, skin conductance and blood volume, offer valuable insights, eye behaviour and audio provide exceptionally rich information and can be collected easily and non-invasively in mobile contexts without restricting physical movement.

Such rich information is highly correlated with cognitive and affective states. It is reflected not only in conventional eye and speech behaviour such as gaze, pupil size, blink, linguistics and paralinguistics, but also in newly developed behaviour descriptors such as eyelid movement, the interaction between the eyelid, iris and pupil, eye action units, heart and breathing sensing through in-ear microphones, abdominal sound sensing via custom belt-shaped wearables, and the sequence and coordination of multimodal behaviour events. The high-dimensional nature of the available information makes eye and audio sensing ideal for multimodal affect analysis.

However, fundamental and state-of-the-art eye and audio behaviour computing has not yet been widely introduced to the community in tutorial form. Meanwhile, advances in wearables and head-mounted devices such as the Apple Vision Pro, smart glasses and VR headsets make them the likely next generation of computing devices, providing novel opportunities to explore new types of eye behaviour and new methods of body-sound sensing for affect analysis and modelling. This tutorial therefore focuses on eye and audio modality computing, using an eye camera and a microphone as examples, and on multimodal wearable computing approaches, using the eye, speech and head-movement modalities as examples, with the aim of propelling the development of future multimodal affective computing systems in diverse domains.
Structure and Contents
This tutorial consists of an overview followed by four parts; the full program is shown in the table below.
| Time | Program |
| --- | --- |
| 13:40-14:40 | Overview: Background on current sensing modalities and technologies, and the motivation for eye and audio processing in affect analysis. |
| 14:40-15:10 | Part 1: Introduction to camera-based eye behaviour computing for affect. Wearable devices to sense eye information. Eye behaviour types (gaze, pupil size, blink, saccade, eyelid shape, etc.) and their relationships with affect. Computational methods for eye behaviour analysis (illustrated in the first sketch after this table). Issues in experiment design, including data collection, feature extraction and selection, the machine learning pipeline, in-the-wild data and bias. Available datasets, off-the-shelf tools and how to get started. Future directions and challenges. |
| 15:10-15:30 | Part 2: Audio analysis for affect computing using a single microphone. Introduction to wearable audio for affect (sensors and wearable audio devices; audio types; relevance to affective computing; applications in healthcare, etc.). Exploration of innovative body-sound sensing for affect analysis. Speech and audio processing and machine learning pipelines for affect computing (see the second sketch after this table). Future directions and challenges (emerging trends and technologies, e.g. augmented reality and personalized audio; ethical considerations, e.g. privacy and security). |
| 15:10-15:30 | Coffee Break |
| 15:30-16:00 | Part 3: Multimodality (focusing on the eye camera, microphone and IMU sensors). Motivation for multimodal approaches (performance increase, redundancy, different types of information, context). What multimodal approaches can contribute to assessing affect and cognition, i.e. the benefits of multimodality specifically in the context of affect/cognition. Approaches for multimodal analysis, modelling and system design (fusion, statistical vs. event-based features, analysis methods; see the third sketch after this table). Examples of multimodal system designs and their benefits. Applications of multimodal systems and use-case considerations. Future directions and challenges. |
| 16:00-16:30 | Part 4: Interactive research design activity. Discussion of processing eye and speech/audio behaviour, and of applications and challenges in practice. Students and researchers will present their own related projects, share experience of using different modalities and approaches in their applications, and discuss future research plans or directions on the modalities, approaches and applications they would like to adopt. |
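To make the Part 1 material more concrete, the first sketch below is a minimal, hypothetical Python example (not code from the tutorial) showing how simple eye-behaviour descriptors such as blink rate, pupil-size statistics and a crude saccade proxy could be computed from an eye-tracker stream; the signal names, thresholds and sampling rate are illustrative assumptions.

```python
import numpy as np

def eye_features(pupil_diam, gaze_x, gaze_y, fs=60.0):
    """Simple eye-behaviour descriptors from an eye-tracker stream.

    pupil_diam : pupil diameter in mm (NaN during blinks, a common convention)
    gaze_x, gaze_y : gaze coordinates in degrees of visual angle
    fs : sampling rate in Hz
    """
    valid = np.isfinite(pupil_diam) & (pupil_diam > 0)

    # Blink rate: count runs of invalid pupil samples as blinks.
    blink_starts = np.sum(np.diff(valid.astype(int)) == -1)
    duration_s = len(pupil_diam) / fs
    blink_rate = blink_starts / duration_s * 60.0          # blinks per minute

    # Pupil-size statistics over valid samples.
    pupil_mean = np.mean(pupil_diam[valid])
    pupil_std = np.std(pupil_diam[valid])

    # Crude saccade proxy: fraction of samples with gaze speed above an
    # illustrative 30 deg/s threshold.
    speed = np.hypot(np.diff(gaze_x), np.diff(gaze_y)) * fs
    saccade_ratio = np.mean(speed > 30.0)

    return np.array([blink_rate, pupil_mean, pupil_std, saccade_ratio])

# Synthetic 10 s recording at 60 Hz, with two "blinks" inserted as NaNs.
rng = np.random.default_rng(0)
n = 600
pupil = 3.5 + 0.2 * rng.standard_normal(n)
pupil[100:110] = np.nan
pupil[400:408] = np.nan
gx, gy = np.cumsum(rng.standard_normal((2, n)) * 0.1, axis=1)
print(eye_features(pupil, gx, gy))
```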
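The second sketch, again a hypothetical illustration rather than tutorial material, shows the kind of frame-based audio features (short-time log-energy, zero-crossing rate, spectral centroid) and statistical functionals that a single-microphone affect pipeline such as the one outlined in Part 2 might start from.

```python
import numpy as np

def audio_features(signal, fs=16000, frame_len=400, hop=160):
    """Frame-level descriptors from a mono audio signal (short-time log-energy,
    zero-crossing rate, spectral centroid), summarised by mean and std."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = np.log(np.sum(frame ** 2) + 1e-10)
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
        spectrum = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
        centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-10)
        frames.append([energy, zcr, centroid])
    frames = np.asarray(frames)
    # Summarise the frame-level trajectory with simple statistical functionals.
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

# One second of synthetic "speech": a noisy 220 Hz tone.
fs = 16000
t = np.arange(fs) / fs
audio = np.sin(2 * np.pi * 220 * t) + 0.05 * np.random.default_rng(1).standard_normal(fs)
print(audio_features(audio, fs))
```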
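The third sketch illustrates feature-level (early) fusion, one of the multimodal approaches mentioned in Part 3: per-segment eye, audio and head-movement (IMU) feature vectors are concatenated and fed to a single classifier. The features, labels and classifier choice below are placeholders, using scikit-learn purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n_segments = 200

# Placeholder per-segment feature matrices for each modality (in practice these
# would come from extractors such as the eye and audio sketches above).
eye_feats = rng.standard_normal((n_segments, 4))      # e.g. blink rate, pupil statistics
audio_feats = rng.standard_normal((n_segments, 6))    # e.g. energy/ZCR/centroid functionals
imu_feats = rng.standard_normal((n_segments, 3))      # e.g. head-movement statistics
labels = rng.integers(0, 2, size=n_segments)          # binary affect label (e.g. high/low arousal)

# Feature-level (early) fusion: concatenate modality features per segment.
fused = np.hstack([eye_feats, audio_feats, imu_feats])

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, fused, labels, cv=5)
print("cross-validated accuracy:", scores.mean())
```

Decision-level (late) fusion, where each modality gets its own classifier and the predictions are combined, is a common alternative when modalities are sampled or missing at different rates.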
Tutorial materials
The tutorial will be delivered by the three organizers, each presenting one part that includes practical examples and time for interactive questions. The final part, an interactive discussion session, will be coordinated jointly by all three organizers. The tutorial is based on the following key references, which will be provided beforehand. The presentation slides, notes and videos from the tutorial will be made available afterwards on the ACII 2023 website and at https://ireye4task.github.io/acii_tutorial.html.
Key References
D. W. Hansen and Q. Ji, “In the Eye of the Beholder: A Survey of Models for Eyes and Gaze,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, pp. 478-500, 2010.
K. Holmqvist, et al., “Eye tracking: empirical foundations for a minimal reporting guideline,” Behavior Research Methods, vol. 55, no. 1, pp. 364-416, 2023.
R. A. Khalil, et al., “Speech emotion recognition using deep learning techniques: A review,” IEEE Access, vol. 7, pp. 117327-117345, 2019.
Y. Wang, et al., “A systematic review on affective computing: Emotion models, databases, and recent advances,” Information Fusion, vol. 83, pp. 19-52, 2022.
Contact:
Siyuan Chen, University of New South Wales (Siyuan.chen@unsw.edu.au)
General enquiries to ACII2023 Tutorial Chairs:
Emily Mower Provost (University of Michigan): emilykmp@umich.edu
Albert Ali Salah (Universiteit Utrecht): a.a.salah@uu.nl