Representation learning for wearable data based mental health and well-being recognition


Huiyuan Yang, Kusha Sridhar, and Akane Sano.


Huiyuan Yang at hy48@rice.edu


Millions of people around the world suffer from conditions that affect mental and physical well-being, such as depression, anxiety, bipolar disorder, and schizophrenia. Mental health disorders not only debilitate physical well-being but also permeate societies and cripple institutions. Timely interventions and consistent support are essential to alleviating this mental health crisis. Because conventional in-person mental health care and therapy sessions are often unaffordable, people are turning to digital solutions such as mobile applications and wearable devices. Wearable devices (e.g., Fitbit, smartwatches, Halo) are easy to use, affordable, and privacy-preserving. They have emerged as popular instruments in healthcare applications such as well-being management, elderly care, medical diagnosis, and mental health and well-being detection. Although these devices are ubiquitous in digital healthcare, they lack thorough scientific validation, reliable decision making, specificity, generalizability, and, above all, the right features, which impedes their widespread use and hampers their predictive capability. It is therefore imperative to build objectivity and reliability into models without compromising too much on prediction accuracy. Affective computing is the most relevant tool for achieving this objective: advances in machine learning and multimodal sensors, including wearable sensors, can be leveraged to recognize, interpret, process, and simulate human emotions, and to provide solutions for mental health problems and physical well-being.
Machine learning models that detect or predict mental health and well-being from wearable devices have been developed. However, the majority of these algorithms rely on hand-crafted features extracted from wearable sensor data, or are narrow models designed to solve a single downstream task. Such approaches fall short of a generalized solution that is robust to diverse conditions, such as inconsistent input settings (lab versus field), multimodal cues (e.g., acoustic, visual, semantic, physiological) from different individuals, biases introduced by demographic factors, and both long- and short-term context in the signals. As a consequence, when emotions are coded into digital systems based on poorly understood features, the models depend heavily on unreliable parameters. It is therefore important to learn meaningful representations and to assert their relevance with high confidence. While extensive research has gone into representation learning frameworks in computer vision (e.g., ResNet-18, VGG, Inception) and natural language processing (e.g., Transformer, BERT, GPT-3), no comparably established framework exists for affective computing with wearable data. An effective feature learning framework would favor fair comparison and reproducibility, facilitate more efficient multimodal research, and increase the overall robustness of affective computing algorithms for healthcare applications. Representation learning provides a natural platform for creating meaningful and robust features, and it also supports the interpretability of the resulting machine learning models. Studies on representation learning are scattered across venues targeting computer vision, speech, and natural language processing, but few have explored wearable computing [2, 3, 4] or appeared at previous editions of ACII [1, 4, 5].
The scope of this proposed special session is representation learning for affective computing using wearable or mobile device data (e.g., physiological, behavioral, speech, audio, and facial data), examined through the lens of robustness, generalizability, and statistical uncertainty. Interpretability and representation learning are among the main aspects of neural networks, so the scope of the proposed special session is well aligned with that of ACII. The aim of the session is to establish a platform for engineers, scientists, and practitioners from both academia and industry to present and discuss innovations in representation learning for wearable data and its application to mental health and well-being recognition. The organizing committee is motivated to build a solid reference for the affective computing field within the computational intelligence community. This opportunity will stimulate and encourage researchers working on topics including, but not limited to:

  • Uncertainty-aware representation learning – asserting confidence over the learned representations in predicting affect using wearable data. This can be helpful in human-in-the-loop solutions.
  • Proxy label generation approaches, such as self-training, multi-view training, self-ensembling, and knowledge distillation, for wearable-data-based representation learning
  • Transformer-based models to alleviate challenges in wearable data such as missing modalities, noisy signals, and long-context dependencies
  • Information theoretic approaches to learn representations from wearable data
  • Statistical methods for wearable data analysis
  • Self-reported label validation strategies


Session structure
The papers accepted for this session will be selected for oral/poster presentations at the conference.


References
[1] Surjya Ghosh, Shivam Goenka, Niloy Ganguly, Bivas Mitra, and Pradipta De. Representation learning for emotion recognition from smartphone keyboard interactions. In 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII), pages 704–710, 2019.
[2] Arinbjörn Kolbeinsson, Piyusha Gade, Raghu Kainkaryam, Filip Jankovic, and Luca Foschini. Self-supervision of wearable sensors time-series data for influenza detection. arXiv preprint arXiv:2112.13755, 2021.
[3] Kyle Ross, Paul Hungler, and Ali Etemad. Unsupervised multi-modal representation learning for affective computing with multi-corpus wearable data. Journal of Ambient Intelligence and Humanized Computing, pages 1–26, 2021.
[4] Juan Vazquez-Rodriguez. Using multimodal transformers in affective computing. In 2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), pages 1–5. IEEE, 2021.
[5] Guangyi Zhang and Ali Etemad. Deep recurrent semi-supervised EEG representation learning for emotion recognition. In 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII), pages 1–8, 2021.