The proposed MVP architecture combines facial expressions with physiological signals (heart rate, sweat response) using a transformer that can attend over longer 1-2 minute clips. It aims to outperform previous systems by more tightly integrating voluntary (facial) and involuntary (physiological) responses for emotion detection: in a cross-attention layer, the physiological stream supplies the queries while the facial stream supplies the keys and values.
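A minimal sketch of that cross-attention step, assuming hypothetical feature dimensions and random (untrained) projection weights; in the real model these projections would be learned and the attention would sit inside a full transformer block:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(phys, face, d_k=32, seed=0):
    """Cross-attention where physiological features form the queries
    and facial features form the keys/values.

    phys: (T_phys, d_phys) physiological feature sequence
    face: (T_face, d_face) facial feature sequence
    Returns a (T_phys, d_k) fused representation: each physiological
    time step attends over all facial frames.
    """
    rng = np.random.default_rng(seed)  # stand-in for learned weights
    d_phys, d_face = phys.shape[-1], face.shape[-1]
    W_q = rng.standard_normal((d_phys, d_k)) / np.sqrt(d_phys)
    W_k = rng.standard_normal((d_face, d_k)) / np.sqrt(d_face)
    W_v = rng.standard_normal((d_face, d_k)) / np.sqrt(d_face)

    Q = phys @ W_q                      # queries from physiology
    K = face @ W_k                      # keys from facial stream
    V = face @ W_v                      # values from facial stream

    scores = Q @ K.T / np.sqrt(d_k)     # (T_phys, T_face)
    attn = softmax(scores, axis=-1)     # rows sum to 1
    return attn @ V                     # (T_phys, d_k)

# Illustrative shapes for a 2-minute clip: 120 physiological samples
# (e.g. 1 Hz heart-rate/EDA features) attending over 300 facial-frame
# embeddings. Dimensions here are placeholders, not the paper's.
phys = np.random.default_rng(1).standard_normal((120, 4))
face = np.random.default_rng(2).standard_normal((300, 64))
fused = cross_attention(phys, face)
print(fused.shape)  # (120, 32)
```

Because physiology drives the queries, the involuntary signal determines *where* in the facial sequence the model looks, which is one plausible way to realize the voluntary/involuntary integration described above.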