%0 Journal Article %T Multimodal Digital Phenotyping for Bipolar Disorder: Robust Mood-State Classification and Early Relapse Risk Monitoring %A Rocco de Filippis %A Abdullah Al Foysal %J Open Access Library Journal %V 12 %N 12 %P 1-19 %@ 2333-9721 %D 2025 %I Open Access Library %R 10.4236/oalib.1114600 %X Bipolar disorder (BD) is characterized by recurrent transitions between manic, depressive, and euthymic states, yet continuous symptom monitoring remains a major clinical challenge. We present a multimodal digital phenotyping framework for fine-grained BD mood-state classification and relapse-risk monitoring using naturalistic facial video, voice audio, and phone-usage metadata. The proposed architecture employs modality-specific encoders with late-fusion logits to learn disentangled representations of affective, prosodic, and behavioural signals. Across a moderately imbalanced but clinically representative dataset, the model achieves near-perfect validation performance, including a 100% final accuracy and a strictly diagonal confusion matrix, indicating complete separation between euthymic, depressive, and manic classes. t-SNE visualizations show well-defined clusters at the embedding level for each individual modality and even tighter grouping in the fused representation, suggesting robust cross-modal alignment. An ablation analysis confirms that facial affect provides the strongest single-modality predictive signal (98.8% accuracy), while combining voice and facial features yields the highest bi-modal performance (99.0%), closely followed by the full multimodal system (98.5%). We further demonstrate a relapse-risk layer that transforms predicted mood probabilities into a continuous risk score, triggering alerts when a calibrated clinical threshold is crossed. Although the results are strong, we critically examine the possibility of data leakage and overfitting underlying “perfect” validation learning curves. To ensure realistic clinical utility, we outline subject-wise evaluation, temporal blocking, calibration strategies, and privacy-preserving deployment considerations. Class proportions (euthymic ≈ 1000, depressive ≈ 534, manic ≈ 468) reflect real-world prevalence patterns rather than strict balance. Overall, our findings highlight the promise of low-burden multimodal monitoring for BD while emphasizing the methodological rigor and safeguards required for real-world translation.
%K Bipolar Disorder %K Digital Phenotyping %K Multimodal Learning %K Face/Voice/Phone %K Mood Classification %K Relapse Prediction %K t-SNE %K Ablation %U http://www.oalib.com/paper/6880095