
Kaggle Competition Silver Medal: Child Mind Institute - Detect Behavior with Sensor Data

September 23, 2025

kaggle

I earned a Silver Medal in the Child Mind Institute Detect Behavior with Sensor Data Kaggle competition, my second Child Mind Institute challenge and my closest result yet to a gold medal.

This project leveraged wrist-worn multimodal sensor data—movement, temperature, and proximity—to distinguish body-focused repetitive behaviors (BFRBs) from everyday gestures. The objective was to improve the accuracy of wearable BFRB-detection systems, supporting better diagnosis and treatment of mental health conditions involving compulsive behaviors.

The dataset comprised three modalities: the Inertial Measurement Unit (IMU), including accelerometer, gyroscope, and magnetometer readings; the Thermopile (THM), capturing non-contact infrared temperature; and the Time-of-Flight (ToF) sensor, providing infrared distance measurements. The evaluation metric combined binary F1 (BFRB vs. non-target) and macro F1 (fine-grained gesture classification), each contributing 50% to the final score.
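
As a rough illustration of the scoring, here is a minimal sketch of the combined metric using scikit-learn; the label encoding (a single non-target id) is an assumption for illustration, not the competition's exact scheme:

```python
# Combined metric sketch: mean of binary F1 (BFRB vs. non-target) and macro F1
# over the fine-grained gesture classes. The non_target_label encoding is assumed.
import numpy as np
from sklearn.metrics import f1_score

def combined_score(y_true, y_pred, non_target_label=0):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    binary_f1 = f1_score(y_true != non_target_label, y_pred != non_target_label)
    macro_f1 = f1_score(y_true, y_pred, average="macro")
    return 0.5 * binary_f1 + 0.5 * macro_f1
```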

We developed a two-branch model that jointly learns IMU and ToF/THM features and applied subject-level Hungarian matching for optimal label alignment. This approach achieved 16th place among 2,657 teams—less than 0.001 behind the gold-medal threshold.

Our solution pipeline consisted of four key stages:

  • Stage 1 — Data preprocessing
    • Merged demographic metadata and standardized handedness by mirroring left-handed sequences into a unified right-hand reference frame through channel swapping and sign inversion (see the mirroring sketch after this list).
    • Engineered physics-informed IMU features: gravity-free linear acceleration, quaternion-based angular velocity, and angular distance metrics (see the feature sketch after this list).
    • Aggregated ToF infrared distance maps into per-sensor statistical features (mean, standard deviation, minimum, maximum) while preserving missing pixels as NaN (see the aggregation sketch after this list).
    • Applied forward/backward fill and zero imputation for missing values, normalized with StandardScaler, and pre-padded sequences to a fixed length aligned with action endpoints.
  • Stage 2 — Model architecture and training
    • Designed a dual-branch Residual SE-CNN architecture:
      • IMU branch: split into acceleration and rotation processing paths
      • ToF/THM branch: dedicated to temperature and proximity data fusion
    • Integrated a learnable gating mechanism to dynamically weight the ToF/THM branch, enhancing robustness for IMU-only samples (a minimal gate sketch appears after this list).
    • Optimized with a composite loss: Cross-Entropy + Triplet (hard mining) + gate supervision (0.2×BCE); see the loss sketch after this list.
    • Employed the AdamW optimizer with cosine learning-rate scheduling and StratifiedGroupKFold cross-validation grouped by subject (see the CV sketch after this list).
    • Augmented training data through temporal jittering, scaling, and random modality dropout (p=0.25) to improve temporal and multimodal generalization (see the augmentation sketch after this list).
  • Stage 3 — Inference and postprocessing
    • Maintained preprocessing consistency during inference and ensembled predictions across the 5 cross-validation folds.
    • Applied a subject-level Hungarian assignment to enforce globally optimal label matching across sequences from the same participant, significantly stabilizing fine-grained gesture predictions (see the matching sketch after this list).
  • Stage 4 — Evaluation and results
    • Achieved a 5-fold cross-validation score of 0.855 and a Kaggle leaderboard score of 0.850, surpassing the IMU-only baseline (CV≈0.820) by +0.03–0.04.
    • Demonstrated robust performance across both all-sensor and IMU-only test subsets, validating the effectiveness of the gated fusion architecture and physics-informed feature engineering.
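
The sketches below illustrate these stages under stated assumptions; they are simplified readings of the pipeline, not the exact competition code. First, the handedness normalization: which channels to swap or negate depends on the device's axis conventions, so the column names and flip choices here are illustrative.

```python
# Handedness normalization sketch: mirror left-wrist sequences into the
# right-hand reference frame by negating selected channels. The flip_cols
# defaults are illustrative assumptions, not the exact channels used.
import pandas as pd

def mirror_left_to_right(seq: pd.DataFrame,
                         flip_cols=("acc_x", "rot_y", "rot_z")) -> pd.DataFrame:
    out = seq.copy()
    for col in flip_cols:
        out[col] = -out[col]  # sign inversion; channel swaps follow the same pattern
    return out
```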
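
The physics-informed IMU features can be sketched with SciPy's Rotation; the column layout, quaternion order (x, y, z, w), and sampling rate are assumptions:

```python
# Physics-informed IMU feature sketch. Assumes (x, y, z, w) quaternions and a
# 50 Hz sampling rate; both are illustrative, not confirmed from the writeup.
import numpy as np
from scipy.spatial.transform import Rotation as R

def linear_acceleration(acc, quat, g=9.81):
    """Gravity-free acceleration: rotate world-frame gravity into the sensor
    frame and subtract it from the raw accelerometer readings."""
    gravity_sensor = R.from_quat(quat).inv().apply(np.array([0.0, 0.0, g]))
    return acc - gravity_sensor

def angular_features(quat, dt=0.02):
    """Angular velocity and angular distance from consecutive quaternions."""
    rot = R.from_quat(quat)
    rel = rot[:-1].inv() * rot[1:]                      # relative rotation per step
    rotvec = rel.as_rotvec()
    omega = np.vstack([rotvec / dt, rotvec[-1:] / dt])  # pad to input length
    dist = np.linalg.norm(rotvec, axis=1)               # rotation magnitude (rad)
    return omega, np.concatenate([dist, dist[-1:]])
```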
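
A sketch of the ToF aggregation, assuming 64 pixel columns per sensor with -1 marking missing readings (both are assumptions about the raw schema):

```python
# ToF aggregation sketch: collapse each sensor's 64-pixel distance map into
# per-frame statistics, keeping missing pixels as NaN so they are excluded.
# The column naming (tof_{s}_v{p}) and -1 missing marker are assumptions.
import numpy as np
import pandas as pd

def aggregate_tof(df, n_sensors=5, n_pixels=64):
    feats = {}
    for s in range(1, n_sensors + 1):
        pix = df[[f"tof_{s}_v{p}" for p in range(n_pixels)]].replace(-1, np.nan)
        feats[f"tof_{s}_mean"] = pix.mean(axis=1)  # pandas stats skip NaN by default
        feats[f"tof_{s}_std"] = pix.std(axis=1)
        feats[f"tof_{s}_min"] = pix.min(axis=1)
        feats[f"tof_{s}_max"] = pix.max(axis=1)
    return pd.DataFrame(feats, index=df.index)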
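
The learnable gate can be pictured as a per-sample scalar in [0, 1], predicted from the ToF/THM features and used to scale that branch before fusion. The PyTorch module below is a minimal sketch; dimensions and layers are illustrative, not the actual architecture:

```python
# Gated fusion sketch: a sigmoid gate learned from the ToF/THM features scales
# that branch, letting the model down-weight it for IMU-only (imputed) samples.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, imu_dim, tof_dim, n_classes):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(tof_dim, 1), nn.Sigmoid())
        self.head = nn.Linear(imu_dim + tof_dim, n_classes)

    def forward(self, imu_feat, tof_feat):
        g = self.gate(tof_feat)                            # (B, 1) gate in [0, 1]
        fused = torch.cat([imu_feat, g * tof_feat], dim=1)
        return self.head(fused), g.squeeze(1)              # logits and gate value
```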
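
A sketch of the composite loss: the batch-hard triplet mining below is the standard recipe, and the gate target (a "ToF/THM present" flag) is an assumption about what the gate supervision looks like:

```python
# Composite loss sketch: cross-entropy + batch-hard triplet + 0.2-weighted BCE
# supervising the gate toward an assumed "ToF/THM present" indicator.
import torch
import torch.nn.functional as F

def batch_hard_triplet(emb, labels, margin=0.3):
    dist = torch.cdist(emb, emb)                           # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    hardest_pos = (dist * same.float()).max(dim=1).values  # farthest same-class sample
    inf = torch.full_like(dist, float("inf"))
    hardest_neg = torch.where(same, inf, dist).min(dim=1).values  # nearest other-class
    return F.relu(hardest_pos - hardest_neg + margin).mean()

def composite_loss(logits, emb, gate, labels, has_tof):
    ce = F.cross_entropy(logits, labels)
    tri = batch_hard_triplet(emb, labels)
    gate_bce = F.binary_cross_entropy(gate, has_tof.float())
    return ce + tri + 0.2 * gate_bce
```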
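
The subject-grouped split guarantees that no participant appears on both the train and validation sides of a fold; the variable names below are dummy stand-ins:

```python
# CV sketch: StratifiedGroupKFold balances gesture labels across folds while
# keeping all of a subject's sequences in a single fold.
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                 # dummy per-sequence feature matrix
gesture_labels = rng.integers(0, 5, size=100)
subject_ids = rng.integers(0, 20, size=100)

sgkf = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (tr, va) in enumerate(sgkf.split(X, gesture_labels, groups=subject_ids)):
    # No subject appears on both sides of a fold.
    assert set(subject_ids[tr]).isdisjoint(subject_ids[va])
```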
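
The augmentations can be sketched per sample; the jitter and scale magnitudes are illustrative, and "temporal jittering" is read here as additive noise (other readings, such as time shifting, are possible):

```python
# Augmentation sketch: additive noise ("jitter"), random amplitude scaling, and
# modality dropout that zeroes the ToF/THM channels with probability 0.25.
import numpy as np

def augment(imu, tofthm, rng, p_drop=0.25, jitter=0.02, scale=0.1):
    imu = imu + rng.normal(0.0, jitter, imu.shape)   # noise jitter
    imu = imu * rng.uniform(1 - scale, 1 + scale)    # random scaling
    if rng.random() < p_drop:
        tofthm = np.zeros_like(tofthm)               # modality dropout
    return imu, tofthm
```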
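
Finally, one plausible reading of the subject-level Hungarian step: pool each participant's sequence probabilities and solve a linear assignment so labels are allocated to maximize total probability. The one-to-one constraint shown here is a simplifying assumption; the team's exact formulation may differ.

```python
# Subject-level Hungarian matching sketch using scipy's linear_sum_assignment.
# Assumes (for illustration) at most one sequence per class within a subject;
# subjects with more sequences than classes fall back to plain argmax here.
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_per_subject(probs, subject_ids):
    """probs: (N, C) softmax probabilities per sequence."""
    preds = probs.argmax(axis=1)
    for sid in np.unique(subject_ids):
        idx = np.where(subject_ids == sid)[0]
        if len(idx) > probs.shape[1]:
            continue
        rows, cols = linear_sum_assignment(-probs[idx])  # maximize total probability
        preds[idx[rows]] = cols
    return preds
```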

You can find the complete implementation and code details on GitHub.