P-Mix: A Data Augmentation Method for Contrastive Learning based Human Activity Recognition

Citation of Original Publication

Chen, Yingjie, Qi Xie, Wenxuan Cui, Liming Chen, Houbing Herbert Song, and Tao Zhu. “P-Mix: A Data Augmentation Method for Contrastive Learning Based Human Activity Recognition.” IEEE Transactions on Artificial Intelligence, August 25, 2025, 1–13. https://doi.org/10.1109/TAI.2025.3601599.

Rights

© 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Abstract

Supervised human activity recognition (HAR) with sensor data typically requires large labeled datasets to train robust models. Contrastive learning offers a self-supervised alternative that relies on data augmentation to learn useful representations. However, most existing augmentation methods operate on either the time or the channel dimension alone and often introduce unstructured noise, which can distort meaningful temporal and spectral patterns. To address these limitations, we present P-Mix, a novel data augmentation method for contrastive-learning-based HAR that is designed to be compatible with the SimCLR framework. Tailored to sensor data, P-Mix slices and recombines data along both the time and channel dimensions, merging multiple temporal segments to encourage the model to explore the underlying relationships and variations in the data in an unsupervised setting. To capture motion cycles and long-term dependencies, we use short temporal segments as the fundamental processing units along the time dimension. By introducing structured noise patterns based on motion-cycle characteristics within these segments, we enhance the model’s robustness and generalization. Extensive evaluations on five HAR benchmarks show that P-Mix consistently outperforms the strongest baseline (Resample), with relative F1-score gains ranging from 1.87% (USC-HAD: 85.63% vs. 83.93%) to 6.53% (DSADS: 97.24% vs. 91.28%) through controlled multidimensional fusion. These results demonstrate the effectiveness of our approach in optimizing data generation and augmentation strategies for HAR tasks.
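The abstract describes P-Mix only at a high level. The following is a minimal, hypothetical Python sketch of a time-and-channel slice-and-recombine augmentation in that spirit. It assumes windows shaped [time][channel], fixed-length temporal segments, channel groups of three (for example, one triaxial sensor), block swaps with a partner window from the same batch, and a single per-segment sinusoid standing in for "structured noise". The function name `pmix_like` and all of its parameters are illustrative assumptions; this is not the authors' implementation, and the defaults are not their published settings.

```python
import math
import random
from typing import List, Optional, Sequence

Window = List[List[float]]  # shape [T][C]: T time steps, C sensor channels


def pmix_like(
    anchor: Window,
    partners: Sequence[Window],
    seg_len: int = 16,
    group_size: int = 3,
    mix_prob: float = 0.3,
    noise_amp: float = 0.05,
    rng: Optional[random.Random] = None,
) -> Window:
    """Illustrative time-and-channel slice-and-recombine augmentation.

    Hypothetical sketch: seg_len, group_size, mix_prob, noise_amp and the
    sinusoidal noise model are assumptions, not the published P-Mix design.
    """
    rng = rng or random.Random()
    T, C = len(anchor), len(anchor[0])
    for p in partners:
        if len(p) != T or len(p[0]) != C:
            raise ValueError("partners must have the same [T][C] shape as anchor")
    out = [row[:] for row in anchor]  # copy so the anchor itself is untouched

    for t0 in range(0, T, seg_len):  # short temporal segments as processing units
        t1 = min(t0 + seg_len, T)
        for c0 in range(0, C, group_size):  # channel groups, e.g. one sensor's x/y/z
            c1 = min(c0 + group_size, C)
            # With probability mix_prob, replace this (segment, channel-group)
            # block with the block at the same position in a random partner.
            if partners and rng.random() < mix_prob:
                src = rng.choice(partners)
                for t in range(t0, t1):
                    out[t][c0:c1] = src[t][c0:c1]
        # Structured (non-white) noise: one smooth sinusoid per segment whose
        # period equals the segment length, as a stand-in for a motion cycle.
        if noise_amp > 0:
            phase = rng.uniform(0.0, 2.0 * math.pi)
            for t in range(t0, t1):
                bump = noise_amp * math.sin(2.0 * math.pi * (t - t0) / seg_len + phase)
                for c in range(C):
                    out[t][c] += bump
    return out


if __name__ == "__main__":
    gen = random.Random(0)
    # Toy batch: 4 windows of 64 time steps x 6 channels (two triaxial sensors).
    batch = [[[gen.gauss(0.0, 1.0) for _ in range(6)] for _ in range(64)] for _ in range(4)]
    anchor, partners = batch[0], batch[1:]
    # Two independently augmented views of the same anchor form a SimCLR-style positive pair.
    view_a = pmix_like(anchor, partners, rng=random.Random(1))
    view_b = pmix_like(anchor, partners, rng=random.Random(2))
    print(len(view_a), len(view_a[0]))  # 64 6
```

In a SimCLR-style pipeline, calling such a function twice with different random states on the same window would produce the two views of a positive pair, while views of the other windows in the batch act as negatives. Swapping whole (segment, channel-group) blocks, rather than adding white noise sample by sample, keeps each copied piece of signal internally coherent, which is the property the abstract emphasizes.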