3-D DECONVOLUTIONAL NETWORKS FOR THE UNSUPERVISED REPRESENTATION LEARNING OF HUMAN MOTIONS
Abstract
The major obstacle for learning-based RF sensing is obtaining a high-quality, large-scale annotated dataset. Unlike visual datasets, which human workers can annotate easily, RF signals are non-intuitive and hard to interpret, which makes annotating them time-consuming and laborious. To reduce this heavy demand for annotated data, we propose RF-URL, a novel unsupervised representation learning (URL) framework for RF sensing that learns a pre-trained model from large-scale unannotated RF datasets, which are easy to collect. RF-URL uses a contrastive framework to bridge the gap between signal-processing-based RF sensing and learning-based RF sensing.
Existing System
• To this end, we note that the core of a contrastive framework lies in building positive and negative pairs to learn their inherent consistencies and discrepancies. Existing methods generally construct these pairs through data augmentation, which is designed for visual images and cannot avoid shortcuts for RF signals.
• Network structure. For a fair comparison with existing methods, we do not adopt the same backbone in Section 5. Instead, our human silhouette generation network, named RFSG, follows the design of RF-Pose, which uses two RF encoding networks to extract features from the vertical and horizontal RF signals.
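As a rough illustration of how a contrastive objective uses positive and negative pairs, the sketch below computes an InfoNCE-style loss for a single anchor embedding. InfoNCE is assumed here because it is a common choice; the text above does not say which loss is used. The function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor: -log softmax of the positive's score."""
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(
        sum(math.exp(value - max_logit) for value in logits)
    )
    return log_sum_exp - logits[0]


# Hypothetical embeddings: the anchor and positive stand for two views of the
# same RF sample, and the negatives stand for embeddings of other samples.
anchor = [1.0, 0.0, 0.2]
positive = [0.9, 0.1, 0.25]
negatives = [[-1.0, 0.3, 0.0], [0.0, 1.0, -0.5]]

loss_aligned = info_nce_loss(anchor, positive, negatives)
# Swapping a negative into the positive slot should give a much larger loss.
loss_misaligned = info_nce_loss(anchor, negatives[0], [positive, negatives[1]])
```

The loss is small when the anchor is close to its positive and far from the negatives, and grows when a dissimilar sample is treated as the positive. How the pairs are built (augmentation versus different signal representations) only changes which embeddings are passed in.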
Disadvantages
• A translator is used as a mediator to embed the different signal representations of RF signals into a unified metric space, which avoids convergence problems, and a predictor with a stop-gradient operation is proposed to improve the performance of RF-URL.
• We solve this problem by shuffling BN, which trains on multiple GPUs and performs batch normalization (BN) independently on the samples of each GPU.
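The shuffling-BN idea mentioned above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the "GPUs" are simulated as slices of a shuffled index list, each sample is a single scalar feature, and the names `normalize` and `shuffled_batch_norm` are hypothetical.

```python
import random


def normalize(group, eps=1e-5):
    """Normalize a list of scalar features to zero mean and roughly unit variance."""
    mean = sum(group) / len(group)
    var = sum((x - mean) ** 2 for x in group) / len(group)
    return [(x - mean) / (var + eps) ** 0.5 for x in group]


def shuffled_batch_norm(batch, num_devices, seed=0):
    """Shuffle samples, normalize each device's share independently, unshuffle."""
    if len(batch) % num_devices != 0:
        raise ValueError("batch size must be divisible by num_devices")
    rng = random.Random(seed)
    order = list(range(len(batch)))
    rng.shuffle(order)  # random assignment of samples to simulated devices
    per_device = len(batch) // num_devices
    output = [0.0] * len(batch)
    for device in range(num_devices):
        idx = order[device * per_device:(device + 1) * per_device]
        normed = normalize([batch[i] for i in idx])
        for i, value in zip(idx, normed):
            output[i] = value  # write back to the sample's original position
    return output, order


# Hypothetical batch of eight scalar features split across two simulated devices.
batch = [0.5, 1.5, 2.0, 4.0, 3.0, 7.0, 6.0, 8.0]
normalized, order = shuffled_batch_norm(batch, num_devices=2)
```

Because the statistics are computed per device on a randomly shuffled subset, no sample is always normalized together with the same neighbours. A real multi-GPU setup would rely on the training framework's distributed primitives rather than list slicing.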
Proposed System
• By learning general semantic information for various sensing tasks from different signal representations, the proposed RF-URL can enhance sensing performance in an unsupervised manner.
• Note that the proposed PEN model works only for the single-user case, since only a single-user dataset has been accessible.
• These existing works rely mainly on supervised learning, which requires large-scale annotated RF datasets, whereas the proposed framework exploits unannotated data for model training.
Advantages
• Since almost all RF sensing tasks can be seen as a combination of the above three tasks, and the two most widely used RF signals are WiFi and radar signals, these three RF sensing tasks are sufficient to demonstrate the universality of the RF-URL framework.
• The first dataset (the non-number dataset) collects widely used hand gestures for human-computer interaction and contains 38687 samples.
• Commonly used contrastive learning frameworks include memory-bank methods (InsDis, MoCo, and contrastive multiview coding (CMC)), big-batch-size methods, clustering methods (SwAV), transformer-based methods (MoCo v3 and DINO), and negative-pairs-free methods.
