Our proposed architecture for GaitRef and GaitMix. Trapezoids are trainable modules, and modules of the same color in the same model share weights. Dashed lines denote feature copying. S and J are the input silhouettes and skeletons. F_S represents silhouette features, while F_J and F_J∗ represent skeleton features from the input and refined skeletons, respectively.
Architecture of the skeleton correction network. F_JP is the skeleton feature after average pooling. For each frame, we concatenate the joint positions J with their features F_J, the pooled global feature F_JP, and the silhouette feature F_S, and send the result to the decoder to compute the position difference ∆J. Decoders at different timestamps share weights.
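The concatenation described in this caption can be illustrated with a small NumPy sketch. It is not the authors' implementation. The function names (`build_decoder_input`, `refine`), the tensor shapes, the choice to pool F_J over both frames and joints, the single linear layer standing in for the decoder, and the update J∗ = J + ∆J are all assumptions made for illustration only.

```python
import numpy as np

def build_decoder_input(J, F_J, F_S):
    """Concatenate per-joint inputs for the decoder.

    J   : (T, N, D)  joint positions for T frames and N joints
    F_J : (T, N, C)  per-joint skeleton features
    F_S : (C_s,)     silhouette feature for the sequence
    Shapes are hypothetical; the caption does not specify them.
    """
    T, N, _ = J.shape
    # F_JP: global skeleton feature after average pooling.
    # Assumption: the pooling runs over both frames and joints.
    F_JP = F_J.mean(axis=(0, 1))
    # Copy the global and silhouette features to every joint of every frame.
    F_JP_tiled = np.broadcast_to(F_JP, (T, N, F_JP.shape[0]))
    F_S_tiled = np.broadcast_to(F_S, (T, N, F_S.shape[0]))
    return np.concatenate([J, F_J, F_JP_tiled, F_S_tiled], axis=-1)

def refine(J, F_J, F_S, W, b):
    """Predict the position difference and apply it to the input joints.

    W, b parameterize a single linear layer used as a stand-in decoder.
    The same W and b are applied to every frame, mirroring the statement
    that decoders at different timestamps share weights.
    """
    X = build_decoder_input(J, F_J, F_S)   # (T, N, D + 2C + C_s)
    delta_J = X @ W + b                    # (T, N, D)
    # Assumption: the refined skeleton is the input plus the difference.
    return J + delta_J, delta_J
```

Because one weight matrix is applied to all frames, two frames with identical joints and features receive identical corrections in this sketch; any per-frame variation has to come from J and F_J themselves.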
Comment: Combines skeleton poses with a Graph Convolutional Network (GCN) to obtain a modern model-based approach for gait recognition.