Structure-Aware Human-Action Generation
Generating long-range skeleton-based human actions is challenging because a small deviation in one frame can yield a malformed action sequence. Most existing methods borrow ideas from video generation and naively treat skeleton nodes/joints as image pixels, ignoring the rich inter-frame and intra-frame structure information, which can lead to distorted actions. Graph convolutional networks (GCNs) can leverage structure information to learn structural representations. However, applying GCNs to continuous action sequences in both the spatial and temporal dimensions is challenging because the action graph can be huge. To overcome this challenge, we propose a GCN variant that uses the self-attention mechanism to prune a complete action graph in the temporal dimension. Our method dynamically attends to important past frames and constructs a sparse graph for use in the GCN framework, capturing the structure information in action sequences. Extensive experimental results on two standard human action datasets demonstrate the superiority of our method over existing methods.
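The idea of pruning a complete temporal graph with attention can be illustrated with a minimal sketch. This is not the model proposed here; it is a toy under stated assumptions: each frame is a flattened vector of joint coordinates, raw features serve as both queries and keys (no learned projections), pruning is a simple top-k selection, spatial skeleton edges are omitted, and the names `prune_temporal_edges`, `aggregate`, and `top_k` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prune_temporal_edges(frames, top_k):
    """For each frame t, keep only the top_k most-attended frames s <= t.

    frames: list of T frame feature vectors (assumed: flattened joint coordinates).
    Returns a dict t -> list of (s, weight); kept weights are renormalised to sum to 1.
    """
    dim = len(frames[0])
    edges = {}
    for t, query in enumerate(frames):
        # Scaled dot-product scores against the current and all earlier frames.
        scores = [
            sum(q * k for q, k in zip(query, frames[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        attn = softmax(scores)
        # Keep the top_k strongest temporal links and drop the rest (assumed rule).
        kept = sorted(range(t + 1), key=lambda s: attn[s], reverse=True)[:top_k]
        norm = sum(attn[s] for s in kept)
        edges[t] = [(s, attn[s] / norm) for s in sorted(kept)]
    return edges


def aggregate(frames, edges):
    """One parameter-free propagation step over the sparse temporal graph."""
    dim = len(frames[0])
    return [
        [sum(w * frames[s][d] for s, w in edges[t]) for d in range(dim)]
        for t in range(len(frames))
    ]


# Usage: four toy frames with two features each.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.9, 0.0]]
edges = prune_temporal_edges(frames, top_k=2)
updated = aggregate(frames, edges)
```

With a complete temporal graph, frame t would connect to all t + 1 candidate frames; after pruning, each frame keeps at most `top_k` links, so the number of temporal edges grows linearly rather than quadratically with sequence length. A real GCN layer would additionally apply learned weight matrices and combine these temporal edges with the skeleton's spatial edges.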