Skeleton data is widely used in human action recognition because it is easy to acquire, computationally efficient, and robust to environmental variation. Recently, encoding skeleton sequences as color images has become a popular preprocessing step that exploits the spatial modeling ability of convolutional neural networks (CNNs). Furthermore, inspired by related work in other fields, attention mechanisms have been introduced to CNN-based skeleton action recognition. In this paper, we propose a two-branch hierarchical attention model (HAN) for skeleton-based action recognition. The proposed model consists of a base branch for spatial-temporal feature extraction and an attention branch for feature enhancement. In the attention branch, we use auxiliary features rather than intermediate features to generate attention maps. Specifically, variance vectors of the skeleton sequences are fused into motion saliency matrices that capture the contribution of each joint. The motion saliency matrices are then fed into the hierarchical attention branch to obtain multiple attention maps. To make better use of the attention maps, we propose two distinct combination schemes that link the two branches at the feature extraction blocks and at the fully connected layer. The entire model is integrated into an end-to-end trainable network. The efficacy of the proposed HAN is verified on three benchmark datasets: NTU RGB+D, UTD-MHAD, and SYSU-3D. The comparison results show that our approach outperforms state-of-the-art methods on all three datasets.
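To make the notion of a motion saliency matrix concrete, the following is a minimal sketch of one way per-joint variance vectors could be fused into such a matrix. It is an illustration under stated assumptions, not the fusion scheme of the proposed model: the function name `motion_saliency`, the `num_segments` parameter, the split into equal temporal segments, the summation of variance over the three coordinate axes, and the normalization by the global maximum are all hypothetical choices made here for the example.

```python
import numpy as np


def motion_saliency(seq, num_segments=4):
    """Sketch of a motion saliency matrix (hypothetical fusion scheme).

    seq: array of shape (T, J, 3) holding 3D joint coordinates for
         T frames and J joints.
    Returns an array of shape (num_segments, J) scaled to [0, 1].
    """
    seq = np.asarray(seq, dtype=np.float64)
    if seq.ndim != 3 or seq.shape[0] < num_segments:
        raise ValueError("expected shape (T, J, 3) with T >= num_segments")

    # Assumption: split the sequence into consecutive temporal segments.
    segments = np.array_split(seq, num_segments, axis=0)

    # Variance vector of each segment: variance over time per joint and
    # axis gives (J, 3); summing over the axes gives one value per joint.
    rows = [seg.var(axis=0).sum(axis=-1) for seg in segments]

    # Fuse the variance vectors by stacking them row-wise.
    saliency = np.stack(rows)  # shape: (num_segments, J)

    # Assumption: normalize by the global maximum so entries lie in [0, 1].
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency
```

In this sketch, a joint that remains still throughout a segment receives a saliency of zero in that segment, while the joint with the largest positional variance receives a value of one. The resulting matrix is the kind of auxiliary input that an attention branch could consume in place of intermediate CNN features.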