Skeleton-based human action recognition is an important task in computer vision, but it remains challenging because of the complex spatio-temporal variations of skeleton joints. In this work, we propose an end-to-end trainable network for human action recognition from skeleton data, consisting of a Deep Convolutional Model (DCM) and a Self-Attention Model (SAM). Specifically, skeleton sequences are encoded into color images and fed into the DCM to extract deep features. In the SAM, handcrafted features representing the motion degree of each joint are extracted, and the attention weights are learned by a simple yet effective linear mapping. We evaluate the proposed method on the NTU RGB+D, SYSU-3D, and UTD-MHAD datasets, where it achieves state-of-the-art results.
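
To make the two inputs described above concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the image encoding, the definition of motion degree, or how the linear mapping is normalized, so everything here is an assumption: joints are mapped to image rows and frames to columns, the (x, y, z) coordinates are min-max scaled to the three color channels, motion degree is taken as the total distance each joint travels between consecutive frames, and the linear mapping is followed by a softmax. The function names and the parameters `w` and `b` are hypothetical; in a trained network `w` and `b` would be learned.

```python
import math


def encode_skeleton_as_image(seq):
    """Encode seq[t][j] = (x, y, z) as image[j][t] = (r, g, b) in 0..255.

    Assumed layout: one row per joint, one column per frame.
    """
    # Assumed normalization: min-max per coordinate axis over the sequence.
    lo = [min(p[c] for frame in seq for p in frame) for c in range(3)]
    hi = [max(p[c] for frame in seq for p in frame) for c in range(3)]

    def scale(value, c):
        span = hi[c] - lo[c]
        return 0 if span == 0 else round(255 * (value - lo[c]) / span)

    num_joints = len(seq[0])
    return [
        [tuple(scale(frame[j][c], c) for c in range(3)) for frame in seq]
        for j in range(num_joints)
    ]


def motion_degree(seq):
    """Assumed handcrafted feature: total path length of each joint."""
    num_joints = len(seq[0])
    return [
        sum(math.dist(seq[t][j], seq[t + 1][j]) for t in range(len(seq) - 1))
        for j in range(num_joints)
    ]


def attention_weights(motion, w=1.0, b=0.0):
    """Linear mapping w * m + b, then a softmax (the softmax is assumed)."""
    scores = [w * m + b for m in motion]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy sequence: 3 frames, 2 joints. Joint 0 is static, joint 1 moves along x.
toy_sequence = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.0, 0.0, 0.0), (2.0, 1.0, 1.0)],
    [(0.0, 0.0, 0.0), (4.0, 1.0, 1.0)],
]
toy_image = encode_skeleton_as_image(toy_sequence)
toy_weights = attention_weights(motion_degree(toy_sequence))
```

In this toy sequence the moving joint receives the larger attention weight, which reflects the intuition that joints with more motion are more informative for the action.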