With the advance of deep learning, deep learning based action recognition is an important research topic in computer vision. The skeleton sequence is often encoded into an image to better use Convolutional Neural Networks (ConvNets) such as Joint Trajectory Maps (JTM). However, this encoding method cannot effectively capture long temporal information. In order to solve this problem, This paper presents an effective method to encode spatial-temporal information into color texture images from skeleton sequences, referred to as Temporal Pyramid Skeleton Motion Maps (TPSMMs), and Convolutional Neural Networks (ConvNets) are applied to capture the discriminative features from TPSMMs for human action recognition. The TPSMMs not only capture short temporal information, but also embed the long dynamic information over the period of an action. The proposed method has been verified and achieved the state-of-the-art results on the widely used UTD-MHAD, MSRC-12 Kinect Gesture and SYSU-3D datasets.