Segmentation of human actions is a major research problem in video understanding. A number of existing approaches demonstrate that performing action segmentation before action recognition improves recognition performance. In this paper, we address the problem of action segmentation in an online manner. We first extend a clustering-based image segmentation approach to the temporal domain, generating hierarchical supervoxel levels for action segmentation. We then propose a streaming approach based on Uniform Entropy Slice (UES) that flattens the hierarchical levels into a single level while preserving important information in the video. The flattened level contains the silhouette of a human, with the body parts assigned different labels. We then combine this human structure information with the original video frames to “strengthen” the action in a video, which paves the way for accurate action recognition. Experimental results show that our online approach achieves satisfactory performance on action segmentation and recognition on several publicly available datasets, including the DAVIS dataset, the UCF Sports dataset, and the KTH dataset.