Smartphone-sensors-based human activity recognition is attracting increasing interest due to the popularization of smartphones. It is a difficult long-range temporal recognition problem, especially with large intraclass distances such as carrying smartphones at different locations and small interclass distances such as taking a train or subway. To address this problem, we propose a new framework of combining short-term spatial/frequency feature extraction and a long-term independently recurrent neural network (IndRNN) for activity recognition. Considering the periodic characteristics of the sensor data, short-term temporal features are first extracted in the spatial and frequency domains. Then, the IndRNN, which can capture long-term patterns, is used to further obtain the long-term features for classification. Given the large differences when the smartphone is carried at different locations, a group-based location recognition is first developed to pinpoint the location of the smartphone. The Sussex-Huawei Locomotion (SHL) dataset from the SHL Challenge is used for evaluation. An earlier version of the proposed method won the second place award in the SHL Challenge 2020 (first place if not considering the multiple models fusion approach). The proposed method is further improved in this paper and achieves 80.72% accuracy, better than the existing methods using a single model.