Existing approaches to SAR image registration focus on correcting the global transformation between SAR images, yet local deformations between images are also common. Because the viewpoint of video SAR changes over time, its images are strongly affected by local deformations, which can cause false alarms in moving target detection. This article presents an unsupervised image registration approach for video SAR moving target detection that achieves good registration performance with acceptable processing efficiency. The proposed unsupervised learning framework is a cascade of two convolutional neural networks. The first network directly predicts the parameters of the rigid transformation between the reference and unregistered images, thereby recovering the global transformation between them. The second network takes the reference image and the globally registered image from the first network as input and predicts a dense displacement field. A constraint is then imposed on the predicted displacement field to prevent moving target shadows from being aligned, and the constrained field is used to compensate for the remaining local deformations between the two images. Experimental results on real video SAR images demonstrate the good performance and convincing generalization ability of the proposed approach.
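The cascade described above ends in three deterministic operations: applying the predicted rigid transform, constraining the displacement field, and warping with that field. The sketch below illustrates these steps with plain NumPy; the network predictions, the nearest-neighbour resampling, the magnitude-clipping form of the constraint, and the `max_mag` threshold are all assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def rigid_warp(img, theta, tx, ty):
    """Apply a rigid transform (rotation theta, translation tx, ty) via inverse
    mapping with nearest-neighbour sampling -- a stand-in for recovering the
    global transformation predicted by the first network (assumed form)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    c, s = np.cos(theta), np.sin(theta)
    src_x = c * (xs - tx) + s * (ys - ty)   # output pixel -> source pixel
    src_y = -s * (xs - tx) + c * (ys - ty)
    sx = np.clip(np.round(src_x).astype(int), 0, w - 1)
    sy = np.clip(np.round(src_y).astype(int), 0, h - 1)
    return img[sy, sx]

def limit_displacement(field, max_mag):
    """Clip the per-pixel displacement magnitude to max_mag, one hypothetical
    way to constrain the field so large motions (e.g. moving target shadows)
    are not registered away."""
    mag = np.linalg.norm(field, axis=-1, keepdims=True)
    scale = np.minimum(1.0, max_mag / np.maximum(mag, 1e-8))
    return field * scale

def dense_warp(img, field):
    """Compensate local deformations by warping with a dense displacement
    field of shape (h, w, 2) holding (dy, dx) per pixel."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sy = np.clip(np.round(ys + field[..., 0]).astype(int), 0, h - 1)
    sx = np.clip(np.round(xs + field[..., 1]).astype(int), 0, w - 1)
    return img[sy, sx]
```

In this sketch the constraint simply rescales any displacement vector longer than `max_mag` back onto that radius, so small deformations pass through unchanged while shadow-sized motions are suppressed before the final warp.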