Unlike existing train delay studies, which have strived to explore sophisticated algorithms, this paper focuses on finding the bound of improvement in predicting multi-scenario train delays with different machine learning methods. Motivated by the observation that deep learning methods fail to improve prediction performance when delays occur rarely, we present a novel augmented machine learning approach to further improve overall prediction accuracy. Our solution is a rule-driven automation (RDA) method that combines a delay status labeling (DSL) algorithm with resilience of section (RSE) and resilience of station (RST) indicators to generate train delay forecasts. The experimental results demonstrate that the Random Forest based implementation of our RDA method (RF-RDA) can significantly improve the generalization ability of multivariate multi-step forecast models for multi-scenario train delay prediction. The proposed solution surpasses state-of-the-art baselines on real-world traffic datasets, which treat various real-time delays differently. Even when the predictive performance of conventional deep learning methods decreases, our method still provides forecasts accurate enough for practical use.