The existing residential building stock accounts for a substantial portion of worldwide energy consumption and greenhouse gas emissions. Improving the thermal performance of existing buildings is vital to mitigating climate change, and often brings additional benefits in the form of improved comfort, health and well-being for occupants. Despite the extensive body of literature in this area, assessing the performance of retrofit packages in occupied residential buildings remains difficult. Experimental methods often fail to isolate the effect of retrofits from the numerous confounding factors, while modelling studies are prone to uncertainties and simplifications. The aim of this paper is to provide a critical review of previous studies that have applied experimental and simulation techniques to evaluate thermal retrofits, with a focus on data collection and simulation methods. Specifically, we compare monitoring campaigns in terms of the parameters monitored, campaign duration, temporal resolution and data application. Additionally, we investigate how data-driven building performance simulation may be used to improve predictive capacity and develop robust retrofit solutions. We identified a range of approaches within the literature, with a bias towards simulating simple performance models over detailed data-driven analysis. We recommend a systematic approach that combines intensive monitoring campaigns with robust prediction methods to improve retrofit evaluation and to support assessment of the reliability and shortcomings of evaluation data.