Many scientific and engineering problems involve physical modeling of complex processes. Sometimes multiple candidate models are available, and their performance can be compared by how well their outputs match observations. Various summary statistics can be used for this purpose, but no matter which statistics are chosen, it is important that comparisons based on them be considered in light of the inherent variability of the data used in their calculation. In this article, we consider the variability of a summary statistic through an empirical likelihood. The approach is nonparametric in the sense that a moving-block bootstrap procedure is used to obtain the empirical likelihood. Relative figures of merit for each candidate model are formed as the ratio of each candidate model's likelihood to the largest likelihood. We use a small simulation study to show that our procedure can correctly distinguish between different time series models, and then we demonstrate how the method can be used to evaluate the output of 20 Intergovernmental Panel on Climate Change (IPCC) atmospheric models based on their agreement with the observations. © 2011 California Institute of Technology. Government sponsorship acknowledged.
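As a rough illustration of the procedure outlined above, the following Python sketch resamples an observed series with a moving-block bootstrap, estimates the density of the resulting summary statistic, and scores each candidate model by its likelihood relative to the largest likelihood. It is a minimal sketch under stated assumptions, not the implementation used in this article: the choice of the mean as the summary statistic, the Gaussian kernel density estimate with a normal-reference bandwidth, the block length, the synthetic AR(1) data, and all function names (`moving_block_bootstrap`, `gaussian_kde_at`, `relative_merit`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def moving_block_bootstrap(series, block_len, n_boot, stat=np.mean, rng=None):
    """Return n_boot replicates of `stat` computed on moving-block resamples."""
    rng = np.random.default_rng(rng)
    x = np.asarray(series, dtype=float)
    n = x.size
    n_blocks = int(np.ceil(n / block_len))
    max_start = n - block_len          # last admissible block start
    offsets = np.arange(block_len)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        # Draw overlapping blocks with replacement and concatenate them,
        # trimming the result back to the original series length.
        starts = rng.integers(0, max_start + 1, size=n_blocks)
        idx = (starts[:, None] + offsets[None, :]).ravel()[:n]
        reps[b] = stat(x[idx])
    return reps


def gaussian_kde_at(samples, points):
    """Gaussian kernel density estimate evaluated at `points`.

    Assumption: normal-reference bandwidth h = 1.06 * sd * m**(-1/5),
    where m is the number of bootstrap replicates.
    """
    samples = np.asarray(samples, dtype=float)
    points = np.atleast_1d(np.asarray(points, dtype=float))
    h = 1.06 * samples.std(ddof=1) * samples.size ** (-0.2)
    z = (points[:, None] - samples[None, :]) / h
    kernel_sum = np.exp(-0.5 * z**2).sum(axis=1)
    return kernel_sum / (samples.size * h * np.sqrt(2.0 * np.pi))


def relative_merit(observed, model_stats, block_len=10, n_boot=2000, rng=0):
    """Likelihood of each model's statistic divided by the largest likelihood."""
    reps = moving_block_bootstrap(observed, block_len, n_boot, rng=rng)
    lik = gaussian_kde_at(reps, model_stats)
    return lik / lik.max()


# Synthetic stand-in for observations: an AR(1) series (illustrative only).
_rng = np.random.default_rng(42)
obs = np.zeros(300)
for t in range(1, obs.size):
    obs[t] = 0.6 * obs[t - 1] + _rng.normal()

# Hypothetical candidate models, each summarized by the mean of its output.
candidates = np.array([obs.mean(), obs.mean() + 0.5, obs.mean() + 2.0])
merit = relative_merit(obs, candidates)
print(merit)
```

In this sketch the best-matching candidate receives a figure of merit of exactly 1, and the others fall between 0 and 1. The block length governs how much serial dependence each resample preserves, so in practice it would need to be chosen with the autocorrelation of the observed series in mind.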