Climate models produce output spanning decades or longer at high spatial and temporal resolution. Uncertainty in
starting values, boundary conditions, greenhouse gas emissions, and other inputs makes every climate model an uncertain
representation of the climate system. A standard paradigm for assessing the quality of climate model simulations
is to compare model output for past and present time periods with observations from those same periods.
Many of these comparisons are based on simple summary statistics called metrics. In this article,
we propose an alternative: evaluation of competing climate models through probabilities derived from tests of
the hypothesis that climate-model-simulated and observed time sequences share common climate-scale signals.
The probabilities are based on the behavior of summary statistics of climate model output and observational data
over ensembles of pseudo-realizations. The pseudo-realizations are obtained by partitioning each original time sequence
into signal and noise components and using a parametric bootstrap to create pseudo-realizations of the noise sequences.
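The signal/noise split and parametric bootstrap can be sketched as follows. This is a minimal illustration under assumptions the article does not state here: the hypothetical functions below treat a least-squares cubic polynomial trend as the "signal" and model the residual "noise" as a Gaussian AR(1) process whose fitted parameters drive the bootstrap. The article's own decomposition and noise model may differ.

```python
import numpy as np

def split_signal_noise(y, degree=3):
    """Hypothetical split: least-squares polynomial trend as 'signal', residuals as 'noise'."""
    t = np.arange(len(y), dtype=float)
    t = (t - t.mean()) / t.std()  # rescale time for a well-conditioned fit
    signal = np.polyval(np.polyfit(t, y, degree), t)
    return signal, y - signal

def fit_ar1(noise):
    """Least-squares AR(1) fit: noise[t] = phi * noise[t-1] + eps[t]."""
    prev, curr = noise[:-1], noise[1:]
    phi = float(np.clip(prev @ curr / (prev @ prev), -0.99, 0.99))  # keep the model stationary
    sigma = float(np.std(curr - phi * prev))  # innovation standard deviation
    return phi, sigma

def simulate_ar1(phi, sigma, n, n_rep, rng):
    """Draw n_rep Gaussian AR(1) series of length n, each started in its stationary law."""
    out = np.empty((n_rep, n))
    out[:, 0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2), size=n_rep)
    eps = rng.normal(0.0, sigma, size=(n_rep, n))
    for i in range(1, n):
        out[:, i] = phi * out[:, i - 1] + eps[:, i]
    return out

def pseudo_realizations(y, n_rep=100, degree=3, seed=0):
    """Fixed signal plus n_rep bootstrapped noise sequences (parametric bootstrap)."""
    rng = np.random.default_rng(seed)
    signal, noise = split_signal_noise(y, degree)
    phi, sigma = fit_ar1(noise)
    return signal + simulate_ar1(phi, sigma, len(y), n_rep, rng)

# Usage on a synthetic 20-year monthly anomaly series (hypothetical data).
rng = np.random.default_rng(42)
y = 0.002 * np.arange(240) + simulate_ar1(0.6, 0.1, 240, 1, rng)[0]
reps = pseudo_realizations(y, n_rep=100)  # shape (100, 240)
```

Keeping the signal fixed and resampling only the noise means every pseudo-realization shares the same climate-scale signal, so variation across the ensemble reflects the fitted noise model alone.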
The statistics we choose come from working in the space of decorrelated and dimension-reduced wavelet coefficients.
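A minimal sketch of working in a decorrelated, dimension-reduced wavelet space follows. It assumes an orthonormal Haar transform, principal-component whitening fitted to the pooled model and observation ensembles, and a generic two-sample permutation test on ensemble means as a stand-in for the article's hypothesis test. All function names and parameter values (levels=3, k=10) are hypothetical.

```python
import numpy as np

def haar_dwt(x, levels):
    """Orthonormal multi-level Haar transform along the last axis.

    The last-axis length must be divisible by 2**levels. Returns the detail
    coefficients (finest scale first) followed by the coarsest approximation.
    """
    approx = np.asarray(x, dtype=float)
    parts = []
    for _ in range(levels):
        even, odd = approx[..., 0::2], approx[..., 1::2]
        parts.append((even - odd) / np.sqrt(2.0))  # detail at this scale
        approx = (even + odd) / np.sqrt(2.0)       # smoothed series for next level
    parts.append(approx)
    return np.concatenate(parts, axis=-1)

def whitened_scores(W, k):
    """Project centered rows of W onto the k leading principal axes and
    rescale each axis to unit sample variance (decorrelated, k-dimensional)."""
    centered = W - W.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T / (s[:k] / np.sqrt(W.shape[0] - 1))

def mean_shift(za, zb):
    """Squared Euclidean distance between the two ensemble means."""
    d = za.mean(axis=0) - zb.mean(axis=0)
    return float(d @ d)

def permutation_probability(za, zb, n_perm=999, seed=0):
    """Fraction of random relabelings whose statistic is at least the observed one."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([za, zb])
    n_a = len(za)
    observed = mean_shift(za, zb)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if mean_shift(pooled[idx[:n_a]], pooled[idx[n_a:]]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def compare_ensembles(model_ens, obs_ens, levels=3, k=10, n_perm=999):
    """Wavelet-transform both ensembles, whiten jointly, and test for a mean shift."""
    W = haar_dwt(np.vstack([model_ens, obs_ens]), levels)
    Z = whitened_scores(W, k)
    n_model = len(model_ens)
    return permutation_probability(Z[:n_model], Z[n_model:], n_perm)

# Usage on synthetic monthly ensembles (hypothetical data, 240 months each).
rng = np.random.default_rng(1)
trend = 0.002 * np.arange(240)
model_ens = trend + rng.normal(0.0, 0.1, size=(100, 240))
obs_same = trend + rng.normal(0.0, 0.1, size=(100, 240))
obs_shifted = trend + 0.5 + rng.normal(0.0, 0.1, size=(100, 240))
p_same = compare_ensembles(model_ens, obs_same)
p_shifted = compare_ensembles(model_ens, obs_shifted)
```

Fitting the whitening on the pooled ensembles keeps the transformation independent of the model/observation labels, which is what makes the permutation step valid; a small probability indicates the two ensembles are unlikely to share the same signal.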
As an illustration, we compare monthly sequences of CMIP5 model output of global-average near-surface temperature
anomalies with corresponding sequences from the well-known HadCRUT4 data set.