Purpose: The purpose of this work is to determine the statistical correlation between per-beam, planar IMRT QA passing rates and several clinically relevant, anatomy-based dose errors for per-patient IMRT QA. The intent is to assess the predictive power of a common conventional IMRT QA performance metric, the Gamma passing rate per beam. Methods: Ninety-six unique data sets were created by inducing four types of dose errors in 24 clinical head and neck IMRT plans, each planned with 6 MV Varian 120-leaf MLC linear accelerators using a commercial treatment planning system and step-and-shoot delivery. The error-free beams/plans were used as "simulated measurements" (for generating the IMRT QA dose planes and the anatomy dose metrics) to compare to the corresponding data calculated by the error-induced plans. The degree of the induced errors was tuned to mimic IMRT QA passing rates that are commonly achieved using conventional methods. Results: Analysis of clinical metrics (parotid mean doses, spinal cord max and D 1cc, CTV D 95, and larynx mean) vs IMRT QA Gamma analysis (3%/3 mm, 2/2, 1/1) showed that in all cases, there were only weak to moderate correlations (range of Pearson's r -values: -0.295 to 0.653). Moreover, the moderate correlations actually had positive Pearson's r -values (i.e., clinically relevant metric differences increased with increasing IMRT QA passing rate), indicating that some of the largest anatomy-based dose differences occurred in the cases of high IMRT QA passing rates, which may be called "false negatives." The results also show numerous instances of false positives or cases where low IMRT QA passing rates do not imply large errors in anatomy dose metrics. In none of the cases was there correlation consistent with high predictive power of planar IMRT passing rates, i.e., in none of the cases did high IMRT QA Gamma passing rates predict low errors in anatomy dose metrics or vice versa. Conclusions: There is a lack of correlation between conventional IMRT QA performance metrics (Gamma passing rates) and dose errors in anatomic regions-of-interest. The most common acceptance criteria and published actions levels therefore have insufficient, or at least unproven, predictive power for per-patient IMRT QA. © 2011 American Association of Physicists in Medicine.