Background: Polygenic scores (PGSs), which assess the genetic risk of individuals for a disease, are calculated as a weighted count of risk alleles identified in genome-wide association studies. PGS methods differ in which DNA variants are included and the weights assigned to them; some require an independent tuning sample to help inform these choices. PGSs are evaluated in independent target cohorts with known disease status. Variability between target cohorts is observed in applications to real data sets, which could reflect a number of factors, e.g., phenotype definition or technical factors. Methods: The Psychiatric Genomics Consortium Working Groups for schizophrenia and major depressive disorder bring together many independently collected case-control cohorts. We used these resources (31,328 schizophrenia cases, 41,191 controls; 248,750 major depressive disorder cases, 563,184 controls) in repeated application of leave-one-cohort-out meta-analyses, each used to calculate and evaluate PGS in the left-out (target) cohort. Ten PGS methods (the baseline PC+T method and 9 methods that model genetic architecture more formally: SBLUP, LDpred2-Inf, LDpred-funct, LDpred2, Lassosum, PRS-CS, PRS-CS-auto, SBayesR, MegaPRS) were compared. Results: Compared with PC+T, the other 9 methods gave higher prediction statistics, MegaPRS, LDPred2, and SBayesR significantly so, explaining up to 9.2% variance in liability for schizophrenia across 30 target cohorts, an increase of 44%. For major depressive disorder across 26 target cohorts, these statistics were 3.5% and 59%, respectively. Conclusions: Although the methods that more formally model genetic architecture have similar performance, MegaPRS, LDpred2, and SBayesR rank highest in most comparisons and are recommended in applications to psychiatric disorders.