This paper introduces a novel Gaussian process (GP) classification method that combines advantages of global and local GP approximators through a two-layer hierarchical model. The upper layer consists of a global sparse GP to coarsely model the entire dataset. The lower layer is a mixture of GP experts which uses local information to learn a fine-grained model. A variational inference algorithm is developed for simultaneous learning of the global GP, the experts and the gating network. Stochastic optimization can be employed for large-scale problems. Experiments on benchmark binary classification datasets demonstrate the advantages of the method in terms of scalability and classification accuracy.