In this article, we propose a scalable Gaussian process (GP) regression method that combines the advantages of both global and local GP approximations through a two-layer hierarchical model using a variational inference framework. The upper layer consists of a global sparse GP to coarsely model the entire data set, whereas the lower layer comprises a mixture of sparse GP experts which exploit local information to learn a fine-grained model. A two-step variational inference algorithm is developed to learn the global GP, the GP experts and the gating network simultaneously. Stochastic optimization can be employed to allow the application of the model to large-scale problems. Experiments on a wide range of benchmark data sets demonstrate the flexibility, scalability and predictive power of the proposed method.