Reinforcement learning has emerged as a powerful tool for composing and adapting Web services in open, dynamic environments. However, the most commonly applied reinforcement learning algorithms use experience data inefficiently, which can destabilize the learning process: they make only one learning update per interaction experience. This paper proposes two novel algorithms that achieve greater data efficiency by saving experience data and using it in aggregate to update the learned policy. The first algorithm introduces an offline learning scheme for service composition in which the online learning task is transformed into a series of supervised learning steps. The second presents a coordination mechanism that enables multiple agents to learn the service composition task cooperatively. Experimental results demonstrate the effectiveness of the proposed algorithms relative to their online learning counterparts.
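To make the core idea concrete, the sketch below contrasts the one-update-per-experience style with a batch scheme that saves experiences and replays them in repeated supervised-style regression sweeps, in the spirit of the offline algorithm described above. The chain environment, reward values, and hyperparameters are illustrative assumptions, not the paper's actual service-composition benchmark.

```python
import random
from collections import defaultdict

# Hypothetical chain environment standing in for a composition workflow:
# states 0..N, action 0 invokes a "good" service (advance one step),
# action 1 invokes a "bad" service (workflow restarts with a small penalty).
N = 5
GAMMA, ALPHA = 0.9, 0.5

def step(s, a):
    """Return (next_state, reward, done)."""
    if a == 0:
        s2 = s + 1
        return s2, (1.0 if s2 == N else 0.0), s2 == N
    return 0, -0.1, False

def collect(episodes):
    """Save raw interaction experience instead of updating online."""
    data = []
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = random.randrange(2)
            s2, r, done = step(s, a)
            data.append((s, a, r, s2, done))
            s = s2
    return data

def fitted_updates(data, sweeps=50):
    """Each sweep is a supervised-style regression of Q toward targets
    computed from the saved batch, so one experience yields many updates."""
    Q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s2, done in data:
            target = r if done else r + GAMMA * max(Q[(s2, b)] for b in range(2))
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
    return Q

random.seed(0)
Q = fitted_updates(collect(30))
policy = [max(range(2), key=lambda a: Q[(s, a)]) for s in range(N)]
print(policy)  # greedy policy: action 0 (the "good" service) at every state
```

An online learner would discard each tuple after a single update; here the same 30 episodes are reused across 50 sweeps, which is the data-efficiency gain the paper targets.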