By exploiting the underlying neighbourhood structure of images, the diffusion process evaluates image similarity more accurately and has proven highly effective in improving image retrieval. Nevertheless, the diffusion process stores a large neighbourhood graph, incurs extra online retrieval time, and requires special algorithms beyond simple Euclidean search. To address these issues, this paper proposes to treat the diffusion process as a “black box” and model it directly by training deep neural networks, so as to obtain a better image representation that assimilates the effect of the diffusion process while remaining compatible with Euclidean search. We first put forward a kernel-mapping interpretation of the diffusion process, and then formulate the modelling task as a deep metric learning problem. The proposed approach is unsupervised in the sense that it needs neither image labels nor external datasets, and it completely avoids the online diffusion process at retrieval time. More interestingly, we find that this approach can even outperform the original diffusion process, rather than merely approximating it. Experiments verify its effectiveness and investigate its appealing characteristics, such as generalisation to newly inserted images.
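For context, the diffusion process referred to above is typically an iterative propagation of similarity scores over a kNN affinity graph. The sketch below (function name and parameters are illustrative, not taken from the paper) shows the classic manifold-ranking style update f ← αSf + (1 − α)y, where S is a symmetrically normalised affinity matrix and y is a query indicator, which converges to f* = (1 − α)(I − αS)⁻¹y:

```python
import numpy as np

def diffusion_ranking(features, query_idx, k=5, alpha=0.9, iters=50):
    """Rank a database by similarity diffusion on a kNN affinity graph.

    A minimal sketch of the generic diffusion process, assuming cosine
    affinities and a symmetrised kNN graph; not the paper's own method.
    """
    n = len(features)
    # Cosine similarities as pairwise affinities.
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = x @ x.T
    np.fill_diagonal(sims, 0.0)
    # Keep only each node's k strongest neighbours (sparse graph W).
    W = np.zeros_like(sims)
    for i in range(n):
        nn = np.argsort(sims[i])[-k:]
        W[i, nn] = sims[i, nn]
    W = np.maximum(W, W.T)  # symmetrise the graph
    # Symmetric normalisation: S = D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt
    # Seed vector y marks the query; iterate f <- alpha*S*f + (1-alpha)*y.
    y = np.zeros(n)
    y[query_idx] = 1.0
    f = y.copy()
    for _ in range(iters):
        f = alpha * S @ f + (1 - alpha) * y
    return np.argsort(-f)  # database indices ranked by diffused similarity
```

Note that this iteration illustrates exactly the costs the abstract mentions: the full graph W must be kept online, and ranking a query requires iterative propagation rather than a single Euclidean nearest-neighbour search.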