Constructing and analyzing functional brain networks (FBNs) has become a promising approach to brain disorder classification. However, the conventional sequential construct-then-analyze pipeline limits performance because its subtasks lack interaction and adaptivity. Recently, the Transformer has demonstrated remarkable performance across a variety of tasks, owing to its effective attention mechanism for modeling complex feature relationships. In this paper, for the first time, we develop a Transformer-based framework that integrates FBN modeling, analysis, and brain disorder classification from rs-fMRI data, proposing a Diffusion Kernel Attention Network to address the specific challenges. Directly applying the Transformer does not necessarily yield optimal performance on this task, because the attention module contains many parameters relative to the limited training samples typically available. To address this issue, we propose replacing the original dot-product attention module in the Transformer with kernel attention. This significantly reduces the number of parameters to train, alleviating the small-sample issue, while introducing a non-linear attention mechanism to model complex functional connections. Another limitation of the Transformer for FBN applications is that it considers only pairwise interactions between directly connected brain regions, ignoring important indirect connections. We therefore further apply a diffusion process over the kernel attention to incorporate wider interactions among indirectly connected brain regions. Extensive experiments are conducted on the ADHD-200 data set for ADHD classification and on the ADNI data set for Alzheimer's disease classification, and the results demonstrate the superior performance of the proposed method over competing methods.
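To make the two key ideas concrete, the following is a minimal NumPy sketch, not the paper's exact formulation: a parameter-free Gaussian kernel over region features stands in for the learned dot-product attention, and a truncated diffusion (a weighted sum of powers of the attention matrix) propagates attention along multi-hop paths to reach indirectly connected regions. The kernel bandwidth `gamma`, decay `beta`, number of diffusion steps, and the ROI/feature dimensions are all illustrative assumptions.

```python
import numpy as np

def kernel_attention(X, gamma=1.0):
    # Gaussian (RBF) kernel between region features replaces the QK^T
    # dot-product; with no learned projection matrices, there are far
    # fewer parameters to fit from limited training samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    K = np.exp(-gamma * np.maximum(d2, 0.0))
    # Row-normalize so each row is a distribution over regions.
    return K / K.sum(axis=1, keepdims=True)

def diffuse(A, steps=3, beta=0.5):
    # Truncated diffusion: sum_{t=0..steps} beta^t * A^t mixes in
    # multi-hop paths, so indirectly connected regions also interact.
    out = np.zeros_like(A)
    P = np.eye(A.shape[0])
    for _ in range(steps + 1):
        out += P
        P = beta * (P @ A)
    return out / out.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.standard_normal((90, 16))  # 90 ROIs, 16-dim features (illustrative)
A = kernel_attention(X)            # direct (one-hop) kernel attention
D = diffuse(A)                     # diffusion-enhanced attention
```

Because each row of `A` sums to one, every power of `A` is also row-stochastic, so the diffused matrix `D` remains a valid attention map after renormalization while assigning nonzero weight to regions reachable only through intermediate connections.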