Categorizing multiple objects in an image is essentially a structured prediction problem: the label of an object generally depends on the labels of the other objects in the image. We explicitly model these object dependencies with a sparse graphical topology induced by the adjacency of objects in the image, which benefits inference, and then use the maximum-margin principle to learn the model discriminatively. Moreover, we propose a novel exact inference method, which is used during training to find the most violated constraint required by the cutting-plane method. A slightly modified version of this inference method is used at test time, when the target labels are unseen. Experimental results on both synthetic and real datasets demonstrate that the proposed approach improves on state-of-the-art methods.
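
For orientation, the setup described above can be written in the generic form of a maximum-margin structured model over a pairwise graph. This is a sketch under assumptions, not necessarily the exact formulation used in this work: the graph $G=(V,E)$ over the objects, the feature maps $\phi$ and $\psi$, the weights $w=(w_u,w_p)$, the label loss $\Delta$, the slack variables $\xi_n$, and the trade-off constant $C$ are all notation introduced here for illustration.

```latex
% Score of a joint labeling y for image x (assumed pairwise form on the adjacency graph G = (V, E))
F(x, y; w) = \sum_{i \in V} w_u^{\top} \phi(x_i, y_i)
           + \sum_{(i,j) \in E} w_p^{\top} \psi(x_i, x_j, y_i, y_j)

% Maximum-margin training with margin rescaling (standard structural SVM form)
\min_{w,\; \xi \ge 0} \;\; \frac{1}{2}\lVert w \rVert^2 + C \sum_{n} \xi_n
\quad \text{s.t.} \quad
F(x^n, y^n; w) - F(x^n, y; w) \;\ge\; \Delta(y^n, y) - \xi_n
\quad \forall n,\; \forall y

% Most violated constraint for example n (loss-augmented inference, used by the cutting-plane method)
\hat{y}^{\,n} = \arg\max_{y} \; \bigl[ \Delta(y^n, y) + F(x^n, y; w) \bigr]

% Test-time prediction (no target labels, so the loss term is absent)
y^{*} = \arg\max_{y} \; F(x, y; w)
```

In this generic form, training-time and test-time inference differ only in the loss term $\Delta$, which is one plausible reading of the "slightly modified" inference method mentioned above.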