Humans can grasp the overall meaning, or gist, of a complex visual scene at a glance. We need only a fraction of a second to decide whether a scene is indoors, outdoors, on a busy street, or on a clear beach. In recent years, computational gist recognition, or scene categorization, has been actively pursued, given its numerous applications in image and video search, surveillance, and assistive navigation. Many visual descriptors have been developed to address the challenges in scene categorization, including the large number of semantic categories and the tremendous variations caused by imaging conditions. This paper provides a critical review of visual descriptors used for scene categorization, from both methodological and experimental perspectives. We present an empirical study, conducted on four benchmark data sets, that assesses the classification accuracy and class separability of state-of-the-art visual descriptors.