Identifying Urban Areas by Combining Human Judgment and Machine Learning : An Application to India
| Main Authors: | , , | 
|---|---|
| Format: | Working Paper | 
| Language: | English | 
| Published: | World Bank, Washington, DC 2020 |
| Subjects: | |
| Online Access: | http://documents.worldbank.org/curated/en/920791582554716856/Identifying-Urban-Areas-by-Combining-Human-Judgment-and-Machine-Learning-An-Application-to-India http://hdl.handle.net/10986/33392  | 
| Summary: | This paper proposes a methodology for identifying urban areas that combines subjective assessments with machine learning, and applies it to India, a country where several studies see the official urbanization rate as an under-estimate. For a representative sample of cities, towns and villages, as administratively defined, human judgment of Google images is used to determine whether they are urban or rural in practice. Judgments are collected across four groups of assessors, differing in their familiarity with India and with urban issues, following two different protocols. The judgment-based classification is then combined with data from the population census and from satellite imagery to predict the urban status of the sample. The Logit model, and LASSO and random forests methods, are applied. These approaches are then used to decide whether each of the out-of-sample administrative units in India is urban or rural in practice. The analysis does not find that India is substantially more urban than officially claimed. However, there are important differences at more disaggregated levels, with “other towns” and “census towns” being more rural, and some southern states more urban, than is officially claimed. The consistency of human judgment across assessors and protocols, the easy availability of crowd-sourcing, and the stability of predictions across approaches, suggest that the proposed methodology is a promising avenue for studying urban issues. |
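
The summary describes a two-stage workflow: fit a classifier on a judged sample, then use it to classify out-of-sample administrative units. The sketch below illustrates only that workflow and is not the authors' implementation. It uses a plain logit model fitted by gradient ascent (no LASSO or random forest), entirely synthetic data, and two hypothetical standardized features (`density`, `built_up`) standing in for census and satellite variables. The function `judged_label` is an invented stand-in for an assessor's urban/rural verdict.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the synthetic data is reproducible


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_units(n):
    # Hypothetical features per unit: [density, built_up], both standardized.
    return [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]


def judged_label(x):
    # Invented stand-in for a human judgment: 1 = urban, 0 = rural.
    return 1 if 1.5 * x[0] + 1.0 * x[1] + rng.gauss(0, 0.5) > 0 else 0


def fit_logit(X, y, lr=0.5, epochs=500):
    # Batch gradient ascent on the mean log-likelihood of a logit model.
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = yi - sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


def predict_urban(w, b, x, threshold=0.5):
    # Classify a unit as urban when its predicted probability reaches the threshold.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= threshold


# Stage 1: fit on the judged sample.
sample = make_units(300)
labels = [judged_label(x) for x in sample]
w, b = fit_logit(sample, labels)
accuracy = sum(
    predict_urban(w, b, x) == bool(y) for x, y in zip(sample, labels)
) / len(sample)

# Stage 2: classify out-of-sample units and compute the implied urban share.
out_of_sample = make_units(1000)
urban_share = sum(predict_urban(w, b, x) for x in out_of_sample) / len(out_of_sample)

print(f"in-sample accuracy: {accuracy:.3f}")
print(f"predicted urban share out of sample: {urban_share:.3f}")
```

In practice one would likely compare several estimators, as the summary reports, and check whether their out-of-sample predictions agree before drawing conclusions about urbanization rates.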