The census place project: A method for geolocating unstructured place names

Date

2022-09-20

Citation of Original Publication

Berkes, Enrico, Ezra Karger, and Peter Nencka. “The Census Place Project: A Method for Geolocating Unstructured Place Names.” Explorations in Economic History, Methodological Advances in the Extraction and Analysis of Historical Data, 87 (January 1, 2023): 101477. https://doi.org/10.1016/j.eeh.2022.101477.

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Attribution 4.0 International (CC BY 4.0)

Abstract

Researchers use microdata to study the economic development of the United States and the causal effects of historical policies. Much of this research focuses on county- and state-level patterns and policies because comprehensive sub-county data is not consistently available. We describe a new method that geocodes and standardizes the towns and cities of residence for individuals and households in decennial census microdata from 1790–1940. We release public crosswalks linking individuals and households to consistently defined place names, longitude-latitude pairs, counties, and states. Our method dramatically increases the number of individuals and households assigned to a sub-county location relative to standard publicly available data: we geocode an average of 83% of the individuals and households in 1790–1940 census microdata, compared to 23% in widely used crosswalks. In years with individual-level microdata (1850–1940), our average match rate is 94% relative to 33% in widely used crosswalks. To illustrate the value of our crosswalks, we measure place-level population growth across the United States between 1870 and 1940 at a sub-county level, confirming predictions of Zipf’s Law and Gibrat’s Law for large cities but rejecting similar predictions for small towns. We describe how our approach can be used to accurately geocode other historical datasets.
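
As a rough illustration of the kind of Zipf's Law check the abstract mentions, the sketch below regresses log rank on log population for a list of place populations. Under Zipf's Law the slope is close to -1. This is not the authors' code: the function name `zipf_slope` and the synthetic populations are assumptions made for illustration, and a real analysis would use place-level populations built from the released crosswalks.

```python
import math

def zipf_slope(populations):
    """Return the OLS slope of log(rank) on log(population).

    Hypothetical helper for illustration; a slope near -1 is what
    Zipf's Law predicts for a rank-size distribution.
    """
    # Rank places from largest (rank 1) to smallest, dropping empty places.
    sizes = sorted((p for p in populations if p > 0), reverse=True)
    xs = [math.log(p) for p in sizes]
    ys = [math.log(rank) for rank in range(1, len(sizes) + 1)]

    # Closed-form simple linear regression slope: cov(x, y) / var(x).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic populations (assumed, not census data) that follow
# size = C / rank exactly, so the fitted slope should be about -1.
synthetic = [1_000_000 / rank for rank in range(1, 201)]
slope = zipf_slope(synthetic)
print(round(slope, 6))
```

Running the same regression separately on large cities and on small towns would be one simple way to compare how well the rank-size relationship holds in each group.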