The census place project: A method for geolocating unstructured place names
Date
2022-09-20
Citation of Original Publication
Berkes, Enrico, Ezra Karger, and Peter Nencka. “The Census Place Project: A Method for Geolocating Unstructured Place Names.” Explorations in Economic History, Methodological Advances in the Extraction and Analysis of Historical Data, 87 (January 1, 2023): 101477. https://doi.org/10.1016/j.eeh.2022.101477.
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
CC BY 4.0 DEED Attribution 4.0 International
Abstract
Researchers use microdata to study the economic development of the United States and the causal effects of historical policies. Much of this research focuses on county- and state-level patterns and policies because comprehensive sub-county data is not consistently available. We describe a new method that geocodes and standardizes the towns and cities of residence for individuals and households in decennial census microdata from 1790–1940. We release public crosswalks linking individuals and households to consistently defined place names, longitude-latitude pairs, counties, and states. Our method dramatically increases the number of individuals and households assigned to a sub-county location relative to standard publicly available data: we geocode an average of 83% of the individuals and households in 1790–1940 census microdata, compared to 23% in widely used crosswalks. In years with individual-level microdata (1850–1940), our average match rate is 94% relative to 33% in widely used crosswalks. To illustrate the value of our crosswalks, we measure place-level population growth across the United States between 1870 and 1940 at a sub-county level, confirming predictions of Zipf’s Law and Gibrat’s Law for large cities but rejecting similar predictions for small towns. We describe how our approach can be used to accurately geocode other historical datasets.