As part of my research, I have spent considerable effort collecting and analyzing data from historical and survey sources. Below are some of the resulting datasets and tools.


1. American Copyright Registrations and Authors’ 1850 County Residences

This interactive map explores the geography of American copyright registrations using the Library of Congress's collection of Early Copyright Title Pages, 1790–1870. Using OCR software, I extract authors' first and last names, broad topic, and year of registration from the raw image files. I then link title-page records registered between 1840 and 1860 to authors' counties of residence using the declassified 1850 census of population. At the county level, the map lets users browse total registrations, registrations per capita, and estimated topic categories derived from title-page text. A companion time-series view shows the long-run growth of book registrations, both in total counts and normalized by population.
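The linking and per-capita steps can be illustrated with a minimal sketch. It is not the code behind the map: the exact-match rule on normalized names, the choice to keep only names that resolve to a single county, and all field names and toy records below are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def normalize(name):
    """Lowercase and keep letters only, so "O'Brien " and "OBRIEN" compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def link_to_counties(registrations, census):
    """Attach a county to each registration whose author name maps to one county.

    Assumption: a name found in several census counties is treated as
    ambiguous and dropped, as is a name not found at all.
    """
    counties_by_name = defaultdict(set)
    for person in census:
        key = (normalize(person["first"]), normalize(person["last"]))
        counties_by_name[key].add(person["county"])

    linked = []
    for reg in registrations:
        key = (normalize(reg["first"]), normalize(reg["last"]))
        counties = counties_by_name.get(key, set())
        if len(counties) == 1:
            linked.append({**reg, "county": next(iter(counties))})
    return linked


def registrations_per_1000(linked, population):
    """Registrations per 1,000 residents for each county with a linked record."""
    counts = Counter(reg["county"] for reg in linked)
    return {county: 1000 * n / population[county] for county, n in counts.items()}


# Toy, made-up records (hypothetical names and counties).
census = [
    {"first": "Ada", "last": "Quill", "county": "County A"},
    {"first": "John", "last": "Smith", "county": "County A"},
    {"first": "John", "last": "Smith", "county": "County B"},
]
registrations = [
    {"first": "Ada", "last": "Quill", "year": 1851},
    {"first": "ADA", "last": "QUILL", "year": 1853},
    {"first": "John", "last": "Smith", "year": 1850},  # ambiguous, dropped
]
population = {"County A": 2000, "County B": 5000}

linked = link_to_counties(registrations, census)
print(registrations_per_1000(linked, population))
```

Dropping ambiguous names trades coverage for precision; a fuller procedure might instead break ties using other title-page information.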

Code: Website repository


2. Product-by-Firm Data: The Thomas Register in 1905

This dataset digitizes the 1905 edition of the Thomas Register, a popular buyer's guide and one of the central directories of American industrial firms, suppliers, and product categories. From the scanned PDF, I extract product-by-firm level data covering more than 75,000 major and minor manufacturing firms in the United States and over 2,900 products. Below I show two exploratory views of the data: the distribution of listings across U.S. cities, and a city-product specialization matrix that highlights which cities appear unusually concentrated in particular product categories.

Figure 2. Spatial distribution of Thomas Register listings by city. Larger circles contain more listings. Source: Author's digitization of the Thomas Register.
Figure 3. City-product specialization matrix for major Thomas Register cities and product categories. Rows are cities and columns are product categories. Brighter cells indicate that a city is more specialized in a product category than expected given the overall size of that city and the overall frequency of the product category in the register. Source: Author's digitization of the Thomas Register.
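One common way to express "more specialized than expected" is a location quotient: the observed number of listings in a city-product cell divided by the number expected if city and product were independent. The sketch below assumes that measure; the figure may use a different normalization, and the function name and toy listings are hypothetical.

```python
from collections import Counter


def specialization(listings):
    """Return observed / expected listings for each (city, product) cell.

    listings: iterable of (city, product) pairs, one pair per listing.
    Expected count under independence is city_total * product_total / total,
    so a value above 1 means the city is over-represented in that product.
    """
    listings = list(listings)
    total = len(listings)
    cell_counts = Counter(listings)
    city_totals = Counter(city for city, _ in listings)
    product_totals = Counter(product for _, product in listings)
    return {
        (city, product): n * total / (city_totals[city] * product_totals[product])
        for (city, product), n in cell_counts.items()
    }


# Toy, made-up listings: city A leans toward product x, city B toward product y.
toy = [("A", "x")] * 3 + [("A", "y")] + [("B", "x")] + [("B", "y")] * 3
for cell, value in sorted(specialization(toy).items()):
    print(cell, value)
```

Cells with no listings are simply absent from the result; in a heat map they would be drawn as zero.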

Code: Website repository


3. Trade Shocks and Regional Exposure, 1970–2020

This dataset measures trade shocks at the country-region-year level using IPUMS International microdata covering 31 countries and 40 years over 1970–2020. I calculate regional exposure to changing trade patterns by matching individuals in the microdata to the SITC4 products most associated with their industry. This involves creating a crosswalk between over 10,000 industry descriptions in the raw IPUMS files and SITC4 codes using the ChatGPT API. The resulting crosswalk is available upon request. To calculate the shocks, I weight changes in global imports of SITC4 products over two periods, 1975–1990 and 1990–2000, by regional shares of employment. Trade data are from Robert Feenstra's website. Below I show the resulting distribution of shocks for one country, France, in a single year, 1999. Geo level 2 datasets are available upon request.
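The weighting step has the form of a shift-share measure: for each region, the change in global imports of each product is weighted by the share of regional employment matched to that product, then summed. The sketch below is a minimal illustration under stated assumptions: changes are taken in levels (growth rates or log changes would be an equally plausible choice), and the function name, product labels, and numbers are made up.

```python
def regional_shocks(employment, imports_start, imports_end):
    """Employment-share-weighted change in global imports, by region.

    employment: {region: {product: workers matched to that product}}
    imports_start, imports_end: {product: global imports} at the start and
    end of the period.
    """
    shocks = {}
    for region, workers in employment.items():
        total_workers = sum(workers.values())
        if total_workers == 0:
            continue  # no matched employment, so exposure is undefined
        shock = 0.0
        for product, n in workers.items():
            change = imports_end[product] - imports_start[product]
            shock += (n / total_workers) * change
        shocks[region] = shock
    return shocks


# Toy, made-up inputs (hypothetical regions, products, and import values).
employment = {
    "R1": {"prod_a": 30, "prod_b": 10},
    "R2": {"prod_b": 20},
}
imports_start = {"prod_a": 100.0, "prod_b": 200.0}
imports_end = {"prod_a": 140.0, "prod_b": 180.0}

print(regional_shocks(employment, imports_start, imports_end))
```

In the toy data, region R1 is mostly employed in the product whose imports rise, so its shock is positive, while R2 is fully exposed to the product whose imports fall.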

Figure 4. Regional exposure to trade shocks, France, 1999.
Figure 5. Global regional exposure to trade shocks.