Essential Patent Analysis Datasets
authors: Agnes Cameron, Matt Marx
description: A starting list of datasets and resources for the study of innovation, including guides to interpreting patent identifiers, request APIs, and disambiguation tools.
Many researchers focused on innovation want to understand the origins of ideas, as captured in published papers, and their evolution and commercialization by firms, as captured in patents and products. The following datasets provide a starting point for analysis of innovation data. We note proprietary or limited-access options but focus on open-access datasets, which can be retrieved without needing to pay or request access and are bold-italicized. Other than prioritizing open datasets, we list them alphabetically.
- patent data
- Google Patents Public Datasets
- PatentsView – contains data on USPTO patents since 1976.
- PATSTAT – must be licensed, but much of the underlying DOCDB metadata is available in Google Patents Public Datasets.
- PatentCity – Bergeaud/Verluise locations for inventors & assignees for US, UK, FR and DE patent offices through 2013
- scientific literature
- Microsoft Academic Graph “MAG” – open metadata on scientific articles from all fields, 1800-2020
- OpenAlex – an open-source, drop-in replacement for MAG
- PubMed – Public data on articles in the life sciences and related fields.
- Clarivate Web of Science – proprietary data on scientific publications, available via license or limited online searches.
- Elsevier Scopus – proprietary data on scientific publications, available via license or limited online searches.
- patent citations to scientific literature
- Lens.org – Data queryable through the Lens API and also downloadable in bulk. Assigns unique identifiers to papers, patents, authors, inventors and institutions.
- Reliance on Science – contains front-page and in-text citations to scientific articles from worldwide patents
- matching patents to products, distinguishing product vs. process innovation
- IPRoduct – links patents to products via Virtual Patent Marking.
- Classifying Patent Claims – Ganglmair, Robinson, & Seeligson's classification of patent claims, 1836-2020, as process, product, or product-by-process
- matching patents to firms