Essential Patent Analysis Datasets
contributors: Agnes Cameron, Matt Marx
tags: patents
Many researchers who study innovation want to understand the origins of ideas, as captured in published papers, and their evolution and commercialization by firms, as captured in patents and products. The following datasets provide a starting point for analyzing innovation data. We note proprietary or limited-access options but focus on open-access datasets, which can be retrieved without paying or requesting access and are bold-italicized. Other than prioritizing open datasets, we list them alphabetically.
- patent data
  - Google Patents Public Datasets
  - PatentsView -- contains data on USPTO patents since 1976.
  - PATSTAT -- must be licensed, but much of the underlying DOCDB metadata is available in Google Patents Public Datasets.
  - PatentCity -- Bergeaud/Verluise locations of inventors & assignees for the US, UK, FR and DE patent offices through 2013
- scientific literature
  - Microsoft Academic Graph "MAG" -- open metadata on scientific articles from all fields, 1800-2020
  - OpenAlex -- an open-source, drop-in replacement for MAG
  - PubMed -- public data on articles in the life sciences and related fields.
  - Clarivate Web of Science -- proprietary data on scientific publications, available via license or limited online searches.
  - Elsevier Scopus -- proprietary data on scientific publications, available via license or limited online searches.
- patent citations to scientific literature
  - Lens.org -- data queryable through the Lens API and also downloadable in bulk. Creates a unique identifier for papers, patents, authors, inventors and institutions.
  - Patcit
  - Reliance on Science -- contains front-page and in-text citations to scientific articles from worldwide patents
- matching patents to products, distinguishing product vs. process innovation
  - IPRoduct -- links patents to products via Virtual Patent Marking.
  - Classifying Patent Claims -- Ganglmair, Robinson, & Seeligson classification of patent claims 1836-2020 as process, product, or product-by-process
- matching patents to firms
  - DISCERN Patent Compustat Crosswalk -- links patent assignees to public firms 1980-2015, including resolution of firm acquisitions and reclassifications.
  - KPSS patent assignees to firms -- 1926-2019.
  - UVA Darden Global Corporate Patent Dataset -- 1980-2017, must request access.
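Datasets from different categories above are often combined by joining on a patent identifier. The following is a minimal sketch of that idea, assuming toy rows and hypothetical field names (`patent_id`, `paper_id`, `firm_id`); none of these reflect the actual schema of any dataset listed here, so check each dataset's documentation for its real column names and identifier formats.

```python
from collections import defaultdict

# Toy rows standing in for a patent-to-paper citation file.
# All field names and values are hypothetical, for illustration only.
patent_paper_citations = [
    {"patent_id": "US-1", "paper_id": "P-100"},
    {"patent_id": "US-1", "paper_id": "P-200"},
    {"patent_id": "US-2", "paper_id": "P-100"},
    {"patent_id": "US-3", "paper_id": "P-300"},
    {"patent_id": "US-4", "paper_id": "P-400"},  # no firm match below
]

# Toy rows standing in for a patent-to-firm crosswalk.
patent_firm_crosswalk = [
    {"patent_id": "US-1", "firm_id": "F-A"},
    {"patent_id": "US-2", "firm_id": "F-A"},
    {"patent_id": "US-3", "firm_id": "F-B"},
]


def papers_cited_by_firm(citations, crosswalk):
    """Count the distinct papers cited by each firm's patents."""
    # Map each patent to its firm; assumes one firm per patent.
    firm_of = {row["patent_id"]: row["firm_id"] for row in crosswalk}
    papers = defaultdict(set)
    for row in citations:
        firm = firm_of.get(row["patent_id"])
        if firm is None:
            continue  # patent is not matched to any firm; skip it
        papers[firm].add(row["paper_id"])
    return {firm: len(ids) for firm, ids in papers.items()}


result = papers_cited_by_firm(patent_paper_citations, patent_firm_crosswalk)
print(result)
```

In practice, patent identifiers are formatted differently across sources (country prefixes, kind codes, zero padding), so some normalization is usually needed before a join like this will match.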
Guides
- WIPO Manual on Open Source Patent Analytics and accompanying GitHub repository, which give a practical guide to free and open-source software tools for patent analytics
- Lens Labs Knowledge-base, which contains extensive information about patent analysis, including legal status calculation, geographical variability in patent law, analysis of biological patents and guides to reading patents
- Paul Oldham's guide Understanding Patent Data Fields, which gives a thorough overview of using patent identifiers, and also features in the WIPO manual