Essential Patent Analysis Datasets
contributors: Agnes Cameron, Matt Marx
Many innovation researchers want to understand the origins of ideas, as captured in published papers, and their evolution and commercialization by firms, as captured in patents and products. The following datasets provide a starting point for analyzing innovation data. We note proprietary or limited-access options but focus on open-access datasets, which can be retrieved without needing to pay or request access and are bold-italicized. Other than prioritizing open datasets, we list them alphabetically.
- patent data
  - Google Patents Public Datasets
  - PatentsView -- contains data on USPTO patents since 1976.
  - PATSTAT -- must be licensed, but much of the underlying DOCDB metadata is available in Google Patents Public Datasets.
  - PatentCity -- Bergeaud/Verluise locations for inventors & assignees for the US, UK, FR and DE patent offices through 2013
- scientific literature
  - Microsoft Academic Graph "MAG" -- open metadata on scientific articles from all fields, 1800-2020
  - OpenAlex -- an open-source, drop-in replacement for MAG
  - PubMed -- public data on articles in the life sciences and related fields.
  - Clarivate Web of Science -- proprietary data on scientific publications, available via license or limited online searches.
  - Elsevier Scopus -- proprietary data on scientific publications, available via license or limited online searches.
- patent citation to scientific literature
- matching patents to products, distinguishing product vs. process innovation
- matching patents to firms
- WIPO Manual on Open Source Patent Analytics and accompanying GitHub repository, which gives a practical guide to free and open source software tools for patent analytics
- Lens Labs Knowledge-base, which contains extensive information about patent analysis, including legal status calculation, geographical variability in patent law, analysis of biological patents and guides to reading patents
- Paul Oldham's guide Understanding Patent Data Fields, which gives a thorough overview of using patent identifiers, and also features in the WIPO manual
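
Because several of the resources above are joined on patent identifiers, a minimal sketch of splitting a publication number into its parts may help. It assumes the common layout of a two-letter country code, a numeric serial and an optional kind code (for example `US-7654321-B2` or `EP1234567A1`). The names `PublicationNumber` and `parse_publication_number` are hypothetical, not taken from any of the listed datasets, and the pattern is deliberately simplified: identifiers with letters in the serial, such as US design or reissue patents, are rejected rather than guessed at.

```python
import re
from typing import NamedTuple, Optional


class PublicationNumber(NamedTuple):
    """Hypothetical container for the parts of a patent publication number."""
    country: str          # two-letter office code, e.g. "US" or "EP"
    number: str           # numeric serial, kept as a string to preserve leading zeros
    kind: Optional[str]   # kind code such as "A1" or "B2"; None when absent


# Simplified pattern (an assumption, not an official specification):
# country code, optional separator, digits, optional separator, optional kind code.
_PATTERN = re.compile(r"^([A-Z]{2})[-\s]?(\d+)[-\s]?([A-Z]\d?)?$")


def parse_publication_number(raw: str) -> PublicationNumber:
    """Split a publication number into country, serial and kind code."""
    match = _PATTERN.match(raw.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognized publication number: {raw!r}")
    country, number, kind = match.groups()
    return PublicationNumber(country, number, kind)


if __name__ == "__main__":
    for example in ["US-7654321-B2", "EP1234567A1", "us 7654321"]:
        print(example, "->", parse_publication_number(example))
```

Normalizing identifiers into the same parts before merging is one way to reduce failed joins when different sources format the same patent differently; the guides listed above describe the actual conventions of each office in more detail.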