Essential Patent Analysis Datasets
authors: Agnes Cameron, Matt Marx
description: Starting point list of datasets and resources for the study of innovation, including guides to interpreting patent identifiers, request APIs, and disambiguation tools.
Many innovation researchers want to understand the origins of ideas, as captured in published papers, and their evolution and commercialization by firms, as captured in patents and products. The following datasets provide a starting point for analyzing innovation data. We note proprietary or limited-access options but focus on open-access datasets, which can be retrieved without paying or requesting access and are bold-italicized. Other than prioritizing open datasets, we list them alphabetically.
- patent data
- scientific literature
  - Microsoft Academic Graph “MAG” – open metadata on scientific articles from all fields, 1800-2020.
  - OpenAlex – an open-source, drop-in replacement for MAG.
  - PubMed – public data on articles in the life sciences and related fields.
  - Clarivate Web of Science – proprietary data on scientific publications, available via license or limited online searches.
  - Elsevier Scopus – proprietary data on scientific publications, available via license or limited online searches.
- patent citation to scientific literature
  - Lens.org – Data queryable through the Lens API and also downloadable in bulk. Creates a unique identifier for papers, patents, authors, inventors, and institutions.
  - Reliance on Science – contains front-page and in-text citations to scientific articles from worldwide patents.
- matching patents to products
  - IPRoduct – links patents to products via Virtual Patent Marking.
- matching patents to firms