I³ Open Innovation Dataset Index
This is the web version of the I³ Open Dataset Index – a collection of innovation datasets, and related tools, platforms and resources used by the broader research community.
You can contribute to this site, either by editing our Google Sheet (updates made to the sheet will take a couple of minutes to display),
or by making a pull request to our GitHub repository directly.
You can use the search bar to explore the datasets, browse the full list directly, or explore the list of tools.
Essential Patent Analysis Datasets
Many innovation researchers want to understand the origins of ideas, as captured in published papers, and their evolution and commercialization by firms, as captured in patents and products. The following datasets provide a starting point for analysis of innovation data. We note proprietary or limited-access options but focus on open-access datasets, which can be retrieved without needing to pay or request access and are bold-italicized. Other than prioritizing open datasets, we list them alphabetically.
- patent data
- Google Patents Public Datasets
- PatentsView – contains data on USPTO patents since 1976.
- PATSTAT – must be licensed, but much of the underlying DOCDB metadata is available in Google Patents Public Datasets.
- PatentCity – Bergeaud/Verluise locations for inventors & assignees for US, UK, FR and DE patent offices through 2013
- scientific literature
- Microsoft Academic Graph “MAG” – open metadata on scientific articles from all fields, 1800-2020
- OpenAlex – an open-source, drop-in replacement for MAG
- PubMed – Public data on articles in the life sciences and related fields.
- Clarivate Web of Science – proprietary data on scientific publications, available via license or limited online searches.
- Elsevier Scopus – proprietary data on scientific publications, available via license or limited online searches.
- patent citations to scientific literature
- Lens.org – Data queryable through the Lens API and also downloadable in bulk. Creates a unique identifier for papers, patents, authors, inventors and institutions.
- Reliance on Science – contains front-page and in-text citations to scientific articles from worldwide patents
- matching patents to products, distinguishing product vs. process innovation
- IPRoduct – links patents to products via Virtual Patent Marking.
- Classifying Patent Claims – Ganglmair, Robinson, & Seeligson classification of patent claims 1836-2020 as process, product, or product-by-process
- matching patents to firms