Reliance on Science in Patenting


authors: Matt Marx, Aaron Fuegi

tags: citation, scholarly literature, front-page, error metrics

related pages:


description: We introduce an open-access dataset of references from the front pages of patents granted worldwide to scientific papers published since 1800. Each patent-paper linkage is assigned a confidence score, which is characterized in a random sample by false negatives versus false positives. All matches are available for download. We outline several avenues for strategy research enabled by these new data. The dataset contains citations from the front pages of worldwide patents to articles in the Microsoft Academic Graph (MAG) from 1800 to 2020.


last edit: Tue, 01 Mar 2022 12:23:09 GMT

terms of use: Open Data Commons Attribution License v1.0

timeframe: 1834-2019

Reliance on Science links U.S. Patent & Trademark Office data to a broad set of scientific articles not limited by industry or field. These linkages draw not only on proprietary article databases, which cannot be shared, but also on the Microsoft Academic Graph, which permits us to post the resulting patent citations to science (PCS) for public use. Based on third-party assessment, we estimate that our algorithm can capture up to 93% of patent citations to science with an accuracy rate of 99% or higher. We believe this to be the longest publicly available panel of patent-to-paper citations (spanning more than seven decades), and it is accompanied by rigorous performance metrics. We also provide matches from worldwide patents to PubMed.
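
To make the capture and accuracy figures above concrete, the following is a minimal sketch of how precision (accuracy of retained matches) and recall (share of true citations captured) could be computed from a hand-labeled sample at a chosen confidence threshold. Everything here is an assumption for illustration: the `LabeledMatch` record, its field names, the score scale, and the toy data are hypothetical and do not reflect the actual file layout or the authors' evaluation procedure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LabeledMatch:
    """One patent-paper linkage from a hand-checked sample (hypothetical layout)."""
    confidence: int          # assumed: higher means a more confident match
    is_true_citation: bool   # manual label from the random sample


def precision_recall(sample: List[LabeledMatch], threshold: int,
                     never_matched: int = 0) -> Tuple[float, float]:
    """Precision and recall when keeping only matches with confidence >= threshold.

    never_matched counts true citations in the sample that the matcher
    missed entirely; they add to the false negatives.
    """
    kept = [m for m in sample if m.confidence >= threshold]
    true_pos = sum(m.is_true_citation for m in kept)
    # True citations discarded by the threshold are false negatives too.
    dropped_true = sum(m.is_true_citation for m in sample
                       if m.confidence < threshold)
    false_neg = dropped_true + never_matched
    precision = true_pos / len(kept) if kept else 0.0
    denom = true_pos + false_neg
    recall = true_pos / denom if denom else 0.0
    return precision, recall


# Made-up sample, purely for illustration.
toy_sample = [
    LabeledMatch(confidence=10, is_true_citation=True),
    LabeledMatch(confidence=9, is_true_citation=True),
    LabeledMatch(confidence=5, is_true_citation=False),
    LabeledMatch(confidence=4, is_true_citation=True),
]

for t in (4, 8):
    p, r = precision_recall(toy_sample, threshold=t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

Raising the threshold trades false positives for false negatives, which is why a single dataset can be described by both a capture rate and an accuracy rate.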