Patent Citation Similarity


contributors: Jeffrey Kuhn, Kenneth Younge, Alan Marco

tags: similarity, citation

timeframe: 1976-2017

terms of use: These datasets are provided to the public under the Creative Commons Attribution-NonCommercial-NoDerivatives license. Co-authorship is not required to use the data in academic research; please cite the supporting article.


description: Many studies of innovation rely on patent citations to measure intellectual lineage and impact. To create this dataset, we use a vector space model (VSM) of patent similarity to compute the technological similarity between the citing and cited patent in each citation pair. The model analyzes the full text of each document to position it as a vector in a space of more than 700,000 dimensions, and then calculates the angular distance between the two vectors. The dataset includes similarity values for all citations made by patents issued between 1976 and 2017 to issued patents or published patent applications.
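
The general idea of comparing two documents by the angle between their vectors can be illustrated with a minimal sketch. This is not the contributors' implementation: the whitespace tokenization, the raw term counts used as vector weights, and the function names `term_vector`, `cosine_similarity`, and `angular_distance` are all assumptions made for illustration, and the actual model may weight terms differently.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a document to a sparse term-count vector.

    Assumption: lowercase whitespace tokens with raw counts. This is a toy
    stand-in for whatever weighting the real model applies.
    """
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def angular_distance(u, v):
    """Angle in radians between two vectors, derived from their cosine."""
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    cos = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.acos(cos)


# Hypothetical snippets standing in for the full text of two patents.
citing = term_vector("a rotor blade assembly for a wind turbine")
cited = term_vector("a wind turbine with an adjustable rotor blade")

print(cosine_similarity(citing, cited))
print(angular_distance(citing, cited))
```

With non-negative term weights, the cosine falls between 0 and 1, so the angle falls between 0 (same direction) and pi/2 (no shared terms).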

last edit: Wed, 06 Dec 2023 02:50:53 GMT