Patent text: code, data, and new measures
location: https://zenodo.org/record/3515985
contributors: Sam Arts, Jianan Hou, Juan Carlos Gomez
tags: patent measures, text, natural language processing, novelty, impact, USPTO, technological progress
documentation: https://zenodo.org/record/3515985
code: https://github.com/sam-arts/respol_patents_code
timeframe: 1969-2018
terms of use: Open Data Commons Attribution License v1.0
description: Open-access data files derived from the text of USPTO patent documents: 1) for each US patent, a list of processed, cleaned, and stemmed keywords; 2) for each patent, a list of the 1,000 most similar patents (by cosine similarity) from the entire population of US patents; 3) for each US patent, the average cosine similarity with all prior patents from the previous 5 years and the average cosine similarity with all later patents from the following 5 years; 4) each keyword (unigram), bigram (sequence of two adjacent keywords), trigram (sequence of three adjacent keywords), and pairwise keyword combination introduced for the first time by a US patent, together with the number of the patent that introduced it and the total number of patents in the entire population that use it.
last edit: Fri, 01 Dec 2023 17:56:16 GMT
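
example: The measures named in the description can be illustrated with a minimal Python sketch on toy data. This is not the authors' code (see the code link above for that). The patent ids, years, keyword lists, and function names below are all hypothetical. The sketch also assumes raw keyword counts as vector weights, a window that excludes same-year patents, and a usage count that includes the introducing patent; the published files may differ on each of these points.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy data: patent id -> (year, ordered stemmed keywords).
patents = {
    "P1": (1990, ["laser", "diod", "optic"]),
    "P2": (1992, ["laser", "optic", "fiber"]),
    "P3": (1995, ["fiber", "optic", "sensor"]),
}

def cosine(a, b):
    """Cosine similarity between two keyword lists treated as count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def mean_similarity(pid, patents, window=5, direction=-1):
    """Average similarity with patents from the previous (direction=-1) or
    following (direction=1) `window` years; None if there are none.
    Assumption: same-year patents are excluded."""
    year, words = patents[pid]
    others = [w for q, (y, w) in patents.items()
              if q != pid and 0 < (y - year) * direction <= window]
    return sum(cosine(words, w) for w in others) / len(others) if others else None

def features(words):
    """Unigrams, adjacent bigrams, and unordered pairwise keyword combinations."""
    return {
        "unigram": {(w,) for w in words},
        "bigram": set(zip(words, words[1:])),
        "pair": {tuple(sorted(p)) for p in combinations(set(words), 2)},
    }

def first_introductions(patents):
    """Map each (kind, feature) to the earliest patent using it, and count how
    many patents use it (assumption: the introducing patent is counted too)."""
    first, usage = {}, Counter()
    for pid, (_, words) in sorted(patents.items(), key=lambda kv: kv[1][0]):
        for kind, feats in features(words).items():
            for f in feats:
                first.setdefault((kind, f), pid)
                usage[(kind, f)] += 1
    return first, usage

first, usage = first_introductions(patents)
print(mean_similarity("P3", patents))              # backward-looking similarity
print(mean_similarity("P1", patents, direction=1)) # forward-looking similarity
print(first[("pair", ("fiber", "optic"))], usage[("pair", ("fiber", "optic"))])
```

In this sketch a low backward-looking average alongside a high forward-looking average would mark a patent that differs from earlier work but resembles later work. Trigrams can be added to `features` in the same way as bigrams, with `zip(words, words[1:], words[2:])`.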