
Patent text: code, data, and new measures

location: https://zenodo.org/record/3515985

contributors: Sam Arts, Jianan Hou, Juan Carlos Gomez

tags: patent measures, text, natural language processing, novelty, impact, USPTO, technological progress

related projects:

supersedes:

patenttext

documentation: https://zenodo.org/record/3515985

code: https://github.com/sam-arts/respol_patents_code

timeframe: 1969-2018

terms of use: Open Data Commons Attribution License v1.0

related publications: Arts S, Hou J, Gomez JC. (2020). Natural language processing to identify the creation and impact of new technologies in patent text: code, data, and new measures. Forthcoming in Research Policy. (https://doi.org/10.1016/j.respol.2020.104144)

description: Open-access data files derived from the text of USPTO patent documents, including: 1) for each US patent, a list of processed, cleaned, and stemmed keywords; 2) for each patent, a list of the 1,000 most similar patents (by cosine similarity) drawn from the entire population of US patents; 3) for each US patent, the average cosine similarity with all patents from the preceding 5 years and the average cosine similarity with all patents from the following 5 years; 4) each keyword (unigram), bigram (sequence of two adjacent keywords), trigram (sequence of three adjacent keywords), and pairwise keyword combination introduced for the first time by a US patent, together with the number of the patent that introduced it and the total number of patents in the entire population that use it.
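
example: The measures described above can be illustrated with a minimal Python sketch. It is not the authors' code (see the code link above for that); it rests on assumptions made purely for illustration: each patent is represented by its set of unique stemmed keywords, cosine similarity is computed on binary keyword vectors, patents arrive in chronological order, and the usage count includes the introducing patent. The function names and the sample patent ids are hypothetical.

```python
import math


def cosine_similarity(keywords_a, keywords_b):
    """Cosine similarity of two keyword lists, treated as binary vectors.

    Assumption: each keyword counts once per patent, so the similarity
    reduces to |A & B| / sqrt(|A| * |B|).
    """
    a, b = set(keywords_a), set(keywords_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def bigrams(keywords):
    """Sequences of two adjacent keywords, in document order."""
    return list(zip(keywords, keywords[1:]))


def first_uses(patents):
    """Map each keyword and bigram to (introducing patent, patents using it).

    `patents` is an iterable of (patent_id, keywords) pairs that is assumed
    to be sorted chronologically. The count includes the introducing patent.
    """
    seen = {}
    for patent_id, keywords in patents:
        # Count each term at most once per patent.
        terms = set(keywords) | set(bigrams(keywords))
        for term in terms:
            if term in seen:
                first_id, count = seen[term]
                seen[term] = (first_id, count + 1)
            else:
                seen[term] = (patent_id, 1)
    return seen


# Hypothetical toy data: three patents with already-stemmed keywords.
sample_patents = [
    ("P1", ["laser", "diod", "emit"]),
    ("P2", ["laser", "diod", "cool"]),
    ("P3", ["cool", "laser"]),
]

similarity = cosine_similarity(sample_patents[0][1], sample_patents[1][1])
introductions = first_uses(sample_patents)
```

In this toy data, P1 and P2 share two of their three keywords, and the bigram ("laser", "diod") is introduced by P1 and used again by P2. Trigrams and pairwise keyword combinations could be tracked the same way by adding them to the set of terms collected for each patent.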

last edit: Fri, 01 Dec 2023 17:56:16 GMT