location: https://doi.org/10.5281/zenodo.3710993
This dataset can also be queried online via Google BigQuery.
authors: Cyril Verluise, Gabriele Cristelli, Kyle Higham, Lucas Violon, Gaétan de Rassenfosse
tags: citation, scholarly literature, in-text, front-page, patent, science, database, Wikipedia
code: https://cverluise.github.io/notebook
description: In-text and front-page citations to non-patent literature, together with in-text patent citations, extracted and parsed. PatCit builds on DOCDB, the largest database of Non-Patent Literature (NPL) citations. First, we deduplicate this corpus and organize it into 10 categories. Then, we design and apply category-specific information extraction models using spaCy. Finally, where possible, we enrich the data using external, domain-specific, high-quality databases. PatCit is an open-source, collaboratively maintained project.
documentation: https://cverluise.github.io/PatCit/
last edit: Sat, 12 Nov 2022 22:08:03 GMT
terms of use: CC-BY 4.0 International
timeframe: 1836-2018