Datasets
This page enumerates research datasets shared and used by the Innovation Information Initiative research community. Where possible, care has been taken to indicate which datasets are most widely used, which have been superseded by other projects, and the relationships these datasets have to one another. Each dataset on this page is an entry in the Open Innovation Datasets tab of the I3 Index Google sheet. To add a dataset to the index, either add a row to the sheet or make a pull request to our GitHub repository.
For recommendations of useful starting datasets for different research specialisms, we also host a set of guides compiled by researchers in the community. If you think one is missing for your area of research, you could also add or request one. If you would like to use a more fine-grained search to explore the index, you can use the advanced search tool here.
- World Management Survey
The World Management Survey is the first cross-country, cross-industry dataset built to measure the quality of management practices in establishments. The WMS is an interview-based evaluation tool that defines 18 key management practices, and scores ... - Crios‐Patstat Database
Disambiguated inventor and applicant names for EPO records. A major problem with PATSTAT is that its data are provided in a raw format. ICRIOS has therefore thoroughly processed the data to produce a cleaned and harmonized database: PATENTS-ICRIOS... - Computer Retrieval of Information on Scientific Projects
The NIH CRISP (Computer Retrieval of Information on Scientific Projects) is a searchable database of federally funded biomedical research projects conducted at universities, hospitals, and other research institutions. This dataset has not been update... - USPTO OCE Patent Assignment Data
The USPTO allows parties to record assignments of patents and patent applications to, as much as possible, maintain a complete history of claimed interests in a patent. The USPTO also permits recording of other documents that affect title (such as ce... - Indicators on firm level innovation activities from web scraped data
This data sample (in support of the article "Indicators on firm level innovation activities from web scraped data" https://ssrn.com/abstract=3938767) contains data on companies' innovative behavior measured at the firm-level based on web scraped firm-le... - AltMetrics
Tracks a combination of public policy documents, Wikipedia, open syllabi, social media and mainstream media to provide enhanced measures of academic impact.... - Artificial Intelligence Patent Dataset
The Artificial Intelligence Patent Dataset consists of two files, both released by the OCE. The first data file identifies United States (U.S.) patents issued between 1976 and 2020 and pre-grant publications (PGPubs) published through 2020 that conta... - BACI
BACI provides disaggregated data on bilateral trade flows for more than 5000 products and 200 countries.... - UVA Darden Global Corporate Patent Dataset (disambiguated assignees)
The dataset has information on about 3 million USPTO patents, which were granted between 1980 and 2017, assigned to publicly listed companies worldwide, and linked to those assignee companies using the following identifiers: Unique Patent Number, as ... - US Bureau of Economic Analysis Data
The US Bureau of Economic Analysis publishes live data and metrics on GDP, industries, government investment and R&D spending, international trade, consumer spending and investment. This data is exposed through their API, and also made available thro... - American Business Cycle
Presented here are the tables of quarterly data from Appendix B of "The American Business Cycle: Continuity and Change", edited by Robert J. Gordon. National Bureau of Economic Research Studies in Business Cycles Volume 25, University of Chicago Pres... - BIGPATENT
BIGPATENT consists of 1.3 million records of U.S. patent documents along with human-written abstractive summaries. Each US patent application is filed under a Cooperative Patent Classification (CPC) code. Compared to existing summarization datase... - Biospolar Antarctic Literature and Patents
Mapping the scientific and patent landscapes for biodiversity based research and innovation from Antarctica and the Southern Ocean. Created under the Biospolar Project, Research Council of Norway... - Stanford NLP Model Output for Biofuel Patent Classification
Through use of natural language processing and machine-learning algorithms, we expand patent classification capabilities to better explain the history of biofuels innovation. This NLP model was generated using the Stanford NLP Classifier (available ... - Census Block Distance Database
Census Block Distances are great-circle distances calculated using the Haversine formula based on internal points in the geographic area. Census Blocks are from Census 2000 SF1 and Census 2010 SF1 files. Census Blocks "are statistical areas bounded ... - Open Sourced Database for CEO Dismissal 1992-2018
This is a database of qualitatively coded reasons for a CEO’s dismissal, for S&P 1500 Companies. The maintainers of this dataset run a mailing list with a signup [here](https://docs.google.com/forms/d/e/1FAIpQLSfiZZHwyeWYEZ5fOT1_RygH-ComG9ltad5IUUY60... - Collection of Historical Data on the Uses of Petroleum International Network
The research project CH.DUPIN (Collection of Historical Data on the Uses of Petroleum International Network) aims at gathering historical data on oil consumption for many countries. The current dataset contains yearly information on oil consumption... - ChEMBL
ChEMBL Data is a manually curated database of small molecules used in drug discovery, including information about existing patented drugs.... - ChEMBL-NTD
ChEMBL-NTD is a repository for Open Access primary screening and medicinal chemistry data directed at neglected diseases - endemic tropical diseases of the developing regions of Africa, Asia, and the Americas. The primary purpose of ChEMBL-NTD is... - Chilean IP and firm data
This study describes patterns and trends of intellectual property (IP) use in Chile, drawing on a new database containing all patent, trademark, utility model, and design filings received by the Chilean IP office over the period 1991-2010. The databa... - CLEF-IP
CLEF was a long-running project to benchmark cross-language information retrieval (IR) models. CLEF-IP was a strand of this benchmarking research that ran from 2010-2013, the data from which was used as a benchmarking activity of the CLEF 2010-2013 c... - CMS's SSA to FIPS CBSA and MSA County Crosswalk
CMS periodically produces SSA to FIPS CBSA to county crosswalk files. They released a CBSA to MSA to FIPS county crosswalk as well. Some CMS data files have SSA state and county codes or county name rather than FIPS state and county codes. Jean Roth ... - Cooperative Patent Classification Data
Cooperative Patent Classification Data contains the scheme and definitions of the Cooperative Patent Classification system for classifying patent documents. The CPC is the result of a partnership between the EPO and the USPTO in their joint effort to... - Disambiguation and Co-authorship Networks of the U.S. Patent Inventor Database
Name disambiguation of US inventors, 1975-2010. Using a Bayesian supervised learning approach, we identify individual inventors from the U.S. utility patent database, from 1975 to the present. An interface to calculate and illustrate patent co-author... - Imputation of missing applicant country codes in worldwide patent data
We present a general method for imputing missing information in the Worldwide Patent Statistical Database (PATSTAT) and make the resulting datasets publicly available. The PATSTAT database is the de facto standard for academic research using patent d... - The careers and co-authorship networks of U.S. patent-holders, since 1975
The identification enables construction of social networks based on patent co-authorship. We will eventually provide descriptive statistics of individual and collaborative variables and illustrated examples of networks for an individual, an organizat... - A large-scale COVID-19 Twitter chatter dataset for open scientific research
Dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. The first 9 weeks of data (from January 1st, 2020 to March 11th, 2020) contain very low tweet counts as we filtered other data we were collecting for other research purpo... - Dimensions
Dimensions contains more than 100 million publications, ranging from articles published in scholarly journals, books and book chapters, to preprints and conference proceedings. All publications are contextualized with linked data sets, funding, publi... - EPO worldwide bibliographic data (DOCDB)
DOCDB is the EPO's master documentation database with worldwide coverage. It contains bibliographic data, abstracts, citations and the DOCDB simple patent family, but no full text or images. ... - Disclosed Standard Essential Patents Database
The OEIDD database provides a full overview of all disclosed IPR at standard-setting organizations (SSOs) worldwide. Based on the archives of thirteen major SSOs as of March 2011, the disclosure data is cleaned, harmonized, and all disclosed USPTO or EPO patents or... - EurekAlert
EurekAlert! is a nonprofit news-release distribution platform operated by the American Association for the Advancement of Science (AAAS) as a resource for journalists and the public. EurekAlert! hosts news releases produced by universities, journal p... - European Business Performance Database
The European Business Performance database describes the performance of the largest enterprises in the twentieth century. It covers eight countries that together consistently account for above 80 per cent of western European GDP: Great Britain, Germa... - FinBERT
FinBERT is a pre-trained NLP model to analyze sentiment of financial text. It is built by further training the BERT language model in the finance domain, using a large financial corpus and thereby fine-tuning it for financial sentiment classification... - NIH Research Portfolio Online Reporting Tools Database (ExPORTER)
ExPORTER is an open data dump of the National Institutes of Health RePORTER database, which tracks administrative data on NIH-funded projects. ExPORTER provides bulk administrative data found in RePORTER to the public for detailed analyses or to load ... - FnGuide
Database of financial information about Korean firms, plus national banking, finance and economic data... - Google Patents Research Data
Google Patents Research Data contains the output of much of the data analysis work used in Google Patents (patents.google.com), including machine translations of titles and abstracts from Google Translate, embedding vectors, extracted top terms, simi... - Measuring the Position and Differentiation of Firms in Technology Space
Although the rate of invention by firms and the effect on firm performance have been central themes in economics and strategy, the position and differentiation of invention by firms have received less attention. We develop a method to characterize a ... - Geocoding of worldwide patent data
The dataset provides geographic coordinates for inventor and applicant locations in 18.8 million patent documents spanning more than 30 years. The geocoded data are further allocated to the corresponding countries, regions and cities. When the a... - Government-funded US patents
This includes patent-level metadata, 1926-1975 (OCRed from USPTO Image PDF files), 1976-2017 (parsed from USPTO HTML files), patent metadata, CPC, geography, agencies, entity size of the patent owner etc., government support categories at patent leve... - Replication Data for: Government-funded research increasingly fuels innovation
This includes patent-level metadata, 1926-1975 (OCRed from USPTO Image PDF files), 1976-2017 (parsed from USPTO HTML files), patent metadata, CPC, geography, agencies, entity size of the patent owner etc., government support categories at patent leve... - Classification Data for "Classifying Patents Based on their Semantic Content"
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ZULMOY... - GPT Indicators
This database contains yearly technology-level measures of Growth, Use Complementarity (UC) and Innovation Complementarity (IC) since 1920 for all technological classes in the United States Patent and Trademark Office (USPTO) classification system, a... - GRID: Global Database of Research Institutes
GRID is a free and openly available global database of over 100,000 research-related organisations, including healthcare organizations, companies, governments, non-profits, each provided with a unique and persistent identifier. In addition to IDs and... - Tools for Harmonizing County Boundaries
This tool creates the CSV tables that allow county boundaries to be synchronized to a base year, exported to the directory you run this from. While this code takes shapefiles of any type and performs an intersect, it was written to follow the method... - HistPat Dataset
HistPat provides the geography of historical patents granted by the United States Patent and Trademark Office (USPTO) from 1790 to 1975. This historical dataset is constructed using digitalized records of original patent documents that are publicly a... - Historical Cross-Country Technology Adoption (HCCTA) Dataset
The Historical Cross-Country Technology Adoption Dataset was collected to allow for the analysis of the adoption patterns of some of the major technologies introduced in the past 250 years across the world's leading industrialized ... - IFI Claims Patent Data Enrichments
IFI CLAIMS Patent Data Enrichments includes standardized assignee/applicant names and integrated legal status information.... - HistPat International Dataset
HistPat International provides the geography of historical patents granted to foreign nationals by the United States Patent and Trademark Office (USPTO) from 1836 to 1975. This historical dataset is constructed using digitalized records of original p... - Inventor biography data linked to administrative data of the IAB (INV-BIO ADIAB)
We introduce the employer-employee inventor dataset INV-BIO ADIAB 1980-2014, which records complete biographies of more than 150,000 inventors in Germany between 1980 and 2014. This dataset tracks each inventor’s employment status and inventive outpu... - Inventor disambiguation
... - IPRoduct
The IPRoduct project seeks to link innovative goods to the patents upon which they are based. By directly linking products to patents, this project tracks innovation to the point where it meets consumers, the true commercial end point of investments ... - Indian Patent Advanced Search System
Platform for accessing Indian public patent data... - Japanese Patent Office
IIP Patent Database (IIP Patent DB) is a database developed for statistical analysis of patents based on the Japan Patent Office (JPO) “Standardized Data.” The Intellectual Property Institute (IIP) provides the IIP Patent DB to further promote patent s... - Korea Patent Data Project (KoPDP)
The project collects all utility patents granted from the Korea Intellectual Property Office (KIPO) for the period 1948-2016 and the US Patent and Trademark Office (USPTO) for the period 1976-2017. The project also matches their assignees to firms in... - KPSS patent assignees to firms
We propose a new measure of the economic importance of each innovation. Our measure uses newly collected data on patents issued to U.S. firms in the 1926 to 2010 period, combined with the stock market response to news about patents. Our patent-level ... - Journal Commercial Impact Factor
This measure is analogous to the commonly used Journal Impact Factor (JIF), which is also calculated here. A journal’s impact factor is a popular measure of its quality, calculated for year t as the number of times articles from years t-1 and t-2 wer... - Lens.org
Lens serves nearly all of the patent documents in the world as open, annotatable digital public goods that are integrated with scholarly and technical literature along with regulatory and business data. The Lens will allow documents and analyses to b... - Lens Labs
Links to datasets, APIs, and tools... - Long-run dynamics of the US patent classification system
The dataset contains several files used in the paper "Long-run dynamics of the US patent classification system" by François Lafond and Daniel Kim. The files contain data on the evolution of the number of classes in the United States Patent Classifica... - Women in the Copyright System
Coverage: The data set contains copyright registration records, copyright renewal records, and recorded document records, from January 1, 1978, to July 8, 2021. Content: The data set contains information on authors, types of works registered, public... - World Bank Development Indicators
World Development Indicators Data is the primary World Bank collection of development indicators, compiled from officially-recognized international sources. It presents the most current and accurate global development data available, and includes nat... - Long-Term Productivity database
The Long-Term Productivity database was created as a project at the Bank of France in 2013 by Antonin Bergeaud, Gilbert Cette and Remy Lecat. Following the work of Cette, Mairesse and Kocoglu (2009), we extended the database to include 17 countries i... - The scientific knowledge base of low carbon energy technologies (updated and extended version)
This data publication offers updated data about low-carbon energy technology (LCET) patents and citation links to the scientific literature. Compared to a previous version, it also contains data on biofuels and fuels from waste technologies. The upd... - Microsoft Academic Knowledge Graph
A large RDF data set with over eight billion triples with information about scientific publications and related entities, such as authors, institutions, journals, and fields of study. The data set is based on the Microsoft Academic Graph and licensed... - MatrixWare Research Collection
MAREC Data is a static collection of over 19 million patent applications and granted patents in a unified file format normalized from EP, WO, US, and JP sources, spanning a range from 1976 to June 2008. In MAREC, the documents from different countrie... - Matched inventor ages from patents, based on web scraped sources
We use information about U.S.-residing inventors from patents, including name and location, to search publicly available online web directories for age and date-of-death information, and build a scoring system to indicate the quality... - Chinese Patent Data Project Dataverse
Matching SIPO patents to Chinese listed firms ("Main Board"). Please refer to the user documentation "Chinese Patent Database User Documentation: Matching SIPO Patents to Chinese Publicly-Listed Companies and Subsidiaries" for more details ... - MIT Scholarly Works Over Time
Scholarly works produced by MIT 1950-2018... - MIT Scholarly Works Cited by Patents
MIT Scholarly Works Cited by Patents 1950-2018... - NBER Economic Indicators and Releases
Regularly-updated and archived index of economic indicators, including interest rates, stock reserves, home sales, labour statistics and productivity. This page is updated Monday-Friday.... - NBER Macrohistory Database
During the first several decades of its existence, the National Bureau of Economic Research (NBER) assembled an extensive data set that covers all aspects of the pre-WWI and interwar economies, including production, construction, employment, money, p... - Newspaper.com Index
Index of newspaper.com articles... - NBER US Patent Data Project
The main dataset extends from January 1, 1963, through December 30, 2006, and includes all the utility patents granted during that period. The citations file includes all citations made by patents granted in 1975-1999.... - OpenAlex
OpenAlex is a free and open catalog of the world's scholarly papers, researchers, journals, and institutions — along with all the ways they're connected to one another. It is maintained by the non-profit OurResearch.... - FDA Orange Book Database
The publication Approved Drug Products with Therapeutic Equivalence Evaluations (commonly known as the Orange Book) identifies drug products approved on the basis of safety and effectiveness by the Food and Drug Administration (FDA) under the Federal... - ORBIS Intellectual Properties
Database that links patents to companies... - PatCit
In-text and front-page citations to non-patent literature and in-text patent citations, extracted and parsed. patCit builds on DOCDB, the largest database of non-patent literature (NPL) citations. First, deduplic... - NBER Orange Book Dataset
Each edition of the Orange Book provides a snapshot of unexpired patent protection at a moment in time. As patents on a drug expire and new patents are issued, these changes are reflected in later editions. The Orange Book also provides a snapshot of... - Patent Citation Similarity
Many studies of innovation rely on patent citations to measure intellectual lineage and impact. To create this dataset, we use a vector space model of patent similarity to compute the technological similarity between each pair of citing-cited patents... - Patent Citation Timing and Source
Innovation studies frequently distinguish between patent citations submitted by the patent examiner and those submitted by the patent applicant. However, publicly available citations data is often misleading, for instance by attributing a patent cit... - Disclosure of Patenting Activities within Scientific Publications as Potential Conflict-of-Interests: Evidences from Biomedical Literature
Most scientific publishers require authors to submit their manuscripts with a text reporting their own parallel activities, including patent-related ones, that may be considered as potential Conflict-of-Interest (COI). The authors’ or institutional p... - Patent Families Dataset
Patent applicants frequently file groups of patent applications linked together by priority claims. These priority claims create families of patent applications that share features such as inventors, priority dates, and technical descriptions. By ana... - Geography of patents
... - Patent PDF Samples with Extracted Structured Data
The dataset consists of PDFs in Google Cloud Storage from the first page of select US and EU patents, and BigQuery tables with extracted entities, labels, and other properties, including a link to each file in GCS. The structured data contains labels... - On the price elasticity of demand for patents
Fees since 1980 at the European (EPO), the US and the Japanese patent offices.... - WIPO PATENT REGISTER PORTAL
The WIPO's Patent Register Portal gives details of the availability of online patent registers by country / jurisdiction, as well as their search functionalities and the type of information they provide.... - Patent text: code, data, and new measures
Different open access data files related to the text of USPTO patent documents, including 1) for each US patent a list of processed, cleaned and stemmed keywords, 2) for each patent a list of the 1,000 most similar patents (based on cosine similarity... - Patent Scope and Examiner Toughness
This dataset includes an easy-to-use measure of patent scope that is grounded both in patent law and in the practices of patent attorneys. Our measure counts the number of words in the patents’ first claim. The longer the first claim, the less scope ... - Patent-to-article intext citations for 244 journals
The data contains all articles in 244 journals as described in "In-Text Patent Citations: A User's Guide", and all front-page and in-text citations as found by the algorithm described in this paper. ... - Patent value
The data contains all articles in 244 journals as described in "In-Text Patent Citations: A User's Guide", and all front-page and in-text citations as found by the algorithm described in this paper. ... - PatentCity
PatentCity is a dataset on the location of patentees since the 19th century in Germany, France, Great Britain and the United States of America. Beta available for test! Drop us a mail if you are interested in becoming a beta tester.... - Patents Citing MIT Publications
This collection encompasses patents that cite the scholarly works of Massachusetts Institute of Technology. ... - PATENTSCOPE
The PATENTSCOPE database provides access to international Patent Cooperation Treaty (PCT) applications in full text format on the day of publication, as well as to patent documents of participating national and regional patent offices.... - USPTO PatentsView
PatentsView provides US patent data, including raw data (summaries, applications, pregrant applications), disambiguations of inventors and assignees, and inventor gender estimates. Also foreign priority data, number of figures and sheets, and government in... - PatentsView Citation data
Citation to foreign patents from US patents (foreigncitation), citation to US patent applications from US patents (usapplicationcitation), citation to US patents from US patents (uspatentcitation), non-patent citations in patents (otherreference)... - PatentsView Classification data
CPC classifications, NBER classifications (to 2015), USPC classifications, WIPO technology fields, lookup tables (CPC, USPC, WIPO, NBER, US gov. organizations), botanic info for plant patents.... - USPTO OCE Patent Examination Research Data (PatEx)
The latest version of PatEx (referred to below as the 2020 release) contains detailed information on nearly 11.9 million publicly-viewable provisional and non-provisional patent applications to the USPTO and over 4.6 million Patent Cooperation Treaty... - PatentText
We provide open access to the code and data to calculate the text-based similarity between any two utility patents granted by the United States Patent and Trademark Office between 1976 and 2013, or between any two patent portfolios ... - PATSTAT
PATSTAT contains bibliographical and legal event patent data from leading industrialised and developing countries. This is extracted from the EPO’s databases and is either provided as bulk data or can be consulted online. ... - Patstat Register
This database contains bibliographic and legal event data on published European and Euro-PCT patent applications. Like the core PATSTAT database, it is maintained by the EPO, however PATSTAT Register only contains information about patent applicatio... - Patent Examination Data System
PEDS contains the bibliographic, published document and patent term extension data tabs in Public PAIR from 1981 to present. There is also some data dating back to 1935. The data can be accessed by anyone using the web interface or the provided Applic... - Worldwide Count of Priority Patents
The goal of the project was to produce a dataset of priority patent applications filed across the globe, allocated by inventor and applicant location.... - USPTO Patent Trial and Appeal Board (PTAB) API Data
USPTO Patent Trial and Appeal Board (PTAB) API Data contains data from the PTAB E2E (end-to-end) system, which makes America Invents Act (AIA) Trials information and documents publicly available. This dataset is hosted as a RESTful API with an easy to us... - Penn World Tables
PWT version 10.0 is a database with information on relative levels of income, output, input and productivity, covering 183 countries between 1950 and 2019. Access to the data is provided in Excel, Stata and online formats.... - Replicable patent indicators
A series of BigQuery scripts to create popular indicators using Google BigQuery and the Google Patents Public Datasets. The companion paper is available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4644778... - Research Disclosures
Research disclosures are a form of defensive publication that establishes innovations as prior art and prevents the same invention from being patented. ... - Breadth and RETech (Text-based)
Patent-level variables that provide researchers a new way to characterize innovation within public firms, startups, places and more. Importantly, they are distinct from existing measures and do not have look-ahead bias: they only use information avai... - Google Patent Phrase Similarity Dataset
This is a human-rated contextual phrase-to-phrase matching dataset focused on technical terms from patents. In addition to similarity scores that are typically included in other benchmark datasets, we include granular rating classes similar to WordNet... - Reliance on Science in Patenting
We introduce an open-access dataset of references from the front pages of patents granted worldwide to scientific papers published since 1800. Each patent-paper linkage is assigned a confidence score, which is characterized in a random sample by fals... - Semantic Scholar Open Research Corpus
Semantic Scholar's records for research papers published in all fields provided as an easy-to-use JSON archive. ... - SureChEMBL
SureChEMBL is a publicly available large-scale resource containing compounds extracted from the full text, images and attachments of patent documents. The data are extracted from the patent literature according to an automated text and image-mining p... - Semantic similarity of patent-standard pairs (ETSI, IEEE, ITU-T)
We provide data with information on the semantic similarity of more than 200 million patent-standard pairs for three major standard-setting organizations: ETSI, IEEE and ITU-T. The semantic similarity of patents to standards can be used to approximat... - CPA Global Technical Standards ETSI Data
European Telecommunications Standards Institute (ETSI) IPR dataset for technical standards. These are the US assets disclosed by companies as related to technical standards in ETSI. The two major ones included are 3GPP and LTE.... - Transportation Economics in the 21st Century
Improving access to data sets related to transportation economics and facilitating research with these datasets are central objectives of this project. Post-doctoral researcher Caitlin Gorback, with advice from a steering committee including Nath... - UK IPO
Snapshots of British patent/SPC applications received and subsequently published by the Intellectual Property Office.... - UCB Fung Institute Patent Data
Drawing upon recent advances in machine learning and natural language processing, we introduce new tools that automatically ingest, parse, disambiguate and build an updated database using United States patent data. The tools identify unique inventor,... - Monthly statistics -- Patents, trade marks, and designs
These statistics include monthly data for designs, patents, trade marks.... - 337Info - Unfair Import Investigations Information System
US International Trade Commission 337Info Unfair Import Investigations Information System contains data on investigations done under Section 337. Section 337 declares the infringement of certain statutory intellectual property rights and other forms ... - UniChem
UniChem is a large-scale non-redundant database of pointers between chemical structures and EMBL-EBI chemistry resources. Its purpose is to optimise the efficiency with which structure-based hyperlinks may be built and maintained between chemistry-bas... - Patents arising from U.S. government funding
The 3PFL database links information on patented inventions and scientific publications related to a public procurement contract or a research grant awarded by the U.S. Federal Government to detailed contract-level/grant-level information (e.g., award... - List of USPTO patents from US universities
Using cross-state panel and cross-U.S. commuting-zone data to look at the relationship between innovation, top income inequality and social mobility. From the paper "Innovation and Top Income Inequality" (Aghion, Akcigit, Bergeaud, Blundell, Hémous).... - US Patent Similarity Data
Pairwise semantic similarity measures for US utility patents. Includes measures for citing/cited patent pairs, 100 most-similar patents for each patent, and doc2vec vectors for each patent.... - USPTO OCE Patent Claims Research Data
The Patent Claims Research Dataset contains detailed information on claims from U.S. patents granted between 1976 and 2014 and U.S. patent applications published between 2001 and 2014. The dataset is derived from the Patent Application Publication Ful... - USPTO OCE Cancer Moonshot Patent Data
The USPTO Cancer Moonshot Patent Data contains detailed information on published patent applications and granted patents relevant to cancer research and development (R&D). We generate the dataset using USPTO examiner tools to execute a series of quer... - Google Patents Public Datasets
Worldwide (100+ countries) bibliographic and USPTO full-text, available via BigQuery. Provided by IFI CLAIMS Patent Services, a worldwide bibliographic and US full-text dataset of patent publications. Updated quarterly.... - CSET Private-Sector AI Indicators
The Private-Sector AI Indicators dataset includes a diverse range of indicators of AI-related activity for hundreds of companies worldwide, from startups to multinationals. The dataset uses original metadata, models, and methods developed by ETO and ... - AIMixDetect: detect mixed authorship of a language model (LM) and humans
This dataset was utilized in the paper "An Information-Theoretic Approach for Detecting Edits in AI-Generated Text" by Alon Kipnis and Idan Kashtan. The data was generated using GPT-3.5-turbo (ChatGPT) and is organized into five distinct sub-datasets... - Microsoft Academic Graph
The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences, and fields of study. ... - DISCERN: Duke Innovation & SCientific Enterprises Research Network
Patents (as well as scientific articles, and NPL citations at the aggregate firm-level) matched to U.S. Compustat firms over the period 1980-2015. In extending the match to Compustat up to 2015, we address two major challenges: name changes and owner... - Chinese Patent Data Project
In this project, patents from China's State Intellectual Property Office (SIPO) are matched to various types of companies. Matching SIPO patents to firms in the Annual Survey of Industrial Enterprises (ASIE) of China's National Bureau of Statistics.... - Crunchbase Data
Crunchbase collects data on both private and public companies. Their content includes investment and funding information, founding members and individuals in leadership positions, mergers and acquisitions, news, and industry trends. They have a free a...
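As a note on the Census Block Distance Database entry above, which describes great-circle distances calculated with the Haversine formula: the following is a minimal sketch of that formula, not the database maintainers' code. The function name `haversine_km`, the mean Earth radius constant, and the sample coordinates are assumptions made for illustration only.

```python
import math

# Assumed mean Earth radius in kilometres; the database may use a different value.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


if __name__ == "__main__":
    # Hypothetical internal points for two census blocks (illustrative only).
    print(haversine_km(38.8977, -77.0365, 38.8893, -77.0502))
```

The formula treats the Earth as a sphere, so results differ slightly from ellipsoidal (geodesic) distances; for block-to-block distances within a region the difference is typically small.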