-
American Business Cycle
Presented here are the tables of quarterly data from Appendix B of "The American Business Cycle: Continuity and Change", edited by Robert J. Gordon. National Bureau of Economic Research Studies in Business Cycles Volume 25, University of Chicago P...
-
BACI
BACI provides disaggregated data on bilateral trade flows for more than 5000 products and 200 countries.
-
Biospolar Antarctic Literature and Patents
Mapping the scientific and patent landscapes for biodiversity based research and innovation from Antarctica and the Southern Ocean. Created under the Biospolar Project, Research Council of Norway
-
Census Block Distance Database
Census Block Distances are great-circle distances calculated using the Haversine formula based on internal points in the geographic area.
Census Blocks are from Census 2000 SF1 and Census 2010 SF1 files. Census Blocks "are statistical areas bound...
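For readers who want to reproduce or spot-check a distance, the Haversine calculation mentioned above can be sketched as follows. This is a minimal illustration, not the database's own code: the function name, the kilometre unit, and the Earth radius value are assumptions, and the inputs are taken to be internal-point latitude/longitude in decimal degrees.

```python
import math

# Mean Earth radius in km. This value is an assumption; the database
# may have been built with a different radius or unit.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Calling `haversine_km(lat_a, lon_a, lat_b, lon_b)` with the internal points of two blocks gives a distance that can be compared against the published value, up to differences in the assumed radius.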
-
Open Sourced Database for CEO Dismissal 1992-2018
This is a database of qualitatively coded reasons for a CEO’s dismissal, for S&P 1500 Companies. The maintainers of this dataset run a mailing list with a signup [here](https://docs.google.com/forms/d/e/1FAIpQLSfiZZHwyeWYEZ5fOT1_RygH-ComG9ltad5IUU...
-
Collection of Historical Data on the Uses of Petroleum International Network
The research project CH.DUPIN (Collection of Historical Data on the Uses of Petroleum International Network) aims at gathering historical data on oil consumption for many countries.
The current dataset contains yearly information on oil consumpt...
-
ChEMBL
ChEMBL Data is a manually curated database of small molecules used in drug discovery, including information about existing patented drugs.
-
ChEMBL-NTD
ChEMBL-NTD is a repository for Open Access primary screening and medicinal chemistry data directed at neglected diseases - endemic tropical diseases of the developing regions of Africa, Asia, and the Americas. The primary purpose of ChEMBL-NTD...
-
Chilean IP and firm data
In-text and front page citations to non-patent literature and in-text patent citations, extracted and parsed. patCit builds on DOCDB, the largest database of Non Patent Literature (NPL) citations. First, we deduplicate this corpus and organize it ...
-
Chinese Patent Data Project
In this project, patents from China's State Intellectual Property Office (SIPO) are matched to various types of companies. This dataset matches SIPO patents to firms in the Annual Survey of Industrial Enterprises (ASIE) of China's National Bureau of Statistics.
-
CMS's SSA to FIPS CBSA and MSA County Crosswalk
CMS periodically produces SSA to FIPS CBSA to county crosswalk files. They released a CBSA to MSA to FIPS county crosswalk as well. Some CMS data files have SSA state and county codes or county name rather than FIPS state and county codes. Jean Ro...
-
The careers and co-authorship networks of U.S. patent-holders, since 1975
The identification enables construction of social networks based on patent co-authorship. We will eventually provide descriptive statistics of individual and collaborative variables and illustrated examples of networks for an individual, an organi...
-
Disambiguation and Co-authorship Networks of the U.S. Patent Inventor Database
Name disambiguation of US inventors, 1975-2010. Using a Bayesian supervised learning approach, we identify individual inventors from the U.S. utility patent database, from 1975 to the present. An interface to calculate and illustrate patent co-aut...
-
Cooperative Patent Classification Data
Cooperative Patent Classification Data contains the scheme and definitions of the Cooperative Patent Classification system for classifying patent documents. The CPC is the result of a partnership between the EPO and the USPTO in their joint effort...
-
Imputation of missing applicant country codes in worldwide patent data
We present a general method for imputing missing information in the Worldwide Patent Statistical Database (PATSTAT) and make the resulting datasets publicly available. The PATSTAT database is the de facto standard for academic research using paten...
-
Crios‐Patstat Database
Disambiguated inventor and applicant names for EPO records. A major problem with PATSTAT was that data were provided in a raw format. The data have therefore been thoroughly processed by ICRIOS to produce a cleaned and harmonized database: PATENTS...
-
Computer Retrieval of Information on Scientific Projects
The NIH CRISP (Computer Retrieval of Information on Scientific Projects) is a searchable database of federally funded biomedical research projects conducted at universities, hospitals, and other research institutions. This dataset has not been upd...
-
Dimensions
Dimensions contains more than 100 million publications, ranging from articles published in scholarly journals, books and book chapters, to preprints and conference proceedings. All publications are contextualized with linked data sets, funding, pu...
-
DISCERN: Duke Innovation & SCientific Enterprises Research Network
Patents (as well as scientific articles, and NPL citations at the aggregate firm-level) matched to U.S. Compustat firms over the period 1980-2015. In extending the match to Compustat up to 2015, we address two major challenges: name changes and ow...
-
EPO worldwide bibliographic data (DOCDB)
DOCDB is the EPO's master documentation database with worldwide coverage. It contains bibliographic data, abstracts, citations and the DOCDB simple patent family, but no full text or images.
-
Disclosed Standard Essential Patents Database
The OEIDD database provides a full overview of all disclosed IPR at standard setting organizations (SSOs) worldwide. Based on the archives of thirteen major SSOs as of March 2011, the disclosure data is cleaned, harmonized, and all disclosed USPTO or EPO patents...
-
European Business Performance Database
The European Business Performance database describes the performance of the largest enterprises in the twentieth century. It covers eight countries that together consistently account for above 80 per cent of western European GDP: Great Britain, Ge...
-
Geocoding of worldwide patent data
The dataset provides geographic coordinates for inventor and applicant locations in 18.8 million patent documents spanning more than 30 years. The geocoded data are further allocated to the corresponding countries, regions and cities. When th...
-
Classification Data for "Classifying Patents Based on their Semantic Content"
An open consolidated database from raw data on 4 million patents taken from the US patent office from 1976 onward. To build the pattern network, not only do we look at each patent title, but we also examine their full abstract and extract the rele...
-
Google Patents Public Datasets
Worldwide (100+ countries) bibliographic and USPTO full-text, available via BigQuery. Provided by IFI CLAIMS Patent Services, a worldwide bibliographic and US full-text dataset of patent publications. Updated quarterly.
-
Google Patents Research Data
Google Patents Research Data contains the output of much of the data analysis work used in Google Patents (patents.google.com), including machine translations of titles and abstracts from Google Translate, embedding vectors, extracted top terms, s...
-
Government-funded US patents
This includes patent level metadata, 1926-1975 (OCRed from USPTO Image PDF files), 1976-2017 (parsed from USPTO HTML files), patent meta data, CPC, geography, agencies, entity size of the patent owner etc, government support categories at patent l...
-
Replication Data for: Government-funded research increasingly fuels innovation
This includes patent level metadata, 1926-1975 (OCRed from USPTO Image PDF files), 1976-2017 (parsed from USPTO HTML files), patent meta data, CPC, geography, agencies, entity size of the patent owner etc, government support categories at patent l...
-
GPT Indicators
This database contains yearly technology-level measures of Growth, Use Complementarity (UC) and Innovation Complementarity (IC) since 1920 for all technological classes in the United States Patent and Trademark Office (USPTO) classification system...
-
GRID: Global Database of Research Institutes
GRID is a free and openly available global database of over 100,000 research-related organisations, including healthcare organizations, companies, governments, non-profits, each provided with a unique and persistent identifier. In addition to IDs ...
-
Tools for Harmonizing County Boundaries
This tool creates the CSV tables that allow county boundaries to be synchronized to a base year, exported to the directory you run this from. While this code takes shapefiles of any type and performs an intersect, it was written to follow the met...
-
Historical Cross-Country Technology Adoption (HCCTA) Dataset
The Historical Cross-Country Technology Adoption Dataset was collected to allow for the analysis of the adoption patterns of some of the major technologies introduced in the past 250 years across the World's leading industrializ...
-
HistPat Dataset
HistPat provides the geography of historical patents granted by the United States Patent and Trademark Office (USPTO) from 1790 to 1975. This historical dataset is constructed using digitalized records of original patent documents that are publicl...
-
HistPat International Dataset
HistPat International provides the geography of historical patents granted to foreign nationals by the United States Patent and Trademark Office (USPTO) from 1836 to 1975. This historical dataset is constructed using digitalized records of origina...
-
IFI Claims Patent Data Enrichments
IFI CLAIMS Patent Data Enrichments includes standardized assignee/applicant names and integrated legal status information.
-
Indian Patent Advanced Search System
Platform for accessing Indian public patent data.
-
Inventor disambiguation
-
IPRoduct
The IPRoduct project seeks to link innovative goods to the patents upon which they are based. By directly linking products to patents, this project tracks innovation to the point where it meets consumers, the true commercial end point of investmen...
-
Japanese Patent Office
IIP Patent Database (IIP Patent DB) is a database developed for statistical analysis of patents based on the Japan Patent Office (JPO) “Standardized Data.” Intellectual Property Institute (IIP) provides the IIP patent DB to further promote paten...
-
Lens.org
Lens serves nearly all of the patent documents in the world as open, annotatable digital public goods that are integrated with scholarly and technical literature along with regulatory and business data. The Lens will allow documents and analyses t...
-
Lens Labs
Links to datasets, APIs, and tools
-
Long-Term Productivity database
The Long-Term Productivity database was created as a project at the Bank of France in 2013 by Antonin Bergeaud, Gilbert Cette and Remy Lecat. Following the work of Cette, Mairesse and Kocoglu (2009), we extended the database to include 17 countrie...
-
The scientific knowledge base of low carbon energy technologies (updated and extended version)
This data publication offers updated data about low-carbon energy technology (LCET) patents and citation links to the scientific literature. Compared to a previous version, it also contains data on biofuels and fuels from waste technologies. The ...
-
Microsoft Academic Graph
The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences, and fields of study.
-
Microsoft Academic Knowledge Graph
A large RDF data set with over eight billion triples with information about scientific publications and related entities, such as authors, institutions, journals, and fields of study. The data set is based on the Microsoft Academic Graph and licen...
-
MatrixWare Research Collection
MAREC Data is a static collection of over 19 million patent applications and granted patents in a unified file format normalized from EP, WO, US, and JP sources, spanning a range from 1976 to June 2008. In MAREC, the documents from different count...
-
Matched inventor ages from patents, based on web scraped sources
We use information about U.S.-residing inventors from patents, which include name and location, and search for age and date of death information from publicly available online web directories and build a scoring system to indicate the qual...
-
Chinese Patent Data Project Dataverse
Matching SIPO patents to Chinese listed firms ("Main Board"). Please refer to the user documentation "Chinese Patent Database User Documentation: Matching SIPO Patents to Chinese Publicly-Listed Companies and Subsidiaries" for more details abou...
-
MIT Scholarly Works Over Time
Scholarly works produced by MIT 1950-2018
-
MIT Scholarly Works Cited by Patents
MIT Scholarly Works Cited by Patents 1950-2018
-
NBER US Patent Data Project
The main dataset extends from January 1, 1963, through December 30, 2006, and includes all the utility patents granted during that period. The citations file includes all citations made by patents granted in 1975-1999.
-
NBER Economic Indicators and Releases
Regularly-updated and archived index of economic indicators, including interest rates, stock reserves, home sales, labour statistics and productivity. This page is updated Monday-Friday.
-
NBER Macrohistory Database
During the first several decades of its existence, the National Bureau of Economic Research (NBER) assembled an extensive data set that covers all aspects of the pre-WWI and interwar economies, including production, construction, employment, money...
-
Newspapers.com Index
Index of Newspapers.com articles
-
OpenAlex
OpenAlex is a free and open catalog of the world's scholarly papers, researchers, journals, and institutions — along with all the ways they're connected to one another. It is maintained by the non-profit OurResearch.
-
PatCit
In-text and front page citations to non-patent literature and in-text patent citations, extracted and parsed. patCit builds on DOCDB, the largest database of Non Patent Literature (NPL) citations. First, we deduplicate this corpus and organize it ...
-
Patent Citation Similarity
Many studies of innovation rely on patent citations to measure intellectual lineage and impact. To create this dataset, we use a vector space model of patent similarity to compute the technological similarity between each pair of citing-cited pate...
-
Patent Citation Timing and Source
Innovation studies frequently distinguish between patent citations submitted by the patent examiner and those submitted by the patent applicant. However, publicly available citations data is often misleading, for instance by attributing a patent ...
-
Patent Families Dataset
Patent applicants frequently file groups of patent applications linked together by priority claims. These priority claims create families of patent applications that share features such as inventors, priority dates, and technical descriptions. By ...
-
Geography of patents
-
Patent PDF Samples with Extracted Structured Data
The dataset consists of PDFs in Google Cloud Storage from the first page of select US and EU patents, and BigQuery tables with extracted entities, labels, and other properties, including a link to each file in GCS. The structured data contains lab...
-
On the price elasticity of demand for patents
Fees since 1980 at the European (EPO), the US and the Japanese patent offices.
-
WIPO Patent Register Portal
The WIPO's Patent Register Portal gives details of the availability of online patent registers by country / jurisdiction, as well as their search functionalities and the type of information they provide.
-
Patent Scope and Examiner Toughness
This dataset includes an easy-to-use measure of patent scope that is grounded both in patent law and in the practices of patent attorneys. Our measure counts the number of words in the patents’ first claim. The longer the first claim, the less sco...
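The word-count idea behind this scope measure can be sketched in a few lines. This is only an illustration under stated assumptions: the helper name `first_claim_word_count` is hypothetical, words are taken to be whitespace-delimited tokens, and a leading claim number such as "1." is stripped before counting. The dataset's own tokenization rules may differ.

```python
import re


def first_claim_word_count(claims):
    """Count whitespace-delimited words in the first claim of one patent.

    `claims` is a list of claim texts in order (assumed input shape).
    """
    if not claims:
        raise ValueError("patent has no claims")
    # Drop a leading claim number such as "1." or "1)" if present (assumption).
    text = re.sub(r"^\s*1\s*[.)]\s*", "", claims[0])
    return len(text.split())


claims = [
    "1. A widget comprising a frame and a wheel.",
    "2. The widget of claim 1, wherein the wheel is round.",
]
print(first_claim_word_count(claims))
```

Only the first claim is counted here, matching the description above; dependent claims are ignored.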
-
Patent text: code, data, and new measures
Different open access data files related to the text of USPTO patent documents, including 1) for each US patent a list of processed, cleaned and stemmed keywords, 2) for each patent a list of the 1,000 most similar patents (based on cosine similar...
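To make the similarity lists concrete, here is a minimal sketch of ranking patents by cosine similarity over keyword counts. It is not the authors' pipeline: the function names, the plain term-count weighting, and the toy keyword lists are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target_id, keyword_lists, k=1000):
    """Return up to k (patent_id, similarity) pairs, most similar first."""
    vectors = {pid: Counter(words) for pid, words in keyword_lists.items()}
    target = vectors[target_id]
    scored = [
        (pid, cosine_similarity(target, vec))
        for pid, vec in vectors.items()
        if pid != target_id
    ]
    # Sort by descending similarity, breaking ties by patent id.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Toy keyword lists keyed by hypothetical patent ids.
keywords = {
    "A": ["engine", "fuel", "valve"],
    "B": ["engine", "fuel", "pump"],
    "C": ["antibody", "protein"],
}
print(most_similar("A", keywords, k=2))
```

Comparing every pair this way is quadratic in the number of patents, so it only suits small samples; it is meant to show what a "most similar" list contains, not how to build one at full scale.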
-
Patent-to-article intext citations for 244 journals
The data contains all articles in 244 journals as described in "In-Text Patent Citations: A User's Guide", and all front-page and in-text citations as found by the algorithm described in this paper.
-
Patent value
The data contains all articles in 244 journals as described in "In-Text Patent Citations: A User's Guide", and all front-page and in-text citations as found by the algorithm described in this paper.
-
PatentCity
PatentCity is a dataset on the location of patentees since the 19th century in Germany, France, Great Britain and the United States of America. Beta available for test! Drop us a mail if you are interested in becoming a beta tester.
-
Patents Citing MIT Publications
This collection encompasses patents that cite the scholarly works of Massachusetts Institute of Technology.
-
PATENTSCOPE
The PATENTSCOPE database provides access to international Patent Cooperation Treaty (PCT) applications in full text format on the day of publication, as well as to patent documents of participating national and regional patent offices.
-
USPTO PatentsView
PatentsView provides US patent data, including raw data (summaries, applications, pregrant applications), disambiguations of inventors and assignees, and inventor gender estimates. Also foreign priority data, number of figures and sheets, and government...
-
PatentsView Citation data
Citation to foreign patents from US patents (foreigncitation), citation to US patent applications from US patents (usapplicationcitation), citation to US patents from US patents (uspatentcitation), non-patent citations in patents (otherreference)
-
PatentsView Classification data
CPC classifications, NBER classifications (to 2015), USPC classifications, WIPO technology fields, lookup tables (CPC, USPC, WIPO, NBER, US gov. organizations), botanic info for plant patents.
-
USPTO OCE Patent Examination Research Data (PatEx)
The latest version of PatEx (referred to below as the 2020 release) contains detailed information on nearly 11.9 million publicly-viewable provisional and non-provisional patent applications to the USPTO and over 4.6 million Patent Cooperation Tre...
-
PATSTAT
PATSTAT contains bibliographical and legal event patent data from leading industrialised and developing countries. This is extracted from the EPO’s databases and is either provided as bulk data or can be consulted online.
-
Patstat Register
This database contains bibliographic and legal event data on published European and Euro-PCT patent applications.
Like the core PATSTAT database, it is maintained by the EPO, however PATSTAT Register only contains information about patent applica...
-
Patent Examination Data System
PEDS contains the bibliographic, published document and patent term extension data tabs in Public PAIR from 1981 to present. There is also some data dating back to 1935. The data can be accessed by anyone using the web interface or the provided App...
-
Worldwide Count of Priority Patents
The goal of the project was to produce a dataset of priority patent applications filed across the globe, allocated by inventor and applicant location.
-
USPTO Patent Trial and Appeal Board (PTAB) API Data
USPTO Patent Trial and Appeal Board (PTAB) API Data contains data from the PTAB E2E (end-to-end) system, making America Invents Act (AIA) Trials information and documents publicly available.
This dataset is hosted as a RESTful API with an easy to...
-
Penn World Tables
PWT version 10.0 is a database with information on relative levels of income, output, input and productivity, covering 183 countries between 1950 and 2019. Access to the data is provided in Excel, Stata and online formats.
-
Reliance on Science in Patenting
We introduce an open-access dataset of references from the front pages of patents granted worldwide to scientific papers published since 1800. Each patent-paper linkage is assigned a confidence score, which is characterized in a random sample by f...
-
Semantic Scholar Open Research Corpus
Semantic Scholar's records for research papers published in all fields provided as an easy-to-use JSON archive.
-
SureChEMBL
SureChEMBL is a publicly available large-scale resource containing compounds extracted from the full text, images and attachments of patent documents. The data are extracted from the patent literature according to an automated text and image-minin...
-
CPA Global Technical Standards ETSI Data
European Telecommunications Standards Institute (ETSI) IPR dataset for technical standards. These are the US assets disclosed by companies as related to technical standards in ETSI. The two major ones included are 3GPP and LTE.
-
Transportation Economics in the 21st Century
Improving access to data sets related to transportation economics and facilitating research with these datasets are central objectives of this project. Post-doctoral researcher Caitlin Gorback, with advice from a steering committee including N...
-
UCB Fung Institute Patent Data
Drawing upon recent advances in machine learning and natural language processing, we introduce new tools that automatically ingest, parse, disambiguate and build an updated database using United States patent data. The tools identify unique invent...
-
UK IPO
Snapshots of British patent/SPC applications received and subsequently published by the Intellectual Property Office.
-
Monthly statistics -- Patents, trade marks, and designs
These statistics include monthly data for designs, patents, trade marks.
-
337Info - Unfair Import Investigations Information System
US International Trade Commission 337Info Unfair Import Investigations Information System contains data on investigations done under Section 337. Section 337 declares the infringement of certain statutory intellectual property rights and other for...
-
UniChem
UniChem is a large-scale non-redundant database of pointers between chemical structures and EMBL-EBI chemistry resources. Its purpose is to optimise the efficiency with which structure-based hyperlinks may be built and maintained between chemistry-...
-
Patents arising from U.S. government funding
The 3PFL database links information on patented inventions and scientific publications related to a public procurement contract or a research grant awarded by the U.S. Federal Government to detailed contract-level/grant-level information (e.g., aw...
-
US Patent Similarity Data
Pairwise semantic similarity measures for US utility patents. Includes measures for citing/cited patent pairs, 100 most-similar patents for each patent, and doc2vec vectors for each patent.
-
List of USPTO patents from US universities
Using cross-state panel and cross-U.S. commuting-zone data to look at the relationship between innovation, top income inequality and social mobility. From the paper "Innovation and Top Income Inequality" (Aghion, Akcigit, Bergeaud, Blundell, Hémou...
-
USPTO OCE Cancer Moonshot Patent Data
The USPTO Cancer Moonshot Patent Data contains detailed information on published patent applications and granted patents relevant to cancer research and development (R&D). We generate the dataset using USPTO examiner tools to execute a series of q...
-
USPTO OCE Patent Assignment Data
The USPTO allows parties to record assignments of patents and patent applications to, as much as possible, maintain a complete history of claimed interests in a patent. The USPTO also permits recording of other documents that affect title (such as...
-
USPTO OCE Patent Claims Research Data
The Patent Claims Research Dataset contains detailed information on claims from U.S. patents granted between 1976 and 2014 and U.S. patent applications published between 2001 and 2014. The dataset is derived from the Patent Application Publication ...
-
UVA Darden Global Corporate Patent Dataset (disambiguated assignees)
The dataset has information on about 3 million USPTO patents, which were granted between 1980 and 2017, assigned to publicly listed companies worldwide, and linked to those assignee companies using the following identifiers: Unique Patent Number, ...
-
World Bank Development Indicators
World Development Indicators Data is the primary World Bank collection of development indicators, compiled from officially-recognized international sources. It presents the most current and accurate global development data available, and includes ...