
Google Patent Phrase Similarity Dataset


contributors: Grigor Aslanyan, Ian Wetherbee

tags: phrases, similarity, semantic matching, validation

terms of use: Please cite the paper if you use the dataset.

description: This is a human-rated, contextual phrase-to-phrase matching dataset focused on technical terms from patents. In addition to the similarity scores typically included in other benchmark datasets, it includes granular rating classes similar to WordNet relations, such as synonym, antonym, hypernym, hyponym, holonym, meronym, and domain related. The dataset was used in the U.S. Patent Phrase to Phrase Matching competition.

last edit: Mon, 19 Jun 2023 16:47:03 GMT