contributors: Florina Piroi, Mihai Lupu, Allan Hanbury, Veronika Zenz

tags: validation, information retrieval, patents, semantic


description: CLEF was a long-running project to benchmark cross-language information retrieval (IR) models. CLEF-IP was a strand of this benchmarking research that ran from 2010 to 2013; its data was used in a benchmarking activity at the CLEF 2010-2013 conferences. The CLEF-IP collection contains patents, physically stored as a collection of XML fields encoding patent documents. A patent document may be an application document, a search report, or a granted patent document. All textual documents in the CLEF-IP collection contain the following main XML fields: bibliographic data, abstract, description, and claims. Not all documents have content in these fields. The datasets are archived on the CLEF-IP site and in the TU Wien research data repository (with DOIs) here:

last edit: Thu, 27 Jul 2023 08:24:20 GMT