Conference paper, Year: 2022

Patent Classification using Extreme Multi-label Learning: A Case Study of French Patents

Abstract

Most previous patent classification methods treat the task as general text classification, while others apply XML (extreme multi-label learning) methods designed to handle vast numbers of classes. However, they focus only on the IPC subclass level, which has fewer than 700 labels and is far from "extreme." This paper presents INPI-CLS, a corpus of French patents extracted from the INPI internal database. It contains all parts of the patent texts (title, abstract, claims, description) published from 2002 to 2021, with IPC labels at all levels. We test different XML methods and other classification models at the subclass and group levels of the INPI-CLS dataset, with about 600 and 7k labels respectively, demonstrating the validity of the XML approach for patent classification.
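
To make the two label granularities concrete, the following minimal sketch shows how a full IPC symbol could be truncated to the subclass and group levels mentioned in the abstract. It assumes symbols are written in the common "A61K 31/12" form and that "group level" means the IPC main group; the pattern `IPC_PATTERN` and the helpers `ipc_levels` and `label_set` are hypothetical names introduced for illustration and are not taken from the paper or from the INPI-CLS release.

```python
import re

# Assumed surface form of an IPC symbol: section letter, two-digit class,
# subclass letter, main group number, "/", subgroup number (e.g. "A61K 31/12").
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def ipc_levels(symbol: str) -> dict:
    """Split one IPC symbol into its hierarchical levels (illustrative helper)."""
    match = IPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"Unrecognised IPC symbol: {symbol!r}")
    section, cls, subclass, main_group, _subgroup = match.groups()
    return {
        "section": section,
        "class": section + cls,
        "subclass": section + cls + subclass,
        # A main group is conventionally written with subgroup "00".
        "group": f"{section}{cls}{subclass} {main_group}/00",
    }


def label_set(symbols, level):
    """Collapse the IPC symbols of one patent to a sorted label set at `level`."""
    return sorted({ipc_levels(s)[level] for s in symbols})


# One patent can carry several IPC symbols, hence the multi-label setting.
patent_symbols = ["A61K 31/12", "A61K 31/485", "G06F 16/35"]
subclass_labels = label_set(patent_symbols, "subclass")
group_labels = label_set(patent_symbols, "group")
```

Under these assumptions, the same patent yields a coarser label set at the subclass level and a finer one at the group level, which is why the label space grows from hundreds to thousands of classes between the two settings.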

Dates and versions

hal-03850405, version 1 (14-11-2022)

Identifiers

  • HAL Id: hal-03850405, version 1

Cite

You Zuo, Houda Mouzoun, Samir Ghamri Doudane, Kim Gerdes, Benoît Sagot. Patent Classification using Extreme Multi-label Learning: A Case Study of French Patents. SIGIR 2022 - PatentSemTech workshop - 3rd Workshop on Patent Text Mining and Semantic Technologies, Jul 2022, Madrid, Spain. ⟨hal-03850405⟩