Entrepôts, Représentation et Ingénierie des Connaissances
Laboratory publications

- United we stand: Using multiple strategies for topic labeling

Author(s): Gourru A., Velcin J., Roche Mathieu, Gravier Christophe, Poncelet Pascal

Conference: NLDB: Natural Language Processing and Information Systems (Paris, FR, 2018-06-13)
Conference proceedings: 23rd International Conference on Applications of Natural Language to Information Systems, vol. LNCS, p. 352-363 (2018)

Ref HAL: lirmm-01910614_v1
DOI: 10.1007/978-3-319-91947-8_37

Topic labeling aims to provide a sound, possibly multi-word, label that describes a topic drawn from a topic model. This is of great practical interest for quickly grasping a topic's informational content, since the usual ranked list of words that maximizes a topic has limitations for this task. In this paper, we introduce three new unsupervised n-gram topic labelers that achieve results comparable to those of existing unsupervised topic labelers while following different assumptions. We demonstrate that combining topic labelers, even only two, makes it possible to target a 64% improvement over single topic labeler approaches, which opens research in that direction. Finally, we introduce a fourth topic labeler that extracts representative sentences, using Dirichlet smoothing to add contextual information. This sentence-based labeler provides strong surrogate candidates when n-gram topic labelers fall short of providing relevant labels, leading to up to 94% topic coverage.