Entrepôts, Représentation et Ingénierie des Connaissances
Laboratory publications

- TOM: A library for topic modeling and browsing

Author(s): Guille A., Soriano-Morales Edmundo-Pavel

Conference: Conférence sur l'Extraction et la Gestion des Connaissances (Reims, FR, 2016-01-18)
Conference proceedings: , vol. p. (2016)

Ref HAL: hal-01442868_v1

In this paper, we present TOM (TOpic Modeling), a Python library for topic modeling and browsing. Its objective is to support the efficient analysis of a text corpus from start to finish, via the discovery of latent topics. To this end, TOM features advanced functions for preparing and vectorizing a text corpus. It also offers a unified interface for two topic models: LDA, using either variational inference or Gibbs sampling, and NMF, using alternating least squares with a projected gradient method. It further implements three state-of-the-art methods for estimating the optimal number of topics to model a corpus. In addition, TOM constructs an interactive Web-based browser that makes it easy to explore a topic model and the related corpus.
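To illustrate one of the inference schemes named in the abstract, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling. It uses only the Python standard library and does not use or reproduce TOM's own interface. The function name `gibbs_lda`, its parameters, the hyperparameter defaults (`alpha`, `beta`) and the toy corpus are all assumptions made for this example.

```python
import random


def gibbs_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling.

    Hypothetical helper for illustration; not part of the TOM library.
    Returns (vocab, phi, theta): the sorted vocabulary, the topic-word
    distributions and the document-topic distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed point estimates from the final counts.
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + vocab_size * beta)
         for v in range(vocab_size)]
        for k in range(num_topics)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta


# Toy corpus, already tokenized (made up for this example).
corpus = [
    ["topic", "model", "corpus", "topic"],
    ["matrix", "factorization", "gradient", "matrix"],
    ["corpus", "topic", "model"],
    ["gradient", "matrix", "factorization"],
]
vocab, phi, theta = gibbs_lda(corpus, num_topics=2, iterations=100)
for k, row in enumerate(phi):
    top_words = sorted(zip(row, vocab), reverse=True)[:3]
    print(k, [word for _, word in top_words])
```

Each row of `phi` is a probability distribution over the vocabulary and each row of `theta` is a distribution over topics. A real corpus would first need the preparation and vectorization steps the abstract mentions, and a library implementation would use vectorized arithmetic rather than Python lists.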