Entrepôts, Représentation et Ingénierie des Connaissances
Laboratory publications

--------------------
- Self-Organized Co-Clustering for textual data synthesis

Author(s): Selosse M., Jacques J., Biernacki Christophe

(Document without bibliographic reference)


Ref HAL: hal-02115294_v1
Abstract:

Recent studies have demonstrated the value of co-clustering, which simultaneously produces row-clusters of observations and column-clusters of features. The present work introduces a novel co-clustering model for parsimoniously summarizing textual data in document-term format. In addition to highlighting homogeneous co-clusters, as other existing algorithms do, we also distinguish noisy co-clusters from significant ones, which is particularly useful for sparse document-term matrices. Furthermore, our model proposes a structure among the significant co-clusters and thus provides better interpretability for the user. The proposed approach is competitive with state-of-the-art methods for document and term clustering and offers user-friendly results. The model relies on the Poisson distribution and on a constrained version of the Latent Block Model, a probabilistic approach to co-clustering. A Stochastic Expectation-Maximization algorithm is proposed to perform the model's inference, together with a model selection criterion for choosing the number of co-clusters.
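
To make the modelling idea in the abstract concrete, here is a minimal sketch of a generic Poisson Latent Block Model for a document-term count matrix. It is not the authors' constrained model and contains no inference step: it only simulates counts from block-specific Poisson rates and evaluates the complete-data log-likelihood, the kind of quantity a Stochastic EM procedure would typically work with once row and column labels have been drawn. All function names, parameter names, and numeric values (`simulate_poisson_lbm`, `complete_log_likelihood`, the rates 5.0 and 0.05) are assumptions chosen for illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_poisson_lbm(n_docs, n_terms, row_props, col_props, rates, seed=0):
    """Simulate a document-term matrix from a plain (unconstrained) Poisson LBM.

    rates[k][l] is the assumed Poisson rate of the block formed by
    row-cluster k and column-cluster l.
    """
    rng = random.Random(seed)
    # Latent row labels (documents) and column labels (terms).
    z = rng.choices(range(len(row_props)), weights=row_props, k=n_docs)
    w = rng.choices(range(len(col_props)), weights=col_props, k=n_terms)
    x = [[sample_poisson(rates[z[i]][w[j]], rng) for j in range(n_terms)]
         for i in range(n_docs)]
    return x, z, w


def complete_log_likelihood(x, z, w, row_props, col_props, rates):
    """Log-likelihood of the counts together with known row and column labels."""
    ll = sum(math.log(row_props[k]) for k in z)
    ll += sum(math.log(col_props[l]) for l in w)
    for i, row in enumerate(x):
        for j, count in enumerate(row):
            lam = rates[z[i]][w[j]]
            # Poisson log-density: x*log(lam) - lam - log(x!)
            ll += count * math.log(lam) - lam - math.lgamma(count + 1)
    return ll


if __name__ == "__main__":
    # Two "significant" diagonal blocks with high rates; the off-diagonal
    # blocks use a low rate, loosely mimicking sparse, noisy cells.
    rates = [[5.0, 0.05], [0.05, 5.0]]
    props = [0.5, 0.5]
    x, z, w = simulate_poisson_lbm(20, 15, props, props, rates, seed=1)
    print(complete_log_likelihood(x, z, w, props, props, rates))
```

The low off-diagonal rate is only a loose stand-in for the noisy co-clusters mentioned in the abstract; the actual constraints of the authors' model are not reproduced here.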