Laboratory publications
(103) Publication(s) by LALLICH S.


Computing the Mutual Constrained Independence Model
Author(s): Delacroix Thomas, Lenca Philippe, Lallich S.
Conference: ASMDA 2017 : 17th Conference of the Applied Stochastic Models and Data Analysis International Society (London, GB, 2017-06-06)
Ref HAL: hal-01582632_v1
Abstract: Developed for applications in itemset mining, the notion of Mutual Constrained Independence is a natural generalization of the notion of mutual independence. If the mutual independence model on a finite number of events can be seen as the least binding model for the probabilities of any finite intersection of these events, given the probabilities of each of these events, then the Mutual Constrained Independence Model on a finite number of events can be seen as the least binding model for the probabilities of any finite intersection of these events, given the probabilities of any number of such intersections of events. In this article, we present a first detailed and effective means of computing the Mutual Constrained Independence Model. We show the efficiency of our algorithm and the adequacy of the model by applying it to various examples. A test for the Mutual Constrained Independence Hypothesis is also presented.



ClusPath: a temporal-driven clustering to infer typical evolution paths
Author(s): Rizoiu M., Velcin J., Bonnevay S., Lallich S.
(Article) Published:
Data Mining And Knowledge Discovery, vol. 30 p.1324-1349 (2016)
Abstract: We propose ClusPath, a novel algorithm for detecting general evolution tendencies in a population of entities. We show how abstract notions, such as the Swedish socio-economical model (in a political dataset) or the companies fiscal optimization (in an economical dataset) can be inferred from low-level descriptive features. Such high-level regularities in the evolution of entities are detected by combining spatial and temporal features into a spatio-temporal dissimilarity measure and using semi-supervised clustering techniques. The relations between the evolution phases are modeled using a graph structure, inferred simultaneously with the partition, by using a “slow changing world” assumption. The idea is to ensure a smooth passage for entities along their evolution paths, which catches the long-term trends in the dataset. Additionally, we provide a method, based on an evolutionary algorithm, to tune the parameters of ClusPath to new, unseen datasets. This method assesses the fitness of a solution using four opposed quality measures and proposes a balanced compromise.



Comparison of two topological approaches for dealing with noisy labeling
Author(s): Rico F., Muhlenbach Fabrice, Zighed D. A., Lallich S.
(Article) Published:
Neurocomputing, vol. 160 p.3-17 (2015)
Ref HAL: hal-01524431_v1
DOI: 10.1016/j.neucom.2014.10.087
Abstract: This paper focuses on the detection of likely mislabeled instances in a learning dataset. In order to detect potentially mislabeled samples, two solutions are considered which are both based on the same framework of topological graphs. The first is a statistical approach based on Cut Edges Weighted statistics (CEW) in the neighborhood graph. The second solution is a Relaxation Technique (RT) that optimizes a local criterion in the neighborhood graph. The evaluations by ROC curves show good results since almost 90% of the mislabeled instances are retrieved for a cost of less than 20% false positives. The removal of samples detected as mislabeled by our approaches generally leads to an improvement of the performance of classical machine learning algorithms.



Warehousing Complex Archaeological Objects
Author(s): Oztürk A., Eyango Louis, Waksman Sylvie Yona, Lallich S., Darmont J.
Conference: 9th International and Interdisciplinary Conference on Modeling and Using Context (CONTEXT 2015) (Larnaca, CY, 2015-11-02)
Conference proceedings: Proceedings of the 9th International and Interdisciplinary Conference on Modeling and Using Context (CONTEXT 2015), vol. 9405 p.226-239 (2015)
Ref HAL: hal-01355158_v1
Ref Arxiv: 1608.06469
Abstract: Data organization is a difficult and essential component in cultural heritage applications. Over the years, a great amount of archaeological ceramic data have been created and processed by various methods and devices. Such ceramic data are stored in databases that concur to increase the amount of available information rapidly. However, such databases typically focus on one type of ceramic descriptors, e.g., qualitative textual descriptions, petrographic or chemical analysis results, and do not interoperate. Thus, research involving archaeological ceramics cannot easily take advantage of combining all these types of information. In this application paper, we introduce an evolution of the Ceramom database that includes text descriptors of archaeological features, chemical analysis results, and various images, including petrographic and fabric images. To illustrate what new analyses are permitted by such a database, we source it to a data warehouse and present a sample online analytical processing (OLAP) scenario to gain deep understanding of ceramic context.



Constrained Independence for Detecting Interesting Patterns
Author(s): Delacroix Thomas, Boubekki Ahcène, LENCA Philippe, Lallich S.
Conference: DSAA 2015 : IEEE International Conference on Data Science and Advanced Analytics (Paris, FR, 2015-10-19)
Conference proceedings: p.1-10 (2015)
Ref HAL: hal-01247650_v1
DOI: 10.1109/DSAA.2015.7344897
Abstract: Among other criteria, a pattern may be interesting if it is not redundant with other discovered patterns. A general approach to determining redundancy is to consider a probabilistic model for frequencies of patterns, based on those of patterns already mined, and compare observed frequencies to the model. Such probabilistic models include the independence model, partition models or more complex models which are approached via randomization for lack of an adequate tool in probability theory allowing a direct approach. We define constrained independence, a generalization of the notion of independence. This tool allows us to describe probabilistic models for evaluating redundancy in frequent itemset mining. We provide algorithms, integrated within the mining process, for determining non-redundant itemsets. Through experimentations, we show that the models used reveal high rates of redundancy among frequent itemsets and we extract the most interesting ones.
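
Editorial illustration (not taken from the paper): the abstract's general approach, comparing an observed itemset frequency to the frequency predicted by a probabilistic model, can be sketched for the simplest model it names, the plain independence model. The function names and the toy transactions are hypothetical, and this sketch does not implement constrained independence itself.

```python
# Minimal sketch, assuming the plain independence model only:
# the expected frequency of an itemset is the product of the
# frequencies of its individual items.

def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def observed_vs_independence(transactions, itemset):
    """Return (observed frequency, frequency expected under independence)."""
    observed = support(transactions, itemset)
    expected = 1.0
    for item in itemset:
        expected *= support(transactions, [item])
    return observed, expected

# Hypothetical toy data: six transactions over the items a, b, c.
transactions = [frozenset(t) for t in (
    {"a", "b"}, {"a", "b"}, {"a"}, {"b"}, {"c"}, {"a", "b", "c"},
)]

observed, expected = observed_vs_independence(transactions, {"a", "b"})
print(f"observed={observed:.3f} expected={expected:.3f}")
```

An itemset whose observed frequency stays close to the model's prediction carries little information beyond its single items, which is the sense in which the abstract calls it redundant.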



Guest editor's introduction: special issue on quality issues, measures of interestingness and evaluation of data mining models - Journal of Intelligent Information Systems
Author(s): LENCA Philippe, Lallich S.
Book: Springer, vol. 45(3) (2015) 3p.
Ref HAL: hal-01243257_v1
Abstract: This special issue contains seven revised and extended papers from QIMIE'13, the third edition of the Quality Issues, Measures of Interestingness and Evaluation of data mining models Workshop, organized in association with the PAKDD'13 conference (Pacific-Asia Conference on Knowledge Discovery and Data Mining, Gold Coast, Australia, April 14-17, 2013).



Indices de qualité en clustering
Author(s): Lallich S., LENCA Philippe
Invited talk: Journée thématique : clustering et co-clustering (Issy-les-Moulineaux, FR, 2015-10-20)
Ref HAL: hal-01230854_v1
Abstract: The lack of ground truth, among other things, makes evaluating a clustering a non-trivial problem, one that requires quality indices suited to the intended goal and to the data. The talk will present the key elements for characterizing a quality index, the main internal and external indices, and an axiomatic approach to choosing an index.
