For any questions about the seminars, you can contact
Nadia Kabachi (Nadia.Kabachi@univ-lyon1.fr) and
Jairo Cugliari (Jairo.Cugliari@univ-lyon2.fr)
(1) Seminar(s)


Tue. 11 July 2017, 10:00, K71, Building K, ground floor. DA SILVA Natalia (Iowa State University - UdelaR)
Projection Pursuit Classification Random Forests
Abstract:
A random forest is an ensemble learning method built on bagged trees with random predictor selection. These features provide improved classification models because they produce information about variable importance, predictive error, and proximity of observations. A new ensemble learning method for classification problems called projection pursuit random forest (PPF) will be presented. PPF uses the PPtree algorithm introduced in Lee et al. (2013). In PPF, trees are constructed by splitting on linear combinations of randomly chosen variables. Projection pursuit is used to choose a projection of the variables that best separates the classes. Using linear combinations of variables to separate classes takes the correlation between variables into account, which allows PPF to outperform the traditional random forest when separations between groups occur in combinations of variables. Previous work using oblique trees in the forest construction has shown positive results in terms of performance but is limited to two-class problems. The method presented here can be used in multiclass problems and is implemented in an R package, PPforest, which is available at https://github.com/natydasilva/PPforest. Additionally, a forest classifier is an example of an ensemble since it is produced by bagging multiple trees. The process of bagging and combining results from multiple trees produces numerous diagnostics which, with interactive graphics, can provide a lot of insight into class structure in high dimensions. Various aspects will be explored in this presentation, to assess model complexity, individual model contributions, variable importance and dimension reduction, and uncertainty in prediction associated with individual observations. The ideas are applied to the random forest algorithm and projection pursuit forest, but could be more broadly applied to other bagged ensembles. Interactive graphics are built in R using the ggplot2, plotly, and shiny packages.
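To illustrate the idea of splitting on a linear combination of randomly chosen variables, here is a minimal Python sketch of a single oblique node. It is not the PPforest implementation: it assumes a two-class problem, uses the Fisher LDA direction as a stand-in for the projection pursuit index, and cuts at the midpoint of the projected class means. The function names (oblique_split, predict_split) and the simulated data are hypothetical.

```python
import numpy as np

def oblique_split(X, y, n_vars, rng):
    """Fit one oblique split on a random subset of variables (labels 0/1)."""
    # Randomly choose which variables this node is allowed to combine.
    cols = rng.choice(X.shape[1], size=n_vars, replace=False)
    X0, X1 = X[y == 0][:, cols], X[y == 1][:, cols]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter; a small ridge keeps it invertible.
    Sw = (np.atleast_2d(np.cov(X0, rowvar=False)) * (len(X0) - 1)
          + np.atleast_2d(np.cov(X1, rowvar=False)) * (len(X1) - 1))
    Sw = Sw + 1e-8 * np.eye(n_vars)
    # Fisher LDA direction (assumed projection index for this sketch).
    w = np.linalg.solve(Sw, m1 - m0)
    w = w / np.linalg.norm(w)
    # Cut at the midpoint of the two projected class means.
    cut = 0.5 * (m0 @ w + m1 @ w)
    return cols, w, cut

def predict_split(X, cols, w, cut):
    """Assign class 1 to observations projected above the cut."""
    return (X[:, cols] @ w > cut).astype(int)

# Simulated data: two strongly correlated variables, with the classes
# separated only along their difference (x1 - x0).
rng = np.random.default_rng(0)
n = 200
y = np.arange(n) % 2
z = rng.normal(size=(n, 1))                       # shared component
X = np.hstack([z, z]) + 0.1 * rng.normal(size=(n, 2))
X[:, 1] += 0.8 * y                                # shift class 1 along x1

cols, w, cut = oblique_split(X, y, n_vars=2, rng=rng)
acc = (predict_split(X, cols, w, cut) == y).mean()
print(f"in-sample accuracy of the oblique split: {acc:.3f}")
```

In a forest, each node would draw a subset smaller than the full set of variables and each tree would be grown on a bootstrap sample. Both variables are used here only so that a single split can show the effect of combining correlated variables.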
For more information, please contact Cugliari J.
