Since 1 January 2015, LIFL and LAGIS have formed the CRIStAL laboratory.

Thesis of

Julie Hamon

Tuesday 26 November 2013
IRCICA amphitheatre

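Optimisation combinatoire pour la sélection de variables en régression en grande dimension : Application en génétique animale
(Combinatorial optimization for variable selection in high-dimensional regression: application to animal genetics)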

Thesis supervisors: Clarisse DHAENENS, Professor, Université Lille 1; Julien JACQUES, Associate Professor (HDR), Université Lille 1

Reviewers: Charles BOUVEYRON, Professor, Université Paris Descartes; Frédéric LARDEUX, Associate Professor (HDR), Université d'Angers

Committee members: Laurence DUCHIEN, Professor, Université Lille 1; Stéphane CHRÉTIEN, Associate Professor, Université de Franche-Comté; Claude GRENIER, Director of Development, Gènes Diffusion

Advances in high-throughput sequencing and genotyping technologies make it possible to measure large amounts of genomic information.
The aim of this work, dedicated to animal genomic selection, is to select a subset of relevant genetic markers for predicting a quantitative trait, in a context where the number of genotyped animals is far smaller than the number of markers studied.
We first review the state of the art of existing methods for this problem, and then propose to address variable selection in high-dimensional regression by combining combinatorial optimization methods with statistical models.
We start by experimentally tuning two combinatorial optimization methods, iterated local search and a genetic algorithm, each combined with multiple linear regression, and we evaluate their relevance. In animal genomics, family relationships between animals are known and can be valuable information.
Because our approach is flexible, we propose an adaptation that takes these family relationships into account through a mixed model. Moreover, over-fitting is a particular concern with such data because of the large imbalance between the number of variables studied and the number of animals available, so we propose an improvement of our approach to reduce it.
The proposed approaches are validated on data from the literature as well as on real data from Gènes Diffusion.
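
To make the combination of iterated local search and multiple linear regression more concrete, here is a minimal sketch in Python. It is not the implementation from the thesis: the scoring criterion (mean squared error on held-out animals), the single-marker flip neighbourhood, the random perturbation, the cap on subset size, the synthetic data and all function names are assumptions chosen for illustration only.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit(X, y, subset, ridge=1e-8):
    """Least-squares coefficients (intercept first) on the chosen markers.
    The tiny ridge term only guards against a singular system."""
    cols = sorted(subset)
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(cols) + 1
    gram = [[sum(z[a] * z[b] for z in Z) + (ridge if a == b else 0.0)
             for b in range(k)] for a in range(k)]
    rhs = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(k)]
    return cols, solve(gram, rhs)

def validation_error(subset, train, valid):
    """Mean squared prediction error on held-out animals (assumed criterion)."""
    (Xt, yt), (Xv, yv) = train, valid
    cols, beta = fit(Xt, yt, subset)
    err = 0.0
    for row, yi in zip(Xv, yv):
        pred = beta[0] + sum(b * row[j] for b, j in zip(beta[1:], cols))
        err += (yi - pred) ** 2
    return err / len(yv)

def local_search(subset, score, p, max_size):
    """Steepest descent: apply the best single-marker flip until none helps."""
    best_err = score(subset)
    while True:
        moves = [subset ^ {j} for j in range(p)]
        moves = [m for m in moves if 1 <= len(m) <= max_size]
        if not moves:
            return subset, best_err
        cand_err, cand = min(((score(m), m) for m in moves), key=lambda t: t[0])
        if cand_err >= best_err - 1e-12:
            return subset, best_err
        subset, best_err = cand, cand_err

def iterated_local_search(train, valid, p, max_size=5, iters=10, kick=2, seed=0):
    """Alternate random perturbations and local search, keeping improvements."""
    rng = random.Random(seed)
    score = lambda s: validation_error(s, train, valid)
    best, best_err = local_search({rng.randrange(p)}, score, p, max_size)
    for _ in range(iters):
        perturbed = set(best)
        for j in rng.sample(range(p), kick):  # random kick: flip a few markers
            perturbed ^= {j}
        if not 1 <= len(perturbed) <= max_size:
            continue
        cand, cand_err = local_search(perturbed, score, p, max_size)
        if cand_err < best_err:  # accept only strict improvements
            best, best_err = cand, cand_err
    return best, best_err

# Synthetic example: 60 animals, 30 markers coded 0/1/2,
# with a trait driven by markers 3 and 7 plus small noise.
rng = random.Random(1)
p = 30
X = [[rng.randint(0, 2) for _ in range(p)] for _ in range(60)]
y = [2.0 * r[3] - 1.5 * r[7] + rng.gauss(0, 0.1) for r in X]
train, valid = (X[:40], y[:40]), (X[40:], y[40:])
best, err = iterated_local_search(train, valid, p)
print(sorted(best), round(err, 4))
```

Capping the subset size keeps the regression well posed when markers far outnumber animals, and scoring on held-out animals is one simple way to limit over-fitting; the abstract does not say which criterion or safeguard the thesis actually uses.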

UMR 8022 - Laboratoire d'Informatique Fondamentale de Lille - Copyright © 2012 Sophie TISON
