Bio-informatics / Bio-statistics


The bioinformatics team is composed of four bioinformaticians. They take care of:

  • Management and analysis of next-generation sequencing data from the lab, including whole exome and whole genome sequencing studies for type 2 diabetes and obesity
  • Starting a service for analysis of external next-generation sequencing data (WES, WGS, RNA-seq, miRNA-seq, Met-seq, ChIP-seq, Capture-C, Hi-C, Single-cell, CNV detection from Exome…)
  • Selection and validation, or development, followed by integration of software tools for the acquisition, analysis and interpretation of results
  • Creation and maintenance of analysis pipelines with cutting-edge tools (Nextflow, Slurm, Docker, …)
  • Integration of new next-generation sequencing platforms
  • Maintenance of an integrative database (GOOD) and databases for genetics results
  • Management and extension of clinical databases
  • Management of computing resources (servers, storage space, databases, software)
  • Upgrades to computing power and storage space (GPU computing, …)
  • Computing and bioinformatics support for EGID (European Genomic Institute for Diabetes)

Main future development projects are:

  • The development of an online interface for making genetic results available



Who are we?

The biostatistics team is in charge of statistical analyses and supports researchers in the design of studies, notably for grant applications.
The team currently has three members and is supervised by Mickaël Canouil (Ph.D.).

What do we do?

The genetics of obesity and type 2 diabetes (T2D) has advanced significantly in recent years. Many variants consistently associated with metabolic traits have been discovered by several teams, including ours, which supports the genetic approach as a way to elucidate the molecular basis of common diseases. However, the genetic variants discovered so far explain only a small proportion of the expected heritability. In this context, the main objective of our team is to provide strong methodological support for the different strategies used to explore the remaining heritability. These strategies range from gene prioritisation and association testing of low-frequency/rare variants with the disease to the exploration of epigenetic marks and their effect on susceptibility to T2D. In addition, developing methods and visualisation tools to analyse data generated by next-generation sequencing technologies (e.g., MethylSeq, RNAseq) remains part of our activity.


We developed a series of scripts that automatically annotate and rank genes using results from our studies combined with data from publicly available resources (e.g., NCBI, dbSNP, HugeNavigator and UCSC). Our team has built a pipeline for testing the association of rare variants with T2D or obesity. As individual methylation profiles are increasingly suspected of playing a role in susceptibility to T2D, we have been building a pipeline for genome-wide methylation analyses. Our expertise in the R environment and the “Shiny” package has allowed us to develop several web applications used to analyse and navigate our data (e.g., NanoString, qPCR). Through collaborations with international consortia, we have expanded our expertise in genetic association studies beyond metabolism-related traits to other phenotypes, such as cancer and haematological traits.
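
To give a concrete idea of what testing rare variants for association can involve, here is a minimal sketch of a collapsing (burden-style) test. It is an illustration only and not the team's pipeline: the function names, the 0/1/2 genotype coding, the frequency threshold and the choice of a one-sided Fisher exact test are all assumptions made for this example.

```python
from math import comb


def alt_allele_frequency(column):
    """Alternate-allele frequency from 0/1/2 genotype codes (assumed coding)."""
    return sum(column) / (2 * len(column))


def carrier_status(genotypes, max_freq=0.01):
    """Collapse variants: 1 if an individual carries any rare alternate allele.

    `genotypes` is a list of rows (one per individual), each row holding one
    0/1/2 code per variant. `max_freq` is a hypothetical rarity threshold.
    """
    n_variants = len(genotypes[0])
    rare = [
        j for j in range(n_variants)
        if 0 < alt_allele_frequency([row[j] for row in genotypes]) < max_freq
    ]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are cases/controls, columns are carriers/non-carriers; the test asks
    whether carriers are enriched among cases.
    """
    n_cases, n_carriers, n = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_cases, k) * comb(n - n_cases, n_carriers - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, n_carriers)


def burden_test(genotypes, is_case, max_freq=0.01):
    """Compare rare-variant carrier counts between cases and controls."""
    carriers = carrier_status(genotypes, max_freq)
    a = sum(1 for g, y in zip(carriers, is_case) if g and y)          # case carriers
    b = sum(1 for g, y in zip(carriers, is_case) if not g and y)      # case non-carriers
    c = sum(1 for g, y in zip(carriers, is_case) if g and not y)      # control carriers
    d = sum(1 for g, y in zip(carriers, is_case) if not g and not y)  # control non-carriers
    return (a, b, c, d), fisher_exact_greater(a, b, c, d)
```

Calling `burden_test(genotypes, is_case)` returns the 2x2 table of counts and the p-value. Real analyses typically adjust for covariates and use regression-based or kernel-based tests; this sketch only shows the collapsing idea.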


  • Genome-wide association studies:
    • cross-sectional and longitudinal,
    • case-control and family.
  • Detection of chromosomal events (mosaic and CNV).
  • Gene prioritisation.
  • Analyses of transcriptomics, epigenetics and metabolomics.

[Figure] Adapted from Canouil et al. (2018)


[2019] Canouil, M., Bouland, G. A., Bonnefond, A., Froguel, P., Hart, L. M. ’t, & Slieker, R. C. (2019). NACHO: An R package for quality control of NanoString nCounter data. Bioinformatics.

[2018] Canouil, M., Balkau, B., Roussel, R., Froguel, P., & Rocheleau, G. (2018). Jointly modelling single nucleotide polymorphisms with longitudinal and time-to-event trait: An application to type 2 diabetes and fasting plasma glucose. Frontiers in Genetics, 9.

[2016] Yengo, L., Arredouani, A., Marre, M., Roussel, R., Vaxillaire, M., Falchi, M., Haoudi, A., Tichet, J., Balkau, B., Bonnefond, A., & Froguel, P. (2016). Impact of statistical models on the prediction of type 2 diabetes using non-targeted metabolomics profiling. Molecular Metabolism, 5(10), 918–925.

[2016] Yengo, L., Jacques, J., Biernacki, C., & Canouil, M. (2016). Variable clustering in high-dimensional linear regression: The R package clere. The R Journal, 8(1), 92–106.