Bioinformatics / Biostatistics


The Bioinformatics team is composed of three bioinformaticians. They take care of:

  • Management and analysis of next-generation sequencing data from the lab, including whole exome and whole genome sequencing studies for type II diabetes and obesity
  • Starting a service for analysis of external next-generation sequencing data (WES, WGS, RNA-seq, miRNA-seq, Met-seq, ChIP-seq, Capture-C, Hi-C, Single-cell, CNV detection from Exome…)
  • Selection and validation, or development, followed by integration of software tools for the acquisition, analysis and interpretation of results
  • Creation and maintenance of analysis pipelines with cutting-edge tools (Nextflow, Slurm, Docker, …)
  • Integration of new next-generation sequencing platforms
  • Maintenance of an integrative database (GOOD) and databases for genetics results
  • Management and extension of clinical databases
  • Management of computing resources (servers, storage space, databases, software)
  • Upgrading of computing power and storage space (GPU computing, …)
  • Computing and bioinformatics support for EGID (European Genomic Institute for Diabetes)

Main future development projects:

  • Development of an online interface for making genetic results available



Who are we?

The group is composed of three biostatisticians.


Our expertise:

  • Customized statistical analyses, with involvement from study design through to statistical analysis of results, within a vast array of research projects, collaborations and/or services,
  • Implementation of automated pipelines (developed by Mickaël Canouil, Lijiao Ning and Mathilde Boissel),
  • Performing genotype imputation with the Sanger Imputation Service,
  • (Single) omics analysis: genome, epigenome and transcriptome-wide association studies (GWAS, EWAS, TWAS),
  • Multi-omics analysis: eQTL, mQTL, meQTL, eQTM, mixOmics,
  • Gene-centric analysis of rare variants with various methods, including MiST, SKAT and burden tests,
  • Machine learning investigation using methods such as k-means clustering, classification, k-fold cross-validation, …,
  • Single-cell analysis (single-cell RNA-seq / single-cell ATAC-seq): dimensionality reduction (UMAP or PCA), clustering and differential expression analysis,
  • Use of genetic scores: (genome-wide) polygenic risk scores listed on the GWAS Catalog,
  • Mendelian randomization analyses,
  • Functional analyses on a given panel of genes or according to a pathway of interest (“Gene-set enrichment analyses” and “Over-representation analyses”), using GO, Reactome and KEGG databases,
  • Visualization of results (heatmap, QQ plot, volcano plot, Manhattan plot, …) using the R package {ggplot2},
  • Use of data from the UK Biobank (knowledge of Data-Fields and use of ICD-9 and ICD-10 codes),
  • Database curation,
  • Participation in working groups and (international) consortia.


Our technical skills:

  • R (advanced),
  • Reproducibility: managing tool versions via containerization (Docker), script versions via Git, and R package versions via the R package {renv}. GitHub page:
  • Web application development via the R package {shiny},
  • R package development (CARoT, rain, dmapaq, dgapaq),
  • Automation of reports via the R package {rmarkdown} or Quarto,
  • Participation in the writing of scientific articles, with a particular focus on statistical methodology.


Presentations at scientific events: