Bio-informatics / Bio-statistics

Bio-informatics

The Bioinformatics team is composed of three bioinformaticians. They take care of:

Management and analysis of next-generation sequencing data from the lab, including whole exome and whole genome sequencing studies for type II diabetes and obesity
Starting a service for analysis of external next-generation sequencing data (WES, WGS, RNA-seq, miRNA-seq, Met-seq, ChIP-seq, Capture-C, Hi-C, Single-cell, CNV detection from Exome…)
Selection and validation, or development, followed by integration of software tools for the acquisition, analysis and interpretation of results
Creation and maintenance of analysis pipelines with cutting-edge methods (Nextflow, Slurm, Docker, …)
Integration of new next-generation sequencing platforms
Maintenance of an integrative database (GOOD) and databases for genetics results
Management and extension of clinical databases
Management of computing resources (servers, storage space, databases, software)
Update of computing power and storage space (GPU computing, …)
Computing and bioinformatics support for EGID (European Genomic Institute for Diabetes)

Main future development projects are:

Bio-statistics

The group is composed of five biostatisticians. They take care of:

Customized statistical analyses, with involvement from study design through to the statistical analysis of results, within a vast array of research projects, collaborations and/or services,
Implementation of automated pipelines (developed by Mickaël Canouil, Lijiao Ning and Mathilde Boissel: https://github.com/mboissel/analysis-scripts-templates),
Performing genotype imputation with the Sanger Imputation Service,
(Single) omics analysis: genome, epigenome and transcriptome-wide association studies (GWAS, EWAS, TWAS),
Multi-omics analysis: eQTL, mQTL, meQTL, eQTM, mixOmics,
Gene-centric analysis of rare variants with various methods, including MiST, SKAT and burden tests,
Machine learning investigation using methods such as k-means clustering, classification, k-fold cross-validation, …,
Single-cell analysis (single-cell RNA-seq / single-cell ATAC-seq): dimensionality reduction (via UMAP or PCA), clustering and differential expression analysis,
Use of genetic scores: (genome-wide) polygenic risk scores listed on the GWAS Catalog,
Mendelian randomization analyses,
Functional analyses on a given panel of genes or according to a pathway of interest (“Gene-set enrichment analyses” and “Over-representation analyses”), using GO, Reactome and KEGG databases,
Visualization of results (heatmap, QQ plot, volcano plot, Manhattan plot, …) using the R package {ggplot2},
Use of data from the UK Biobank (knowledge of Data-Fields and use of ICD-9 and ICD-10 codes),
Database curation,
Participation in working groups and (international) consortia.

R (advanced),
Reproducibility: tool versions managed via containerization (Docker) and the R package {renv}, script versions managed with Git. GitHub page: https://github.com/umr1283.
Web application development via the R package {shiny},
R package development (CARoT, rain, dmapaq, dgapaq),
Automation of reports via the R package {rmarkdown} or Quarto,
Participation in the writing of scientific articles, with a particular focus on statistical methodology.