Bioinformatics
The Bioinformatics team is composed of three bioinformaticians. They take care of:
- Management and analysis of next-generation sequencing data from the lab, including whole-exome and whole-genome sequencing studies for type 2 diabetes and obesity
- Starting a service for analysis of external next-generation sequencing data (WES, WGS, RNA-seq, miRNA-seq, Met-seq, ChIP-seq, Capture-C, Hi-C, Single-cell, CNV detection from Exome…)
- Selection and validation (or in-house development), followed by integration, of software tools for the acquisition, analysis and interpretation of results
- Creation and maintenance of analysis pipelines with cutting-edge tools (Nextflow, Slurm, Docker, …)
- Integration of new next-generation sequencing platforms
- Maintenance of an integrative database (GOOD) and databases for genetics results
- Management and extension of clinical databases
- Management of computing resources (servers, storage space, databases, software)
- Upgrading of computing power and storage space (GPU computing, …)
- Computing and bioinformatics support for EGID (European Genomic Institute for Diabetes)
The main future development project is:
- Development of an online interface for making genetic results available
Biostatistics
The Biostatistics team is composed of two biostatisticians. They take care of:
- Customized statistical analyses, with involvement from study design through to the statistical analysis of results, across a wide array of research projects, collaborations and/or services,
- Implementation of automated pipelines (developed by Mickaël Canouil, Lijiao Ning and Mathilde Boissel: https://github.com/mboissel/analysis-scripts-templates),
- Performing genotype imputation with the Sanger Imputation Service,
- (Single) omics analysis: genome, epigenome and transcriptome-wide association studies (GWAS, EWAS, TWAS),
- Multi-omics analysis: eQTL, mQTL, meQTL, eQTM, mixOmics,
- Gene-centric analysis of rare variants with various methods, including MiST, SKAT and burden tests,
- Machine learning analyses using methods such as k-means clustering, classification, k-fold cross-validation, …,
- Single-cell analysis (single-cell RNA-seq / single-cell ATAC-seq), including dimensionality reduction (UMAP or PCA), clustering and differential expression analysis,
- Use of genetic scores: (genome-wide) polygenic risk scores listed in the GWAS Catalog,
- Mendelian randomization analyses,
- Functional analyses on a given panel of genes or according to a pathway of interest (“Gene-set enrichment analyses” and “Over-representation analyses”), using GO, Reactome and KEGG databases,
- Visualization of results (heatmap, Q-Q plot, volcano plot, Manhattan plot, …) using the R package {ggplot2},
- Use of data from the UK Biobank (knowledge of its Data-Fields and use of ICD-9 and ICD-10 codes),
- Database curation,
- Participation in working groups and (international) consortia.
Technical skills:
- R (advanced),
- Reproducibility: tool versions managed via containerization (Docker), script versions via Git, and R package versions via the R package {renv} (GitHub page: https://github.com/umr1283),
- Web application development via the R package {shiny},
- R package development (CARoT, rain, dmapaq, dgapaq),
- Automation of reports via the R package {rmarkdown} or Quarto,
- Participation in the writing of scientific articles, with a particular focus on statistical methodology.
Presentations at scientific events:
- Single-Cell Workshop Feedback,
- Shiny web application framework for R,
- Feedback on methylation data analysis,
- Analysis of rare variants from high-throughput sequencing data,
- Feedback on RNA-seq data analyses,
- t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm.