Fabrice Chatonnet (research engineer), Virginie Girault (research engineer), Fabienne Desmots Loyer (research engineer), Théo Labouret (Master student)
The accumulation of genomic data creates a need for pipelines that build regulatory networks linking gene expression and chromatin states. Regulatory databases (such as ENCODE and FANTOM5) and published regulatory networks (e.g. www.regulatorycircuits.org) are usually static and not reusable, and they do not provide user-friendly pipelines. Large repositories and meta-analyses rely on statistical approaches that are not applicable to datasets with small sample sizes or closely related cell types. Furthermore, current database structures and interaction methods do not apply to heterogeneous genomic datasets.
We hypothesize that the information contained in a dataset is hidden in implicit links between the different data types, and that only dedicated formal integration methods can reveal it. In 2017 we started a collaboration with IRISA's Dyliss bioinformatics research team (Rennes) to develop a novel method that infers dynamic transcriptional regulatory networks (TRN) from gene expression (RNA-seq) and chromatin accessibility (ATAC-seq) data and is applicable to datasets with small sample sizes. The approach uses semantic web technologies, based on the RDF language and SPARQL queries, to find links between regulators (transcription factors, TF), genomic regions (enhancers) and gene expression (Askomics, https://github.com/askomics).
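To illustrate the idea of finding links by joining triple patterns, here is a minimal sketch in plain Python. It is not the Askomics implementation and does not use a real RDF store or SPARQL engine: the tiny pattern matcher only mimics how a SPARQL basic graph pattern is evaluated. All identifiers (TF1, enh1, geneA, s1), all predicate names (hasMotifIn, accessibleIn, linkedTo, expressedIn) and the rule used to call a TF-gene edge are assumptions made for illustration.

```python
# Toy triple store: (subject, predicate, object). All values are hypothetical.
TRIPLES = [
    ("TF1", "hasMotifIn", "enh1"),
    ("TF2", "hasMotifIn", "enh1"),
    ("TF2", "hasMotifIn", "enh2"),
    ("enh1", "accessibleIn", "s1"),
    ("enh2", "accessibleIn", "s2"),
    ("enh1", "linkedTo", "geneA"),
    ("enh2", "linkedTo", "geneB"),
    ("geneA", "expressedIn", "s1"),
    ("geneB", "expressedIn", "s2"),
    ("TF1", "expressedIn", "s1"),
    ("TF2", "expressedIn", "s2"),
]

# Roughly equivalent to the SPARQL pattern (prefixes omitted, names assumed):
#   SELECT ?tf ?gene ?sample WHERE {
#     ?tf :hasMotifIn ?region . ?region :accessibleIn ?sample .
#     ?region :linkedTo ?gene . ?gene :expressedIn ?sample .
#     ?tf :expressedIn ?sample }
PATTERNS = [
    ("?tf", "hasMotifIn", "?region"),
    ("?region", "accessibleIn", "?sample"),
    ("?region", "linkedTo", "?gene"),
    ("?gene", "expressedIn", "?sample"),
    ("?tf", "expressedIn", "?sample"),
]


def match(pattern, triple, binding):
    """Unify one pattern with one triple; return the extended binding or None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, triples):
    """Join all patterns by extending variable bindings one pattern at a time."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


def infer_edges(triples):
    """Return sorted (TF, gene, sample) edges supported by every pattern."""
    rows = query(PATTERNS, triples)
    return sorted({(r["?tf"], r["?gene"], r["?sample"]) for r in rows})


if __name__ == "__main__":
    for tf, gene, sample in infer_edges(TRIPLES):
        print(f"{tf} -> {gene} (sample {sample})")
```

In this toy graph, TF2 has a motif in enh1 but is not expressed in sample s1, so no TF2 to geneA edge is reported. This is how implicit links across data types can restrict a network to sample-specific interactions, even with very few samples.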
The aim of this axis is to create a dynamic data warehouse from experimental sequencing data that can be extended (with new data types) and enriched (with knowledge and annotations) at will (Figure 1).
As a proof-of-concept, published cell type-specific regulatory networks (www.regulatorycircuits.org) were re-generated by this approach (Louarn et al., 2019, IEEE eScience Conference).
Figure 1: Pipeline to infer transcriptional regulatory network
- To refine the current pipeline by filtering the TRN to extract top regulators and by taking combinatorial TF effects (cooperation / competition) into account, and to apply the pipeline to define TRNs in normal B cells (collaboration with P. Chappert, JC Weill – INEM, Paris).
- To investigate, in DLBCL, the regulatory networks of: i) PD-1+/TIM3+ terminally differentiated (TD) versus PD-1-/TIM3- early differentiated T cells, and ii) PD-1+ TCF-1+ stem-like versus terminally differentiated CD4 and CD8 T cells (collaboration with BMS/Celgene and D. Olive, IPC, Marseille). Immunotherapies targeting immune checkpoints (such as PD-1 and TIM3) have shown some efficacy in DLBCL patients. In solid cancers, PD-1+ TCF-1+ stem-like CD8 T cells have been identified as precursors of TD CD8 T cells and are associated with a better prognosis under PD-1 blockade.
- To identify non-coding mutations that have an effective impact on regulatory networks in follicular lymphoma (FL). FL is an incurable cancer with high heterogeneity. Current data on coding gene mutations are not sufficient to explain this diversity, nor its strong capacity for relapse, resistance and transformation into more aggressive cancers. We hypothesize that non-coding mutations associated with FL could be new key regulators of the disease and hence explain its variability. We will therefore integrate non-coding mutations into the pipeline. The identified FL regulators and mutations will be submitted to biological validation and transferred for mechanistic investigation, in collaboration with Axis IV.