10x at #EEsomics - From Single- to Multiomics: Applications and Challenges in Data Integration
At 10x we continue to play a role in the explosion of single cell transcriptomic analysis, but the next challenge the community faces is how to integrate the various “omics” datasets that can now be generated. With that in mind I headed off to Heidelberg last week to attend the EMBL “From Single- to Multiomics: Applications and Challenges in Data Integration” conference. I had never visited Heidelberg before, and everyone told me “it’s a beautiful place” and “make sure you see the sights”… Good advice, had it not rained heavily and solidly for the 3 days I was there, but that shouldn’t detract from the quality of this event!
The first session was entitled “Recent advances in genomics, proteomics & metabolomics” and we got off to a great start with Ruedi Aebersold (ETH Zurich). Use of social media at conferences has become so widespread that it’s easy to forget some presenters may NOT want their work tweeted, particularly if it is yet to be published. But Ruedi started by saying “This data is unpublished, but I don’t mind if you tweet - it will force us to write it up!” He then stated the problem facing the community - “Biology is about phenotypes and their molecular origins; understanding this absolutely requires data integration.”
To illustrate his point he presented a case study involving a complex integration of CNV, mRNA, steady-state proteome, protein turnover, and responsive proteome data. The example showed variation in gene expression across 14 HeLa cell lines, but the correlation with any single data type was relatively low; thus “no single data type accurately describes the complexity of molecular systems.” As a result his group use the “proteotype” model to distinguish cell types. This is essentially a SWATH-based proteomic (peptide) analysis that can be applied over time, and it allows a focus on specific groups of peptides to classify the input cells. The group term this an “object centric analysis”, but also extend the approach to “complex-centric analysis”, whereby the peaks in the analysis matrix are for proteins (rather than peptides) that co-elute. This method allows changes to protein complexes to be quantified in response to an external stimulus or perturbation, and can also be applied to the quantification of splice isoforms.
Ruth Huttenhain (UCSF) described a method to “spatio-temporally resolve protein interaction networks in living cells.” As a proof-of-principle, G protein-coupled receptors (GPCRs) were tagged with APEX (engineered ascorbate peroxidase, see Rhee et al., Science 2013) and biotinylated to monitor the movement and localisation of the GPCRs. The method was then applied to the delta opioid receptor (DOR), with the DOR-APEX study revealing 3 distinct states of protein interactions.
Advances in genome-wide approaches have uncovered unexpected complexities in gene transcription, such as ncRNAs, pausing of Pol II and divergent transcription from promoters. Andreas Mayer (Max Planck) then discussed how his group have used NET-seq (Native Elongating Transcript Sequencing) to reveal a new feature of Pol II transcriptional activity: convergent antisense transcription. This convergent transcriptional activity originates locally from an accessible chromatin region - 2 Pol II complexes form on opposite strands, with the Pol II on the antisense strand moving towards, and eventually impeding the progress of, the Pol II on the sense strand. Perhaps unsurprisingly, this convergent behaviour is a feature of lowly-expressed genes, with a significantly lower density of Pol II detected along the gene body.
Next up was Will Greenleaf (Stanford), whose group developed ATAC-seq (pronounced “attack”-seq, as in “attacking” the DNA!). His work showed that for classification of cell types (i.e. the correlation with established cell surface markers), single cell ATAC-seq correlated more strongly than single cell RNA-seq. They also examined the correlation between scATAC-seq and scRNA-seq to study regulatory elements, but acknowledged that the 2 omics datasets were derived from separate cell samples and that ideally both data types should come from the same single cell.
Day 2 got underway with Matthias Heinemann (Groningen) discussing “Cause & Consequences of Replicative Ageing in Budding Yeast”. The conceptual view of ageing postulates that the environment impacts cellular state (phenotype), which in turn interacts with (impairs) regulatory systems, but this is incomplete; there is also an ageing “system”, and so we need to examine the phenotype as a function of age. The group attach budding yeast to columns and elute medium across them, so that the eluate contains daughter cells. This yields transcriptomes and proteomes from mixed-cell samples, so how should these samples be analysed in the context of ageing? Matthias' group start by correlating the transcriptome and the proteome - the correlation starts high (0.7) but uncouples as a function of age, with protein biogenesis-related genes appearing to act as a “causal force” during ageing. So what are “old” cells doing in terms of their metabolism? They assayed metabolite levels/activities at 4 different time points in aged cells - globally, levels drop with age. Furthermore, growth rate, oxygen uptake and glucose uptake all decrease dramatically as a function of age, along with a switch from fermentation to respiration. So the revised ageing model indicates that the translational machinery works too hard, cells grow in size, metabolic flux drops, cells switch to respiration, and drastic metabolic rearrangements follow. But how does this limit replicative lifespan? Once cells become too big, and metabolic flux too low, they are simply unable to undergo further replication.
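For readers who want to try the “correlate transcriptome and proteome as a function of age” idea on their own data, here is a minimal sketch in Python. It is not the Heinemann group's pipeline: the function names (`pearson`, `correlation_by_age`), the age indices and all of the numbers are hypothetical placeholders that I made up purely to show the shape of the calculation, and a plain Pearson correlation is an assumption on my part.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def correlation_by_age(mrna_by_age, protein_by_age):
    """Return {age: r} for every age point present in both dictionaries.

    Each dictionary maps an age point to per-gene abundances, with genes
    listed in the same order in both dictionaries.
    """
    shared_ages = sorted(set(mrna_by_age) & set(protein_by_age))
    return {age: pearson(mrna_by_age[age], protein_by_age[age])
            for age in shared_ages}


# Hypothetical, synthetic abundances for 5 genes at 3 age points (0 = youngest).
# These values are illustrative only and do not come from the talk.
mrna = {
    0: [1.0, 2.0, 3.0, 4.0, 5.0],
    1: [1.0, 2.0, 3.0, 4.0, 5.0],
    2: [1.0, 2.0, 3.0, 4.0, 5.0],
}
protein = {
    0: [1.1, 2.0, 2.9, 4.2, 5.0],
    1: [1.5, 1.8, 3.5, 3.0, 5.5],
    2: [3.0, 1.0, 4.0, 2.0, 3.5],
}

for age, r in correlation_by_age(mrna, protein).items():
    print(f"age point {age}: mRNA-protein Pearson r = {r:.2f}")
```

A falling r across age points would be the “uncoupling” described in the talk; with real data you would also want to think about log-transforming abundances and whether a rank-based correlation is more appropriate.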
Switching to post-translational modifications, Pedro Beltrao (EMBL-EBI, United Kingdom) presented “CNVs associated with changes in post-translational regulation in cancer.” Looking at autosomal gene dosage compensation in cancer phenotypes and using matched genomic, transcriptomic and proteomic data, Pedro showed that in tumour samples, 23-33% of genes have attenuated protein changes associated with changes in gene dosage; changes in protein degradation rate were suggested as the likely underlying mechanism. It seems that proteins frequently exist in complexes, with the “free proteins” being degraded but protected when in complex. “We think this is a very common mechanism of control of unassembled complex subunits.” For more details see the publication “Widespread Post-transcriptional Attenuation of Genomic Copy-Number Variation in Cancer”.
Adam Rubin (Stanford) got Day 3 underway by combining scATAC-seq with CRISPR screens in 200+ single cells to map epigenomic regulatory networks, in an attempt to explain how cells interact with each other to form complex tissues. By using gRNAs to alter motif accessibility, Adam's work can use single and multiplexed transcription factor targeting to study how different knockouts alter cellular differentiation trajectories.
Following that we had arguably the best example of using multi-omic data: “Joint profiling of chromatin accessibility, DNA methylation and transcription in single cells” from Ricard Argelaguet (EMBL-EBI). Their method, scNMT-seq, assays chromatin accessibility, DNA methylation, and the transcriptome. Read more in their bioRxiv preprint.
Overall the meeting illustrated that there is a long way to go in terms of obtaining and integrating the various “omics”, but the potential for unravelling the complexities of what determines cell types and cell fate decisions in health and disease was clear. The meeting returns in 2 years and I look forward to seeing how things have progressed.