Principles and Workflow of Whole Genome Bisulfite Sequencing
Principles of whole genome bisulfite sequencing
Epigenetic studies have confirmed that DNA methylation of specific gene regions plays an important role in chromosome conformation and the regulation of gene expression. Methylation of cytosine residues at the C5 position (5-methylcytosine, 5meC) is a common epigenetic mark in many eukaryotes and is widely found in CpG and CpHpG contexts (H = A, T, or C). There are three main approaches to NGS-based methylation analysis: endonuclease digestion, affinity enrichment, and bisulfite conversion (Table 1). Almost all sequence-specific DNA methylation analysis approaches require a methylation-dependent treatment before amplification or hybridization, because PCR amplification does not preserve methylation marks and the information would otherwise be lost. Molecular biology techniques such as next-generation sequencing (NGS) are then used to detect the 5meC residues.
Table 1. Main principles of NGS-based methylation analysis.
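- Endonuclease digestion (MCA, HELP, MSCC): some restriction enzymes, such as HpaII and SmaI, are blocked by 5meC within their CpG-containing recognition sites.
- Affinity enrichment (MeDIP-seq, MIRA): antibodies specific for 5meC, or methyl-binding proteins, are used to enrich methylated DNA for methylation profiling.
- Bisulfite conversion (RRBS, WGBS, BSPP): sodium bisulfite chemically converts unmethylated cytosine into uracil while 5meC remains unchanged, enabling methylation detection.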
*MCA: methylated CpG island amplification; *HELP: HpaII tiny fragment enrichment by ligation-mediated PCR; *MSCC: methylation-sensitive cut counting; *MeDIP-seq: methylated DNA immunoprecipitation sequencing; *MIRA: methylated CpG island recovery assay; *RRBS: reduced representation bisulfite sequencing; *WGBS: whole genome bisulfite sequencing; *BSPP: bisulfite padlock probes.
Bisulfite conversion revolutionized genome methylation analysis in the 1990s. Bisulfite converts unmethylated cytosines in the genome into uracils, which are replaced by thymines during PCR amplification, whereas methylated cytosines remain cytosines. After sequencing, methylated and unmethylated cytosines can therefore be distinguished by counting the cytosines and thymines observed at each position (Figure 1). Whole genome bisulfite sequencing (WGBS), an important method in this field, combines bisulfite treatment with next- or third-generation sequencing technologies (mostly shotgun sequencing) to study DNA methylation at the genome level.
Figure 1. Bisulfite conversion and PCR amplification prior to DNA sequencing.
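The C-versus-T logic described above can be illustrated with a short simulation. This is a minimal sketch that assumes error-free reads from the top strand only, already aligned to a known reference; the function names `bisulfite_pcr` and `methylated_fraction` and the example sequence are hypothetical.

```python
from typing import List, Optional, Set


def bisulfite_pcr(seq: str, methylated: Set[int]) -> str:
    """Top strand after bisulfite treatment and PCR (toy model).

    Unmethylated C -> U (bisulfite) -> T (PCR); methylated C (5meC) stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def methylated_fraction(reads: List[str], pos: int) -> Optional[float]:
    """Count C (methylated) vs T (unmethylated) at one reference C across reads."""
    c = sum(read[pos] == "C" for read in reads)
    t = sum(read[pos] == "T" for read in reads)
    return c / (c + t) if c + t else None


reference = "ACGTCCGA"
# Hypothetical population: 3 molecules methylated at positions 1 and 5, 1 unmethylated.
reads = [bisulfite_pcr(reference, {1, 5})] * 3 + [bisulfite_pcr(reference, set())]
print(reads[0], reads[-1])            # ACGTTCGA ATGTTTGA
print(methylated_fraction(reads, 1))  # 0.75
print(methylated_fraction(reads, 4))  # 0.0
```

Position 1 is read as C in three of four molecules, so it is 75% methylated; position 4 is always read as T, so it is unmethylated.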
Advantages of whole genome bisulfite sequencing
- Making genome-wide methylation profiling possible at single-base resolution.
- Assessing the methylation status of almost every CpG site, including intergenic “gene deserts”, partially methylated domains, and distal regulatory elements.
- Revealing absolute DNA methylation levels and the sequence context of methylation.
Workflow of whole genome bisulfite sequencing
In short, the basic steps of whole genome bisulfite sequencing (WGBS) include DNA extraction, bisulfite conversion, library preparation, sequencing, and bioinformatics analysis. Here we use Illumina HiSeq as our example to illustrate the workflow of WGBS.
Figure 2. The workflow of whole genome bisulfite sequencing (Khanna et al. 2013).
- DNA Extraction
First, DNA is prepared from approximately 1-5 mg of tissue collected from humans, animals, plants, or microorganisms. In general, samples for whole genome bisulfite sequencing should meet the following four criteria:
ii. Hypomethylation (as shown in Figure 3, WGBS sequencing coverage begins to decrease as the number of CpG sites in a region increases);
iii. A reference genome assembled at least to the scaffold level;
iv. Relatively complete genome annotation.
Next, use a suitable kit to extract high-purity, high-molecular-weight DNA. The extracted DNA should have a total mass of at least 5 μg, a concentration of at least 50 ng/μL, and an OD260/280 ratio of 1.8 to 2.0.
Figure 3. Conventional WGBS technology has low coverage of methylation sites (Raine et al. 2016)
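The DNA acceptance criteria above (at least 5 μg, at least 50 ng/μL, OD260/280 of 1.8 to 2.0) can be expressed as a simple check. This is a minimal sketch: the `DnaSample` record, its field names, and the sample values are hypothetical, and total mass is assumed to be concentration multiplied by elution volume.

```python
from dataclasses import dataclass
from typing import List

# Acceptance thresholds as stated in the text above.
MIN_MASS_UG = 5.0
MIN_CONC_NG_PER_UL = 50.0
OD260_280_RANGE = (1.8, 2.0)


@dataclass
class DnaSample:
    """Hypothetical QC record for one extracted DNA sample."""
    name: str
    conc_ng_per_ul: float
    volume_ul: float
    od260_280: float

    @property
    def mass_ug(self) -> float:
        # ng/uL x uL = ng; divide by 1000 to get ug
        return self.conc_ng_per_ul * self.volume_ul / 1000.0


def qc_failures(sample: DnaSample) -> List[str]:
    """Return the criteria a sample fails (an empty list means it passes)."""
    failures = []
    if sample.mass_ug < MIN_MASS_UG:
        failures.append(f"mass {sample.mass_ug:.1f} ug < {MIN_MASS_UG} ug")
    if sample.conc_ng_per_ul < MIN_CONC_NG_PER_UL:
        failures.append(f"concentration {sample.conc_ng_per_ul} ng/uL < {MIN_CONC_NG_PER_UL} ng/uL")
    low, high = OD260_280_RANGE
    if not low <= sample.od260_280 <= high:
        failures.append(f"OD260/280 {sample.od260_280} outside {low}-{high}")
    return failures


print(qc_failures(DnaSample("S1", 120.0, 50.0, 1.85)))  # []
print(qc_failures(DnaSample("S2", 40.0, 100.0, 2.10)))  # fails all three criteria
```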
- Bisulfite Conversion
Bisulfite conversion is considered the “gold standard” for DNA methylation analysis; its principle is shown in Figure 4. However, bisulfite-induced DNA degradation may deplete genomic regions enriched for unmethylated cytosines. It is therefore important to assess how much DNA is degraded under the chosen reaction conditions and how this affects the target amplicons. Olova et al. (2018) found that DNA degradation is pronounced in bisulfite conversion protocols that use high-temperature denaturation or high bisulfite molarity. Several commercial kits are available (Table 2).
Figure 4. Bisulfite-mediated deamination of cytosine (Hayatsu et al. 2004).
Table 2. Bisulfite conversion protocols and parameters.
- EZ DNA Methylation-Lightning Kit (Zymo Research): heat-based denaturation, 99 °C
- EpiTect Bisulfite Kit (Qiagen): heat-based denaturation, 99 °C
- EZ DNA Methylation Kit (Zymo Research): alkaline-based denaturation, 37 °C
- Library Preparation
Taking the EpiGnome™ Methyl-Seq Kit (Epicentre) as an example (Figure 5), bisulfite-treated single-stranded DNA is random-primed using a polymerase capable of reading uracil, synthesizing DNA strands that carry a specific sequence tag. The 3′ end of each newly synthesized strand is then selectively tagged with a second specific sequence, yielding di-tagged DNA molecules with known sequence tags at the 5′ and 3′ ends. Illumina P7 and P5 adapters are then added at the 5′ and 3′ ends by PCR before sequencing.
Figure 5. Workflow for the EpiGnome™ Methyl-Seq Kit.
- Sequencing
HiSeq sequencing, which is based on sequencing by synthesis (SBS), is widely used for WGBS. Single library molecules are immobilized on a flow cell and clonally amplified into clusters by bridge amplification. Because fluorescently labeled reversible terminators allow only one base to be incorporated per cycle, a laser excites the fluorophore of the newly added base and the emitted light is captured to identify it. A paired-end 150 bp (PE150) strategy is typically used to sequence bisulfite-treated DNA libraries with 250-300 bp inserts. In addition to Illumina HiSeq, other Illumina platforms, Roche 454, PacBio SMRT, and Nanopore have also been used in DNA methylation studies.
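When planning a PE150 run, the expected mean raw depth follows the standard relation C = N × L / G (N reads of length L over a genome of size G). The sketch below applies it with two reads per pair; the 3.0e9 bp default and the 30× target are illustrative placeholders, not recommendations from this article, and the function names are hypothetical.

```python
import math


def mean_depth(read_pairs: int, read_length: int = 150, genome_size_bp: float = 3.0e9) -> float:
    """Expected mean raw depth C = N * L / G, with N = 2 reads per pair."""
    return 2 * read_pairs * read_length / genome_size_bp


def pairs_for_depth(target_depth: float, read_length: int = 150, genome_size_bp: float = 3.0e9) -> int:
    """Read pairs needed to reach a target mean raw depth (rounded up)."""
    return math.ceil(target_depth * genome_size_bp / (2 * read_length))


print(mean_depth(100_000_000))  # 10.0
print(pairs_for_depth(30))      # 300000000
```

This is raw depth only. With 250-300 bp inserts, the two 150 bp reads of a pair overlap by up to 50 bp, and overlapping bases should be counted once in methylation calls; unmapped and duplicate reads further reduce usable depth.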
- Data Analysis
A series of analyses can be performed on the sequencing data. The five main types of analysis are listed in Table 3. In addition, methylation density analysis, differentially methylated region (DMR) analysis, DMR annotation and enrichment analysis (GO/KEGG), and clustering analysis can be performed. Common bioinformatic resources for WGBS include BDPC, CpGcluster, CpGFinder, Epinexus, MethTools, mPod, QUMA, and the TCGA Data Portal.
Table 3. Main types of WGBS data analysis.
Alignment against reference genome
Tools such as SOAP are used to align the reads to the reference genome sequence, and only aligned reads are used to analyze methylation. Because bisulfite converts unmethylated C to T, reads are aligned allowing both C-C matches and C-T mismatches (a T in the read opposite a C in the reference).
Determine mC positions throughout the genome. The mC ratio at each site is computed taking read quality and multi-locus mapping probabilities into account; low-probability alignments with poor mapping reliability are discarded.
Sequence depth and coverage analysis
A plot of genome coverage against sequencing depth shows whether methylation can be called with a given degree of confidence at specific base positions.
Methylation level analysis
The methylation level of each methylated C site is calculated as 100 × (number of reads showing methylation at the site) / (total number of reads covering the site). The genome-wide average methylation level reflects the overall characteristics of the genomic methylation profile.
Global trends of methylome
The proportions of methylated C bases in the CG, CHG, and CHH contexts reflect, to some extent, the characteristics of the whole-genome methylation map of a given species.
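The analyses in Table 3 can be sketched as a few small functions: the C-T-tolerant alignment rule, mC calling with mapping-quality filtering, depth/coverage, per-site methylation level, and CG/CHG/CHH context classification. This is a toy model under stated assumptions: reads are ungapped, aligned to the top strand at a known start, and held in a hypothetical `Alignment` tuple; the MAPQ and base-quality cutoffs of 20 and the depth thresholds are illustrative. It does not reproduce how SOAP or any production pipeline works internally.

```python
from collections import Counter, defaultdict
from typing import List, NamedTuple, Optional


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (top strand, ungapped)."""
    start: int        # 0-based reference start
    seq: str          # bisulfite read sequence
    quals: List[int]  # Phred base qualities
    mapq: int         # Phred-scaled mapping quality


def bs_mismatches(read: str, ref_window: str) -> int:
    """Alignment rule: C-C is a match; read T opposite reference C is tolerated."""
    return sum(
        1
        for r, g in zip(read, ref_window)
        if r != g and not (g == "C" and r == "T")
    )


def count_calls(alignments, reference, min_mapq=20, min_baseq=20):
    """Per reference C: [reads showing C (methylated), reads showing T (unmethylated)]."""
    counts = defaultdict(lambda: [0, 0])
    for aln in alignments:
        # MAPQ = -10*log10(P(mapping position is wrong)); low values flag
        # unreliable or multi-locus alignments, which are discarded.
        if aln.mapq < min_mapq:
            continue
        for offset, (base, q) in enumerate(zip(aln.seq, aln.quals)):
            pos = aln.start + offset
            if reference[pos] != "C" or q < min_baseq:
                continue
            if base == "C":
                counts[pos][0] += 1
            elif base == "T":
                counts[pos][1] += 1
    return dict(counts)


def cumulative_coverage(depths, thresholds=(1, 5, 10)):
    """Fraction of C sites covered by at least k reads, for each threshold k."""
    return {k: sum(d >= k for d in depths) / len(depths) for k in thresholds}


def methylation_level(c_reads: int, t_reads: int) -> Optional[float]:
    """100 * methylated reads / total reads at one site (None if uncovered)."""
    total = c_reads + t_reads
    return 100.0 * c_reads / total if total else None


def c_context(reference: str, i: int) -> Optional[str]:
    """Classify a top-strand C as CG, CHG or CHH (H = A, C or T)."""
    nxt, nxt2 = reference[i + 1 : i + 2], reference[i + 2 : i + 3]
    if nxt == "G":
        return "CG"
    if not nxt or not nxt2:
        return None  # too close to the end of the sequence to classify
    return "CHG" if nxt2 == "G" else "CHH"


# Toy data: C sites at positions 1 (CG), 4 (CHG) and 7 (CHH).
reference = "ACGTCAGCTTA"
alignments = [
    Alignment(0, "ACGTCAGTTTA", [30] * 11, 40),  # methylated at 1 and 4
    Alignment(0, "ATGTTAGTTTA", [30] * 11, 40),  # fully unmethylated
    Alignment(0, "ACGTCAGTTTA", [30] * 11, 3),   # low MAPQ: discarded
]
calls = count_calls(alignments, reference)
levels = {pos: methylation_level(c, t) for pos, (c, t) in calls.items()}
coverage = cumulative_coverage([c + t for c, t in calls.values()])
covered = [v for v in levels.values() if v is not None]
genome_average = sum(covered) / len(covered)  # mean of per-site levels
mc_contexts = Counter(c_context(reference, pos) for pos, lvl in levels.items() if lvl)
print(calls)        # {1: [1, 1], 4: [1, 1], 7: [0, 2]}
print(levels)       # {1: 50.0, 4: 50.0, 7: 0.0}
print(mc_contexts)  # Counter({'CG': 1, 'CHG': 1})
```

The genome-wide average here is the mean of per-site levels; pooling all C and T counts before dividing is an alternative that weights sites by depth. A real pipeline must also handle the bottom strand, where methylation appears as G versus A on the reference coordinates.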