RRBS
This workflow is designed for the analysis of Reduced Representation Bisulfite Sequencing (RRBS) data. It performs adapter trimming, alignment, methylation quantification, and quality control.
Workflow Inputs
- Workflow Name (Optional): A name for your workflow to help identify it in the list of processed workflows.
- FASTQ Folder: The folder containing your FASTQ files, including forward (R1), reverse (R2), and UMI (I1) reads. Files should follow the naming convention <Sample Name>_R1.fastq.gz, <Sample Name>_R2.fastq.gz, and <Sample Name>_I1.fastq.gz.
- Sample Genome Index (TAR): Path to the reference genome index for the sample. The default is a pre-built rat genome index, but you can override it with your own.
- Spike-in Genome Index (TAR): Path to the reference genome index for the spike-in control (e.g., lambda phage). The default is a pre-built lambda genome index, but you can override it with your own.
- PhiX Genome Index (TAR): Path to the Bowtie2 reference index for PhiX. The default is a pre-built PhiX index, but you can override it with your own.
- Output Report Name: A name or prefix to use for the generated reports and pipeline outputs.
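Before launching the workflow, you can check the FASTQ folder against the naming convention above. The Python sketch below is a hypothetical pre-flight helper and is not part of the workflow. The group_fastqs function groups file names by sample and reports two kinds of problem: files that do not match the convention, and samples that are missing one of the R1, R2, or I1 files.

```python
import re
from collections import defaultdict

# Documented convention: <Sample Name>_R1.fastq.gz, _R2.fastq.gz, _I1.fastq.gz
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>R1|R2|I1)\.fastq\.gz$")
REQUIRED_READS = {"R1", "R2", "I1"}


def group_fastqs(filenames):
    """Group FASTQ names by sample; return (complete_samples, problems)."""
    samples = defaultdict(dict)
    problems = []
    for name in filenames:
        match = FASTQ_PATTERN.match(name)
        if match is None:
            problems.append(f"{name}: does not follow the naming convention")
            continue
        samples[match["sample"]][match["read"]] = name

    complete = {}
    for sample, reads in sorted(samples.items()):
        missing = REQUIRED_READS - set(reads)
        if missing:
            problems.append(f"{sample}: missing {', '.join(sorted(missing))}")
        else:
            complete[sample] = reads
    return complete, problems
```

For example, passing os.listdir() of a folder containing ratA_R1.fastq.gz, ratA_R2.fastq.gz, ratA_I1.fastq.gz, and ratB_R1.fastq.gz marks ratA as complete. It reports ratB as "missing I1, R2".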
Workflow Steps
The RRBS pipeline consists of the following 14 steps, each with configurable resource options (CPU cores, memory, and disk space):
1. Pre-Trim FastQC
This step assesses the initial quality of raw reads before adapter trimming.
- Additional Configurable Options:
  - pretrim_fastqc_ncpu: Number of CPU cores.
  - pretrim_fastqc_ramGB: Memory allocation in GB.
  - pretrim_fastqc_disk: Disk space allocation in GB.
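Every step uses the same <prefix>_ncpu, <prefix>_ramGB, and <prefix>_disk pattern, so you can generate overrides programmatically. The sketch below is illustrative and rests on two assumptions. First, a prefix written with a slash, such as align_trim_sample/spike_in, is taken to mean two separate option sets (align_trim_sample_* and align_trim_spike_in_*). Second, the resource values shown are placeholders, not the workflow defaults. How you pass the resulting keys to the workflow depends on your platform.

```python
# Suffixes shared by every step's resource options in this document.
RESOURCE_SUFFIXES = ("ncpu", "ramGB", "disk")


def expand_prefix(step_prefix):
    """Expand "base_sample/spike_in" into ["base_sample", "base_spike_in"].

    The slash expansion is an assumption about how the documented notation
    maps to real option names.
    """
    if "/" not in step_prefix:
        return [step_prefix]
    first, second = step_prefix.split("/", 1)
    base, first_target = first.rsplit("_", 1)
    return [f"{base}_{first_target}", f"{base}_{second}"]


def resource_overrides(step_prefix, ncpu, ram_gb, disk_gb):
    """Map each expanded option name to the requested CPU, RAM (GB), and disk (GB)."""
    values = dict(zip(RESOURCE_SUFFIXES, (ncpu, ram_gb, disk_gb)))
    return {
        f"{prefix}_{suffix}": value
        for prefix in expand_prefix(step_prefix)
        for suffix, value in values.items()
    }
```

For example, resource_overrides("pretrim_fastqc", 2, 8, 50) returns {"pretrim_fastqc_ncpu": 2, "pretrim_fastqc_ramGB": 8, "pretrim_fastqc_disk": 50}.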
2. Attach UMI
Appends the UMI sequence from the I1 file to the read names in the R1 and R2 files so that downstream steps, such as UMI duplicate marking, can use it.
- Additional Configurable Options:
  - attach_umi_ncpu: Number of CPU cores.
  - attach_umi_ramGB: Memory allocation in GB.
  - attach_umi_disk: Disk space allocation in GB.
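The Python sketch below illustrates the idea behind this step. It is not the workflow's implementation, and it makes three assumptions. R1 (or R2) and I1 records appear in the same order. Read IDs match up to the first space. The UMI is joined to the read ID with a ":" separator, which is a hypothetical choice; the actual separator the workflow uses may differ. The function names are also hypothetical.

```python
import gzip
from itertools import islice, zip_longest


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError(f"truncated or malformed FASTQ record: {record!r}")
        yield tuple(record)


def attach_umi(read_lines, index_lines, sep=":"):
    """Yield read records whose names carry the matching I1 sequence as a UMI."""
    for read, index in zip_longest(read_fastq(read_lines), read_fastq(index_lines)):
        if read is None or index is None:
            raise ValueError("read and I1 files have different numbers of records")
        header, seq, plus, qual = read
        read_id, _, comment = header.partition(" ")
        if index[0].partition(" ")[0] != read_id:
            raise ValueError(f"records out of sync at {read_id}")
        # Hypothetical naming scheme: <read ID><sep><UMI>, original comment kept.
        new_header = f"{read_id}{sep}{index[1]}"
        if comment:
            new_header += f" {comment}"
        yield f"{new_header}\n{seq}\n{plus}\n{qual}\n"


def attach_umi_files(read_path, index_path, out_path):
    """Apply attach_umi to gzipped FASTQ files (run once for R1 and once for R2)."""
    with gzip.open(read_path, "rt") as reads, gzip.open(index_path, "rt") as index:
        with gzip.open(out_path, "wt") as out:
            out.writelines(attach_umi(reads, index))
```

Reading both files in lockstep keeps memory use constant. Checking the IDs catches files that are out of order or truncated, so those errors are reported instead of silently mislabeling reads.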
3. Trim Galore (Regular Adapters)
Removes standard adapter sequences from the reads.
- Additional Configurable Options:
  - trim_reg_adapt_ncpu: Number of CPU cores.
  - trim_reg_adapt_ramGB: Memory allocation in GB.
  - trim_reg_adapt_disk: Disk space allocation in GB.
4. Trim Diversity Adapters
Removes NuGen-specific diversity adapters.
- Additional Configurable Options:
  - trim_diversity_adapt_ncpu: Number of CPU cores.
  - trim_diversity_adapt_ramGB: Memory allocation in GB.
  - trim_diversity_adapt_disk: Disk space allocation in GB.
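The sketch below shows the general idea for forward reads only. It is a simplified stand-in, not the NuGen trimming script. It assumes that 0 to 3 diversity bases precede the YGG (CGG or TGG) signature that MspI digestion and bisulfite conversion leave at the start of RRBS reads. It trims the bases before the earliest such match and discards reads with no match. The function name trim_diversity is hypothetical.

```python
import re

# 0-3 diversity bases, then the YGG (CGG or TGG) MspI-site signature.
# The lazy {0,3}? selects the earliest YGG within the first positions.
DIVERSITY_PREFIX = re.compile(r"([ACGTN]{0,3}?)[CT]GG")


def trim_diversity(seq, qual):
    """Return (seq, qual) with diversity bases removed, or None to discard the read."""
    match = DIVERSITY_PREFIX.match(seq)
    if match is None:
        return None
    offset = len(match.group(1))
    return seq[offset:], qual[offset:]
```

For example, trim_diversity("ACTGGATC", "ABCDEFGH") returns ("TGGATC", "CDEFGH"). A read such as "AAAAGGATC" has no YGG within the first four positions, so it is discarded (None).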
5. Post-Trim FastQC
Evaluates the quality of reads after adapter trimming.
- Additional Configurable Options:
  - posttrim_fastqc_ncpu: Number of CPU cores.
  - posttrim_fastqc_ramGB: Memory allocation in GB.
  - posttrim_fastqc_disk: Disk space allocation in GB.
6. MultiQC
Aggregates and summarizes the results of the FastQC and Trim Galore steps.
- Additional Configurable Options:
  - multiqc_ncpu: Number of CPU cores.
  - multiqc_ramGB: Memory allocation in GB.
  - multiqc_disk: Disk space allocation in GB.
7. Align Trimmed Reads (Sample & Spike-in)
Aligns trimmed reads to both the sample and spike-in reference genomes using Bismark.
- Additional Configurable Options (for both Sample and Spike-in):
  - align_trim_sample/spike_in_ncpu: Number of CPU cores.
  - align_trim_sample/spike_in_ramGB: Memory allocation in GB.
  - align_trim_sample/spike_in_disk: Disk space allocation in GB.
8. Mark UMI Duplicates (Sample & Spike-in)
Identifies and tags UMI duplicates in the aligned BAM files.
- Additional Configurable Options (for both Sample and Spike-in):
  - tag_udup_sample/spike_in_ncpu: Number of CPU cores.
  - tag_udup_sample/spike_in_ramGB: Memory allocation in GB.
  - tag_udup_sample/spike_in_disk: Disk space allocation in GB.
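Conceptually, reads that share an alignment position, strand, and UMI are treated as copies of the same original molecule. The Python sketch below shows that grouping for single-end alignments. It takes the UMI as the last ":"-separated field of the read name, which is a hypothetical convention. It keeps the first read seen in each group and flags the rest. Real tools may also consider mate positions and read quality; this sketch does not. The Alignment type and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    name: str    # read name, assumed to end with ":<UMI>"
    chrom: str
    start: int   # alignment start position
    strand: str  # "+" or "-"


def mark_umi_duplicates(alignments):
    """Return the set of read names flagged as UMI duplicates.

    Reads sharing chromosome, start, strand, and UMI are treated as copies
    of one molecule; the first one seen is kept and the rest are flagged.
    """
    seen = set()
    duplicates = set()
    for aln in alignments:
        umi = aln.name.rsplit(":", 1)[-1]
        key = (aln.chrom, aln.start, aln.strand, umi)
        if key in seen:
            duplicates.add(aln.name)
        else:
            seen.add(key)
    return duplicates
```

Including the UMI in the key lets two reads at the same position but with different UMIs both survive. Position-only duplicate marking would collapse them into one.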
9. Mark PCR Duplicates (Sample & Spike-in)
Identifies and marks PCR duplicates using Picard's MarkDuplicates.
- Additional Configurable Options (for both Sample and Spike-in):
  - mark_dup_sample/spike_in_ncpu: Number of CPU cores.
  - mark_dup_sample/spike_in_ramGB: Memory allocation in GB.
  - mark_dup_sample/spike_in_disk: Disk space allocation in GB.
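MarkDuplicates writes a metrics file whose PERCENT_DUPLICATION column is a fraction between 0 and 1, despite its name. The value is derived from the read counts in the same row, with each read pair counting as two reads. The sketch below reproduces that calculation from a parsed metrics row so you can check or recompute the value. The function name is hypothetical, and row values are converted with int() because they may arrive as strings from a tab-separated file.

```python
def percent_duplication(row):
    """Recompute Picard's PERCENT_DUPLICATION (a 0-1 fraction) from a metrics row.

    Read pairs count as two reads in both numerator and denominator.
    Returns 0.0 when no reads were examined.
    """
    unpaired = int(row["UNPAIRED_READS_EXAMINED"])
    pairs = int(row["READ_PAIRS_EXAMINED"])
    unpaired_dups = int(row["UNPAIRED_READ_DUPLICATES"])
    pair_dups = int(row["READ_PAIR_DUPLICATES"])
    examined = unpaired + 2 * pairs
    if examined == 0:
        return 0.0
    return (unpaired_dups + 2 * pair_dups) / examined
```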
10. Quantify Methylation (Sample & Spike-in)
Quantifies methylation levels using Bismark's methylation extractor.
- Additional Configurable Options (for both Sample and Spike-in):
  - quant_methyl_sample/spike_in_ncpu: Number of CPU cores.
  - quant_methyl_sample/spike_in_ramGB: Memory allocation in GB.
  - quant_methyl_sample/spike_in_disk: Disk space allocation in GB.
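Bismark's coverage output is a .bismark.cov file, produced when bedGraph output is requested from the methylation extractor. For each cytosine it lists the chromosome, start, end, methylation percentage, methylated count, and unmethylated count. The sketch below assumes the workflow produces this file. It computes overall percent methylation from the counts. For an unmethylated lambda spike-in, it estimates bisulfite conversion efficiency as 100% minus the observed methylation. That estimate holds only if your spike-in control is in fact unmethylated. The function names are hypothetical.

```python
import csv


def summarize_coverage(lines):
    """Overall percent methylation from Bismark coverage (.cov) lines.

    Columns: chrom, start, end, methylation %, count methylated, count unmethylated.
    """
    methylated = unmethylated = 0
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        methylated += int(row[4])
        unmethylated += int(row[5])
    total = methylated + unmethylated
    return 100.0 * methylated / total if total else float("nan")


def conversion_rate(lambda_cov_lines):
    """Estimated bisulfite conversion rate (%) from an unmethylated lambda spike-in."""
    return 100.0 - summarize_coverage(lambda_cov_lines)
```

Summing counts before dividing weights each cytosine by its coverage. Averaging the per-cytosine percentages would instead let low-coverage positions dominate.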
11. Bowtie2 PhiX Alignment
Aligns reads to the PhiX genome using Bowtie2 to assess the level of PhiX contamination.
- Additional Configurable Options:
  - bowtie2_phix_ncpu: Number of CPU cores.
  - bowtie2_phix_ramGB: Memory allocation in GB.
  - bowtie2_phix_disk: Disk space allocation in GB.
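Bowtie2 prints an alignment summary to standard error that ends with a line of the form "NN.NN% overall alignment rate". If that summary is captured to a log, you can read the PhiX fraction from it as sketched below. The function name is hypothetical.

```python
import re

OVERALL_RATE = re.compile(r"^([\d.]+)% overall alignment rate", re.MULTILINE)


def phix_fraction(bowtie2_log):
    """Return the PhiX alignment rate (0-1) from a Bowtie2 summary log."""
    match = OVERALL_RATE.search(bowtie2_log)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1)) / 100.0
```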
12. SAMtools Mapped
Calculates the percentage of reads mapped to each chromosome and contig.
- Additional Configurable Options:
  - chrinfo_ncpu: Number of CPU cores.
  - chrinfo_ramGB: Memory allocation in GB.
  - chrinfo_disk: Disk space allocation in GB.
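One common way to obtain these numbers is samtools idxstats. It prints one tab-separated line per reference: name, sequence length, mapped read segments, and unmapped read segments. A final "*" line counts unmapped reads with no assigned position. The sketch below assumes idxstats output as its input; the workflow may compute this differently. It derives the percentage of mapped read segments per chromosome or contig. Counts are read segments, so each mate of a pair counts separately. The function name is hypothetical.

```python
def mapped_percentages(idxstats_text):
    """Percent of mapped read segments per reference from samtools idxstats output."""
    counts = {}
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":  # unplaced unmapped reads carry no mapped count
            continue
        counts[name] = int(mapped)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}
```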
13. Collect QC Metrics
Gathers and summarizes quality control metrics from various steps in the pipeline.
- Additional Configurable Options:
  - collect_qc_ncpu: Number of CPU cores.
  - collect_qc_ramGB: Memory allocation in GB.
  - collect_qc_disk: Disk space allocation in GB.
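To illustrate what this step produces, the sketch below flattens metrics gathered from several steps into one record per sample. Each metric is prefixed with the step it came from, so identically named metrics from different steps do not collide. The function name, and the step and metric names in the usage example, are hypothetical placeholders, not the workflow's actual column names.

```python
def collect_qc(sample, step_metrics):
    """Flatten {step: {metric: value}} into one record with "step.metric" keys."""
    record = {"sample": sample}
    for step, metrics in step_metrics.items():
        for metric, value in metrics.items():
            record[f"{step}.{metric}"] = value
    return record
```

For example, collect_qc("ratA", {"posttrim_fastqc": {"total_reads": 1000}, "bowtie2_phix": {"phix_pct": 0.5}}) returns {"sample": "ratA", "posttrim_fastqc.total_reads": 1000, "bowtie2_phix.phix_pct": 0.5}.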
14. Merge Results
Combines the reports and QC metrics files from all samples.
- Additional Configurable Options:
  - merge_results_ncpu: Number of CPU cores.
  - merge_results_ramGB: Memory allocation in GB.
  - merge_results_disk: Disk space allocation in GB.
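Different samples may report different sets of metrics, so a merged table needs a consistent column set. The sketch below uses hypothetical helper names. It takes per-sample records, uses the union of their keys as columns in first-seen order, fills missing values with "NA", and writes a tab-separated file.

```python
import csv


def merge_rows(records, missing="NA"):
    """Return (columns, rows) for per-sample dicts, filling absent values."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    rows = [[record.get(col, missing) for col in columns] for record in records]
    return columns, rows


def write_tsv(path, records):
    """Write the merged records to a tab-separated file at path."""
    columns, rows = merge_rows(records)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(columns)
        writer.writerows(rows)
```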
Workflow Outputs
The RRBS pipeline produces several outputs, including:
- QC Report: A comprehensive report containing quality control metrics from all steps of the pipeline.
- Alignment BAM files: Aligned reads in BAM format for both sample and spike-in.
- Methylation quantification files: Files containing methylation levels for each cytosine covered by the sequenced reads (RRBS targets only a reduced, CpG-rich portion of the genome).
- Bismark reports: Reports generated by Bismark, including alignment and methylation extraction summaries.
- MultiQC report: Aggregated report summarizing results from FastQC and Trim Galore.
Additional Notes
- The default settings for the workflow are suitable for most RRBS datasets, but you may need to adjust them based on the specific characteristics of your data and computational resources.
- You can find more detailed information about each step of the pipeline and the tools used in the corresponding tool documentation.