In Detail: ATAC-seq pipeline
Description
The ATAC-seq pipeline now available on Genialis Expressions is an end-to-end solution for the analysis of ATAC-seq samples. It closely follows the official ENCODE DCC pipeline and comprises three steps: alignment (Bowtie2) against reference genomes for mouse, rat, or human (Ensembl, versions 100 and 92), extended with references for common spike-in standards (ERCC and SIRV); pre-peak-calling QC with the QC metrics proposed by ENCODE 3 (NRF, the PBC bottlenecking coefficients, NSC, and RSC); and peak calling (MACS2) with QC (number of peaks, fraction of reads in peaks (FRiP), number of reads in peaks and, if a BED file of promoter regions is provided, number of reads in promoter regions, fraction of reads in promoter regions, number of peaks in promoter regions, and fraction of peaks in promoter regions).
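To make the library-complexity and FRiP metrics named above concrete, here is a minimal sketch of how they are commonly defined by ENCODE: NRF is the number of distinct read positions divided by the total number of reads, PBC1 is the number of positions seen exactly once divided by the number of distinct positions, PBC2 is the number of positions seen exactly once divided by the number seen exactly twice, and FRiP is reads in peaks divided by total reads. The function names and the choice of a (chromosome, start, strand) tuple as the position key are assumptions made for illustration; this is not the pipeline's own code.

```python
from collections import Counter


def library_complexity(positions):
    """Compute NRF, PBC1 and PBC2 from one position key per mapped read.

    `positions` is any iterable of hashable keys. Using a tuple such as
    (chrom, start, strand) is an assumption made for this sketch.
    """
    counts = Counter(positions)
    total_reads = sum(counts.values())
    distinct = len(counts)
    seen_once = sum(1 for c in counts.values() if c == 1)
    seen_twice = sum(1 for c in counts.values() if c == 2)
    return {
        # Non-Redundant Fraction: distinct positions / total reads
        "NRF": distinct / total_reads if total_reads else None,
        # PCR Bottlenecking Coefficient 1: positions seen once / distinct positions
        "PBC1": seen_once / distinct if distinct else None,
        # PCR Bottlenecking Coefficient 2: positions seen once / positions seen twice
        "PBC2": seen_once / seen_twice if seen_twice else None,
    }


def frip(reads_in_peaks, total_reads):
    """Fraction of Reads in Peaks: reads overlapping peaks / total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_in_peaks / total_reads


if __name__ == "__main__":
    # Toy input: six reads over four distinct positions.
    example = [
        ("chr1", 100, "+"), ("chr1", 100, "+"),
        ("chr1", 200, "+"), ("chr1", 300, "-"),
        ("chr2", 50, "+"), ("chr2", 50, "+"),
    ]
    print(library_complexity(example))
    print(frip(250, 1000))
```

Values of NRF and PBC1 close to 1 indicate a complex library with few duplicate positions; low values suggest PCR bottlenecking.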
Pipeline Details - Tools and Parameters
Unless stated otherwise, all parameter values are set to tool defaults.
Read alignment using Bowtie2 (v2.3.4.1)
The Bowtie2 process aligns your sample reads to a reference genome. It produces an alignment file (BAM), which can be downloaded and viewed in a genome browser such as IGV, and a statistics file that reports how well your reads aligned to the reference genome, which gives a good indication of how well the sequencing worked.
The Bowtie2 process will generate the following outputs:
Alignment file (.bam)
Index BAI (.bam.bai)
Unmapped reads (.fastq.gz)
Statistics (.stats.txt)
BigWig file (.bw)
Species
Build
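As a rough illustration of how the statistics file could be inspected programmatically, the sketch below extracts the total read count and the overall alignment rate. It assumes that the .stats.txt file contains the standard alignment summary that Bowtie2 prints at the end of a run; if the file on the platform is formatted differently, the patterns would need adjusting. The function name is hypothetical and the sample text is the single-end example from the Bowtie2 manual.

```python
import re


def parse_bowtie2_summary(text):
    """Extract headline numbers from a Bowtie2 alignment summary.

    Assumes `text` holds the summary Bowtie2 writes to stderr, e.g.
    '10000 reads; of these:' ... '94.04% overall alignment rate'.
    """
    total = re.search(r"^\s*(\d+) reads; of these:", text, flags=re.MULTILINE)
    rate = re.search(r"([\d.]+)% overall alignment rate", text)
    if total is None or rate is None:
        raise ValueError("text does not look like a Bowtie2 alignment summary")
    return {
        "total_reads": int(total.group(1)),
        "overall_alignment_rate": float(rate.group(1)),  # percent
    }


SAMPLE_SUMMARY = """\
10000 reads; of these:
  10000 (100.00%) were unpaired; of these:
    596 (5.96%) aligned 0 times
    9404 (94.04%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
94.04% overall alignment rate
"""

if __name__ == "__main__":
    print(parse_bowtie2_summary(SAMPLE_SUMMARY))
```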
Peak calling using MACS2 (v2.1.1.20160309)
The MACS2 process calls peaks once your reads have been aligned to a reference genome. MACS2 is commonly used to identify transcription factor binding sites in ChIP-seq analysis, but it is also used to locate open chromatin regions in ATAC-seq. The peak files and bigBed files can be downloaded and loaded into a genome browser for viewing.
The MACS2 process will generate the following outputs:
Called peaks (.xls) - a tabular file with information about the called peaks, including pileup and fold enrichment
Narrow peaks (.gz) - BED6+4 format file which contains the peak locations together with peak summit, p-value and q-value
QC report (.txt)
Pre-peak QC report (case) - a text file with basic QC information such as the total number of reads and the number of mapped reads
Filtered tagAlign (case)
Filtered BAM (case)
Filtered BAM index (case)
Pre-peak QC report (control) - a text file with basic QC information such as the total number of reads and the number of mapped reads
Filtered tagAlign (control)
Filtered BAM (control)
Filtered BAM index (control)
Narrow peaks (bigBed)
Peak summits - the summit location of every peak. This file is recommended for finding motifs at the binding sites
Peak summits tbi index for JBrowse
Summits (bigBed)
Broad peaks
Broad peaks (bed12/gappedPeak)
Treatment pileup (bedGraph)
Treatment pileup (bigWig)
Control lambda (bedGraph)
Control lambda (bigWig)
Model
Species
Build
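For readers who want to work with the narrow peaks output outside a genome browser, the following is a minimal sketch of reading a BED6+4 (narrowPeak) file. The column order follows the standard narrowPeak layout: chromosome, start, end, name, score, strand, signal value, -log10 p-value, -log10 q-value, and the summit offset relative to the peak start. The class and function names are hypothetical, and the sketch assumes a tab-separated file that is either plain text or gzip-compressed.

```python
import gzip
from typing import Iterator, NamedTuple


class NarrowPeak(NamedTuple):
    chrom: str
    start: int           # 0-based peak start
    end: int             # peak end (exclusive)
    name: str
    score: int
    strand: str
    signal_value: float  # fold enrichment at the summit
    p_value: float       # -log10(p-value)
    q_value: float       # -log10(q-value)
    summit_offset: int   # summit position relative to `start`

    @property
    def summit(self) -> int:
        """Absolute 0-based coordinate of the peak summit."""
        return self.start + self.summit_offset


def parse_narrowpeak_line(line: str) -> NarrowPeak:
    """Parse one tab-separated narrowPeak record."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 10:
        raise ValueError(f"expected 10 columns, got {len(f)}")
    return NarrowPeak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                      float(f[6]), float(f[7]), float(f[8]), int(f[9]))


def read_narrowpeak(path: str) -> Iterator[NarrowPeak]:
    """Yield peaks from a plain or gzip-compressed narrowPeak file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and optional track/comment headers.
            if line.strip() and not line.startswith(("track", "#")):
                yield parse_narrowpeak_line(line)


if __name__ == "__main__":
    record = "chr1\t1000\t1500\tpeak_1\t250\t.\t12.5\t30.2\t25.1\t120"
    peak = parse_narrowpeak_line(record)
    print(peak.chrom, peak.summit, peak.q_value)
```

Because the p-value and q-value columns are stored as -log10 values, larger numbers mean more significant peaks; a q-value column of 2, for example, corresponds to q = 0.01.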