In Detail: ChIP-seq pipeline
Description
This ChIP-seq pipeline closely follows the AQUAS (ENCODE 3) pipeline. It is based on the alignment of short reads using BWA ALN and peak calling with MACS 2.0. During the setup of the MACS 2.0 process, we recommend providing a BED file with promoter regions (‘Promoter regions BED file’) to obtain additional QC metrics: the number and fraction of peaks mapped to promoters, and the number and fraction of reads mapped to promoters.
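As an illustration of the promoter QC metrics mentioned above, here is a minimal sketch of how the number and fraction of peaks mapped to promoters could be computed. It assumes that "mapped to a promoter" means overlapping a promoter region by at least one base; the pipeline's exact definition is not stated here, and the function names and toy coordinates are hypothetical.

```python
from typing import List, Tuple

# (chrom, start, end) in BED-style, half-open coordinates
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def promoter_peak_stats(peaks: List[Interval], promoters: List[Interval]) -> Tuple[int, float]:
    """Return (count, fraction) of peaks that overlap any promoter region.

    Assumption: any overlap of one base or more counts as "mapped to a promoter".
    """
    hits = sum(1 for peak in peaks if any(overlaps(peak, p) for p in promoters))
    fraction = hits / len(peaks) if peaks else 0.0
    return hits, fraction


# Toy data, made up for illustration only.
promoters = [("chr1", 1000, 2000), ("chr2", 500, 1500)]
peaks = [
    ("chr1", 1500, 1700),  # inside the chr1 promoter
    ("chr1", 5000, 5200),  # no promoter nearby
    ("chr2", 1400, 1600),  # partially overlaps the chr2 promoter
    ("chr3", 100, 300),    # chromosome without promoters in the BED file
]

count, fraction = promoter_peak_stats(peaks, promoters)
print(f"{count} of {len(peaks)} peaks overlap a promoter ({fraction:.0%})")
```

The same overlap test applies to reads instead of peaks. For genome-scale inputs a sorted sweep or an interval tree would replace the nested loop used here.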
Analysis is done in two separate steps: mapping with BWA ALN against a reference genome for mouse, rat, or human (UCSC, version hg38 or hg19, and Ensembl, version 100 or 92), followed by peak calling with MACS 2.0 or MACS 2.0-ROSE2.
Pipeline Details - Tools and Parameters
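For readers who want to picture the two steps on the command line, the sketch below builds (but does not run) a plausible sequence of commands for single-end reads. It is not the pipeline's actual invocation: the file names, thread count, use of samtools for sorting and indexing, and the choice to pass BAM files directly to MACS are all assumptions made for illustration, and the platform may use different options.

```python
from typing import List


def build_commands(
    reference: str,
    reads_fastq: str,
    control_bam: str,
    sample: str,
    genome_size: str = "hs",  # MACS shortcut for human; an assumption here
    threads: int = 4,
) -> List[List[str]]:
    """Return argv lists for a generic align-then-call-peaks workflow."""
    sai = f"{sample}.sai"
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    return [
        # Step 1a: find suffix-array coordinates of the reads
        ["bwa", "aln", "-t", str(threads), "-f", sai, reference, reads_fastq],
        # Step 1b: convert them to single-end alignments in SAM format
        ["bwa", "samse", "-f", sam, reference, sai, reads_fastq],
        # Step 1c: sort and index so the BAM can be opened in a genome browser
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # Step 2: call peaks on the case sample against the control
        [
            "macs2", "callpeak",
            "-t", bam, "-c", control_bam,
            "-f", "BAM", "-g", genome_size,
            "-n", sample, "--outdir", "macs2_out",
        ],
    ]


# Hypothetical file names, for illustration only.
commands = build_commands("genome.fa", "sample.fastq.gz", "control.bam", "sample")
for argv in commands:
    print(" ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` once the tools and input files are available.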
Unless stated otherwise, all parameter values are set to tool defaults.
Read alignment using BWA ALN (v0.7.17-r1188)
The BWA ALN process aligns your sample reads to a reference genome. It produces an alignment file (BAM) that can be downloaded and viewed in a genome browser such as IGV. It also produces a statistics file that shows how well your reads aligned to the reference genome, which gives a good indication of how well the sequencing worked.
The BWA ALN process generates the following outputs:
Alignment file (.bam)
Index BAI (.bam.bai)
Unmapped reads (.fastq.gz)
Statistics (.stats.txt)
BigWig file (.bw)
Species
Build
Peak calling using MACS 2.0 (v2.1.1.20160309)
The MACS 2.0 process calls peaks once your reads have been aligned to a reference genome. It is commonly used to identify transcription factor binding sites in ChIP-seq analysis. The peak files and BigBed files can also be downloaded and loaded into a genome browser for viewing.
The MACS 2.0 process generates the following outputs:
Called peaks (.xls) - a tabular file with information about the called peaks, including pileup and fold enrichment.
Narrow peaks (.gz) - a BED6+4 format file containing the peak locations together with the peak summit, p-value, and q-value.
QC report (.txt) - a text file reporting the FRiP (fraction of reads in peaks) score and the number of peaks called.
FRiP: a useful measure of global ChIP enrichment that indicates how well the IP worked. A FRiP above 5% indicates good quality in most cases.
Pre-peak QC report (case) (.txt) - a text file with basic QC information such as the total number of reads and the number of mapped reads.
Filtered tagAlign (case) (.gz)
Pre-peak QC report (control) (.gz)
Filtered tagAlign (control) (.gz)
Narrow peaks (BigBed) (.bb)
Peak summits (.bed.gz) - the summit location of every peak. This file is recommended for finding motifs at the binding sites.
Peak summits tbi index for JBrowse (.bed.gz.tbi)
Summits (bigBed) (.bb)
Broad peaks
Broad peaks (bed12/gappedPeak)
Treatment pileup (bedGraph) (.bdg) - bedGraph format for treatment sample
Treatment pileup (bigWig) (.bw)
Control lambda (bedGraph) (.bdg) - bedGraph format for input sample
Control lambda (bigWig) (.bw)
Model
Species
Build
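To make the Narrow peaks and FRiP entries in the list above more concrete, here is a minimal sketch that parses BED6+4 (narrowPeak) lines and computes a FRiP value. The column layout follows the standard ENCODE narrowPeak definition. The rule that a read counts as "in a peak" when it overlaps one by at least one base is an assumption, as are the function names and the toy data; the pipeline's own QC report may count reads differently.

```python
from typing import List, NamedTuple, Tuple


class NarrowPeak(NamedTuple):
    chrom: str
    start: int          # 0-based, half-open
    end: int
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float      # stored as -log10(p-value)
    q_value: float      # stored as -log10(q-value)
    summit_offset: int  # summit position relative to start


def parse_narrowpeak_line(line: str) -> NarrowPeak:
    """Parse one tab-separated narrowPeak (BED6+4) line."""
    f = line.rstrip("\n").split("\t")
    return NarrowPeak(
        f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
        float(f[6]), float(f[7]), float(f[8]), int(f[9]),
    )


def frip(reads: List[Tuple[str, int, int]], peaks: List[NarrowPeak]) -> float:
    """Fraction of reads overlapping any peak (assumed rule: >= 1 base of overlap)."""
    in_peaks = sum(
        1
        for chrom, start, end in reads
        if any(p.chrom == chrom and start < p.end and p.start < end for p in peaks)
    )
    return in_peaks / len(reads) if reads else 0.0


# Toy data, made up for illustration only.
lines = [
    "chr1\t100\t200\tpeak_1\t500\t.\t8.5\t12.3\t9.1\t40",
    "chr2\t1000\t1300\tpeak_2\t300\t.\t5.2\t7.8\t5.0\t150",
]
peaks = [parse_narrowpeak_line(line) for line in lines]
reads = [
    ("chr1", 150, 186),    # inside peak_1
    ("chr1", 300, 336),    # outside any peak
    ("chr2", 1290, 1326),  # overlaps the end of peak_2
    ("chr2", 2000, 2036),  # outside any peak
]

score = frip(reads, peaks)
summit = peaks[0].start + peaks[0].summit_offset  # absolute summit coordinate
print(f"FRiP = {score:.2f}, passes 5% guideline: {score > 0.05}")
print(f"peak_1 summit at {peaks[0].chrom}:{summit}")
```

The absolute summit coordinate is the peak start plus the value in the tenth column, which is the position to centre on when extracting sequences for motif discovery.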
NOTE: The current parameters of the MACS 2.0 process are optimized for narrow peaks.