In Detail: Variant Detection in Data Obtained with Swift Biosciences Kits
Description
Raw paired-end Illumina sequencing reads in FASTQ format were trimmed using Trimmomatic (Bolger et al. 2014), which removed adapter sequences and discarded reads that were too short. Quality control of raw and trimmed reads was performed with FastQC. Trimmed reads were then mapped with the BWA-MEM aligner (Li and Durbin, 2009), and the resulting alignment (BAM) was sorted by position. Primerclip (created by Swift Biosciences) was then used to clip PCR primers. The alignment was sorted and read groups were added using Picard AddOrReplaceReadGroups. MD tags were recalculated using samtools (Li et al. 2009). Some of the statistics for the final report were calculated using bedtools coverage (Quinlan and Hall, 2010) and Picard CollectTargetedPcrMetrics. Local alignment of reads was optimized with GATK (McKenna et al. 2010): RealignerTargetCreator defined the target intervals for local realignment, IndelRealigner performed the actual realignment, and BaseRecalibrator detected possible systematic errors in base quality scores. Variants were called using GATK's HaplotypeCaller and LoFreq (Wilm et al. 2012). Overlapping indels were removed using a custom script. Called variants were annotated against the GRCh37.75 genome and the dbSNP database (build 138). A PDF report was built by combining quality control metrics, alignment statistics, and variant annotations.
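The custom script that removes overlapping indels is not described in detail here. The following is only a minimal sketch of one plausible interpretation: variants are sorted by position, and an indel is dropped when its reference span starts inside the span of the previously kept indel on the same chromosome. The `Variant` type, the function name, and the keep-the-first rule are all assumptions made for illustration, not the pipeline's actual logic.

```python
from typing import Dict, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (VCF-like, 1-based POS)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def drop_overlapping_indels(variants: List[Variant]) -> List[Variant]:
    """Keep all SNVs; keep an indel only if it does not overlap
    the previously kept indel on the same chromosome (assumed rule)."""
    kept: List[Variant] = []
    last_indel_end: Dict[str, int] = {}  # chrom -> end of last kept indel
    for v in sorted(variants, key=lambda x: (x.chrom, x.pos)):
        is_indel = len(v.ref) != len(v.alt)
        if not is_indel:
            kept.append(v)
            continue
        # Reference interval covered by this record: [pos, pos + len(ref) - 1]
        start, end = v.pos, v.pos + len(v.ref) - 1
        if start <= last_indel_end.get(v.chrom, 0):
            continue  # overlaps an indel that was already kept
        kept.append(v)
        last_indel_end[v.chrom] = end
    return kept


calls = [
    Variant("chr1", 100, "ATG", "A"),   # deletion spanning 100-102
    Variant("chr1", 101, "C", "CT"),    # insertion inside that span
    Variant("chr1", 101, "C", "G"),     # SNV, never filtered here
    Variant("chr1", 200, "A", "AT"),    # separate insertion
]
filtered = drop_overlapping_indels(calls)
```

In this example the insertion at position 101 would be dropped because it falls within the deletion at positions 100 to 102, while the SNV and the distant insertion are retained. A real implementation might instead prefer the higher-quality call or merge calls from the two variant callers.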
Pipeline Details - Tools
Tool versions and key (or non-default) parameter values are given in brackets.
Trimmomatic (v0.36)
seed_mismatches (2)
palindrome_clip_threshold (30)
simple_clip_threshold (10)
trailing (3)
minlen (40)
BWA mem (0.7.17-r1188)
-M(true)
primerclip (0.0.2, Swift Biosciences)
Picard AddOrReplaceReadGroups (2.8.1)
RGID (snpID)
RGLB (accelAmp)
RGPL (illumina)
RGPU (Miseq)
samtools calmd (1.7)
bedtools coverage (2.26.0)
Picard CollectTargetedPcrMetrics (2.8.1)
AI (AMPLICON_INTERVALS) and TI (TARGET_INTERVALS) are derived from the BED file provided by the platform or uploaded by the user
GATK RealignerTargetCreator (3.6-44-ge7d1cd2)
input: VCF file(s) of known indels
GATK IndelRealigner (3.6-44-ge7d1cd2)
input: VCF file(s) of known indels
GATK BaseRecalibrator (3.6-44-ge7d1cd2)
input: VCF file(s) of known variants
GATK HaplotypeCaller (3.6-44-ge7d1cd2)
-mbq (20)
--stand-call-conf (20)
--max-reads-per-alignment-start (50)
LoFreq (2.1.3.1)
--call-indels
snpEff (4.3k)
stand_call_conf (20) (GATK)
mbq (20) (GATK)
min_bq (20) (LoFreq)
min_alt_bq (20) (LoFreq)
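To illustrate how the Trimmomatic values listed above relate to Trimmomatic's own step syntax, here is a minimal sketch that renders them as `ILLUMINACLIP`, `TRAILING`, and `MINLEN` step arguments. The function name, the adapter file name `adapters.fa`, and the order of the steps are assumptions for illustration; the pipeline's actual wrapper may assemble the command differently.

```python
from typing import List


def trimmomatic_steps(
    adapters_fasta: str,
    seed_mismatches: int = 2,
    palindrome_clip_threshold: int = 30,
    simple_clip_threshold: int = 10,
    trailing: int = 3,
    minlen: int = 40,
) -> List[str]:
    """Render the listed parameter values as Trimmomatic step arguments.

    Defaults mirror the values in the tool list; the step order is assumed.
    """
    return [
        # ILLUMINACLIP:<adapters>:<seed mismatches>:<palindrome>:<simple>
        f"ILLUMINACLIP:{adapters_fasta}:{seed_mismatches}"
        f":{palindrome_clip_threshold}:{simple_clip_threshold}",
        # Trim trailing bases with quality below the threshold
        f"TRAILING:{trailing}",
        # Discard reads shorter than this length after trimming
        f"MINLEN:{minlen}",
    ]


# "adapters.fa" is a placeholder file name, not one used by the pipeline.
steps = trimmomatic_steps("adapters.fa")
print(" ".join(steps))
```

These step strings would be appended after the input and output file arguments of a paired-end Trimmomatic invocation.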