Pipelines Overview

General RNA-Seq analysis pipeline (STAR)

The default General RNA-Seq pipeline consists of three steps:

  • preprocessing (BBDuk),

  • alignment and quantification (STAR), and

  • normalization (rnanorm).
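To give a feel for what the normalization step does, here is a minimal Python sketch of one common expression normalization, transcripts per million (TPM). This article does not state which normalized values the pipeline reports, so treat TPM as an illustrative assumption. The function name, gene names, counts, and gene lengths below are made up and are not part of rnanorm or the platform.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts and lengths_bp are dicts keyed by gene name (hypothetical inputs).
    """
    # Step 1: divide each count by the gene length in kilobases.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: rescale so that the values of one sample sum to one million.
    scale = sum(rates.values()) / 1_000_000
    return {gene: rate / scale for gene, rate in rates.items()}


# Made-up example values for three genes in one sample.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
values = tpm(counts, lengths_bp)
```

In this example geneA and geneB end up with the same TPM even though geneB has three times as many reads, because geneB is also three times as long.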

Evaluation of contaminant rRNA and globin sequences is done on subsampled reads using STAR. Reference files such as genomes and annotations are available for different organisms (human, mouse, rat) and sources (ENSEMBL, UCSC), and they optionally include references for ERCC spike-in controls.

You can find a more detailed description of the pipeline in the In Detail: General RNA-Seq analysis pipeline (STAR) article.

How to run the pipeline

Start by importing your sequencing data, then navigate to the Pipelines tab in the Actions card within the collection of your choice. Under the RNA-Seq section, choose the General RNA-Seq pipeline (STAR) or an appropriate version of the pipeline based on the library preparation kit that was used to generate the data.

Selecting the pipeline will take you to the Create Analysis page. In the Basic Parameters card you can first define a custom name for the output data object and then select your samples of interest (all the samples belonging to the collection are listed in a table). Below the sample table there should be some prefilled input fields, such as the species your samples came from, the assay type, the number of reads in the subsampled alignment file, or the kit selection. The set of basic parameters may vary depending on the RNA-Seq pipeline you choose, so make sure the settings are right for your data.

From here on you can also view and modify advanced parameters by expanding the Advanced Parameters card, but we recommend starting off with the default parameters.

Once you’re done, click the ‘RUN ANALYSIS’ button at the bottom of the page. This will start the process and automatically redirect you to the Collection Details page, where you will see new data objects and results appear in the Samples and the Data Objects cards.

Results

Expression files can be visualized within our comprehensive RNA-Seq visualizations. They can also be downloaded for individual samples or as a multi-sample expression matrix. To do the latter, select multiple samples from the Samples card and choose ‘Results’ upon download, then search for the matrix in the downloaded bundle.

A MultiQC report is automatically generated for each sample in the Samples card, and it combines the stats of the raw reads, the mapping stats, and the amount of rRNA and globin reads (which can help you evaluate the success of your depletion).

You can also create a multi-sample report. To do so, find the ‘MultiQC’ tool in the Tool Catalog tab in the Actions card and then select all the read and alignment objects for all of the samples. Your multi-sample QC report should be available within seconds. If ERCC spike-ins were used, you can also run the “Spike-ins quality control” process. Simply select your samples and the ERCC mix used. The report illustrates the correlation between expected and measured concentration of individual ERCC transcripts in each of the samples.
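The idea behind the spike-in report can be sketched in a few lines of Python: for each sample, compare the expected concentration of every spike-in transcript with the measured expression on a log scale and compute a correlation coefficient. This is only an illustration under stated assumptions. The article does not say which correlation measure or transformation the process uses, so the log2 transform, the pseudocount, and the use of Pearson correlation are assumptions, and the transcript names and values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spikein_correlation(expected, measured, pseudocount=1.0):
    """Correlate expected concentration with measured expression (log2 scale).

    expected and measured are dicts keyed by spike-in transcript name.
    The pseudocount (an assumption) avoids log2(0) for undetected transcripts.
    """
    shared = sorted(set(expected) & set(measured))
    log_expected = [math.log2(expected[name]) for name in shared]
    log_measured = [math.log2(measured[name] + pseudocount) for name in shared]
    return pearson(log_expected, log_measured)


# Made-up values for four hypothetical spike-in transcripts in one sample.
expected = {"spike_1": 1.0, "spike_2": 4.0, "spike_3": 16.0, "spike_4": 64.0}
measured = {"spike_1": 10.0, "spike_2": 40.0, "spike_3": 160.0, "spike_4": 640.0}
r = spikein_correlation(expected, measured)
```

A coefficient close to 1 means that measured expression tracks the known input concentrations well in that sample.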

ChIP-Seq pipeline

This ChIP-Seq pipeline closely follows the AQUAS (ENCODE 3) pipeline. It is based on the alignment of short reads using BWA mem and peak calling with MACS 2.0. During the setup of the MACS 2.0 process, we recommend providing a BED file with promoter regions (‘Promoter regions BED file’) to obtain additional QC metrics (number/fraction of peaks mapped to promoters, number/fraction of reads mapped to promoters).

How to run the pipeline

Start by importing your sequencing data. To run the pipeline properly, the experimental design has to be set through the graphical interface: you need to assign a background to each of the experimental samples. To do that, navigate to the Collection Details page and click the ‘SET SAMPLE GROUPS’ icon button in the Actions card. In the sample table, select all experimental samples with the same background, click the ‘ASSIGN’ button to open the modal window, and select Background in the sample relation category dropdown. In the ‘Define sample relation details’ section, select the appropriate background and confirm by clicking the ‘ASSIGN’ button. Repeat this process until all samples are defined; each background group will appear in a separate column.

Now navigate to the Pipelines tab in the Actions card of the collection of your choice. Begin aligning your reads by selecting the ChIPseq (BWA ALN) pipeline listed under the Epigenetics section, which will take you to the Create Analysis page.

In the Basic Parameters card you can first define a custom name for the output data object and then select your samples of interest (all the samples belonging to the collection are listed in a table). Below the sample table there should be some prefilled input fields, such as the species your samples came from and the source of the genome and annotation. Make sure the settings are right for your data.

From here on you can also view and modify advanced parameters by expanding the Advanced Parameters card, where you can, for example, indicate whether spike-ins were used in your sample preparation.

Once you’re done, click the ‘RUN ANALYSIS’ button at the bottom of the page. This will start the process and automatically redirect you to the Collection Details page, where you will see new data objects and results appear in the Samples and the Data Objects cards.

Next, you can run the peak calling process, either MACS 2.0 or MACS 2.0 with ROSE2. Once you select the appropriate version of the ChIP-Seq pipeline listed under the Epigenetics section, you will again be taken to the Create Analysis page.

In the Basic Parameters card, select the aligned reads of interest and, if one is available, a Promoter regions BED file. From here on you can also view and modify advanced parameters by expanding the Advanced Parameters card.

Once you’re done, click the ‘RUN ANALYSIS’ button at the bottom of the page. This will start the process and automatically redirect you to the Collection Details page, where you will see new data objects and results appear in the Samples and the Data Objects cards.

Results

The pipeline will create the following data objects that can be found in the Data Objects card:

  • a BAM file (alignment of reads to a reference genome) and associated mapping statistics,

  • called peaks in the form of tables and downloadable genome browser tracks, and

  • associated QC reports: Called peaks (.xls), Narrow peaks (.gz), QC report (.txt), Pre-peak QC report (case) (.txt), Filtered tagAlign (case) (.gz), Pre-peak QC report (control) (.gz), Filtered tagAlign (control) (.gz), Narrow peaks (BigBed) (.bb), Peak summits (.bed.gz), Peak summits tbi index for JBrowse (.bed.gz.tbi), Summits (bigBed) (.bb), Broad peaks, Broad peaks (bed12/gappedPeak), Treatment pileup (bedGraph) (.bdg), Treatment pileup (bigWig) (.bw), Control lambda (bedGraph) (.bdg), Control lambda (bigwig) (.bw).

You can also create a multi-sample report. To do so, find the MultiQC tool in the Tool Catalog tab in the Actions card and then select three objects for each of the samples (FASTQ file, BWA ALN, and MACS 2.0). Your multi-sample QC report should be available within seconds.

ATAC-Seq pipeline

The ATAC-Seq pipeline now available on Genialis Expressions is an end-to-end solution for the analysis of ATAC-Seq samples. It closely follows the official ENCODE DCC pipeline and consists of three steps:

  • alignment (Bowtie2),

  • pre-peakcall QC with ENCODE 3 proposed QC metrics (NRF, PBC bottlenecking coefficients, NSC, and RSC), and

  • peak-calling (MACS2) with QC (number of peaks, fraction of reads in peaks (FRiP), number of reads in peaks and, if a promoter regions BED file is provided, number of reads in promoter regions, fraction of reads in promoter regions, number of peaks in promoter regions, and fraction of peaks in promoter regions).
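Several of the QC metrics listed above are simple ratios, and the Python sketch below shows how they are defined. It follows the ENCODE definitions of NRF, PBC1, and PBC2, and a simplified FRiP that counts a read as "in a peak" when its start position falls inside a peak interval. It is an illustration only, not the pipeline's implementation: the function names, the (chromosome, position, strand) read representation, and the example data are assumptions. NSC and RSC are derived from strand cross-correlation and are not covered here.

```python
from collections import Counter


def library_complexity(read_positions):
    """Compute NRF, PBC1 and PBC2 from uniquely mapped read positions.

    read_positions is a list of (chromosome, position, strand) tuples,
    one per read (hypothetical input format).
    """
    per_position = Counter(read_positions)
    total_reads = len(read_positions)
    distinct = len(per_position)  # locations with at least one read
    m1 = sum(1 for n in per_position.values() if n == 1)  # exactly one read
    m2 = sum(1 for n in per_position.values() if n == 2)  # exactly two reads
    return {
        "NRF": distinct / total_reads,
        "PBC1": m1 / distinct,
        "PBC2": m1 / m2 if m2 else float("inf"),
    }


def frip(read_positions, peaks):
    """Fraction of reads whose start position lies inside any peak.

    peaks is a list of (chromosome, start, end) half-open intervals.
    """
    in_peaks = sum(
        1
        for chrom, pos, _strand in read_positions
        if any(c == chrom and start <= pos < end for c, start, end in peaks)
    )
    return in_peaks / len(read_positions)


# Made-up example: eight reads and two peaks.
reads = [
    ("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 250, "-"),
    ("chr1", 400, "+"), ("chr2", 50, "+"), ("chr2", 50, "+"),
    ("chr2", 900, "-"), ("chr2", 950, "+"),
]
peaks = [("chr1", 90, 300), ("chr2", 0, 100)]
complexity = library_complexity(reads)
fraction_in_peaks = frip(reads, peaks)
```

Higher NRF and PBC values indicate a more complex library with fewer duplicate reads, and a higher FRiP indicates that more of the signal is concentrated in called peaks.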

How to run the pipeline

Start by importing your sequencing data and then navigate to the Pipelines tab in the Actions card. Now choose the ATAC-Seq pipeline listed under the Epigenetics section.

Selecting the pipeline will take you to the Create Analysis page. To run the analysis with the ENCODE defaults, you only need to select your samples and the genome version in the Basic Parameters card, but you may also choose to modify the parameters by expanding the Advanced Parameters card.

Once you’re done, click the ‘RUN ANALYSIS’ button at the bottom of the page. This will start the process and automatically redirect you to the Collection Details page, where you will see new data objects and results appear in the Samples and the Data Objects cards.

Results

The pipeline produces an alignment file (BAM) which can be downloaded and viewed in a genome browser such as IGV. It comes with the basic mapping statistics. Open chromatin regions are reported in tabular format as well as in BigBed format for easy import and visualization in a genome browser.

As with ChIP-Seq results, you may want to create a single QC report for all processed samples. To do so, find the MultiQC tool in the Tool Catalog tab in the Actions card and then select three objects for each of the samples (FASTQ file, Bowtie2 alignment, and MACS2 peaks). Your multi-sample QC report should be available within seconds.

WGBS pipeline

The current Methyl-seq pipeline (Whole Genome Bisulfite Sequencing) available on Genialis Expressions uses the WALT (Wildcard Alignment Tool) aligner. The methylation level at each genomic cytosine is calculated using methcounts. Finally, hypo-methylated regions are identified using hmr. Both methcounts and hmr are part of the MethPipe package from the Smith Lab.
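As a rough illustration of what a per-cytosine methylation level means: after bisulfite conversion, unmethylated cytosines are read as T while methylated cytosines are still read as C, so the level at a site is the fraction of covering reads that show a C. The Python sketch below shows only this ratio. It is not the methcounts implementation, and the function name and example numbers are assumptions.

```python
def methylation_level(methylated_reads, unmethylated_reads):
    """Fraction of reads supporting methylation at one cytosine.

    methylated_reads: reads showing C at the site after bisulfite conversion.
    unmethylated_reads: reads showing T at the site.
    Returns None when no reads cover the site.
    """
    coverage = methylated_reads + unmethylated_reads
    if coverage == 0:
        return None  # the level is undefined without coverage
    return methylated_reads / coverage


# Made-up example: 17 reads show C and 3 reads show T at one cytosine.
level = methylation_level(17, 3)
```

Sites with low coverage give noisy estimates, which is why the read coverage is usually kept alongside the level itself.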

How to run the pipeline

Start by importing your sequencing data, then navigate to the Pipelines tab in the Actions card. Choose the Whole genome bisulfite sequencing (WGBS) pipeline listed under the Epigenetics section.

Selecting the pipeline will take you to the Create Analysis page. In the Basic Parameters card you can first define a custom name for the output data object and then select your samples of interest (all the samples belonging to the collection are listed in a table). Below the sample table there should be some prefilled input fields, such as the species your samples came from, the reference version, and your spike-in sequence name. Make sure the settings are right for your data.

From here on you can also view and modify advanced parameters by expanding the Advanced Parameters card, but we recommend starting off with the default parameters.

Once you’re done, click the ‘RUN ANALYSIS’ button at the bottom of the page. This will start the process and automatically redirect you to the Collection Details page, where you will see new data objects and results appear in the Samples and the Data Objects cards.

Results

The pipeline will generate:

  • alignment object (WALT): BAM and mapping stats.

  • methylation levels (methcounts): output in tabular format and as BigWig for import into a genome browser, plus QC statistics based on Picard.

  • hypo-methylated regions (hmr): output BED file.

  • MultiQC report with metrics.

The QC metrics help you evaluate the performance of your sequencing experiment and include statistics on read quality and methylation status.

Haven’t found what you were looking for?

Your own tools and scripts (Python, R) can be wrapped into processes and woven into pipelines. Contact us at support@genialis.com.