What’s under the hood of Genialis Expressions
Overview
Genialis Server handles large quantities of biological data, performs complex data analysis, organizes results, and automatically documents your work in a reproducible fashion. It is the heart of the Genialis platform and provides data management, the Resolwe workflow engine for job execution, fast and robust import of big data, and real-time interactive data visualizations.
Genialis Server supports workflow plugins. We provide the Resolwe Bioinformatics plugin, which makes hundreds of bioinformatics tools and pipelines readily available. On request, the server can be extended with client-specific plugins.
Genialis Server also hosts the frontend, a web app that runs in the browser. Interaction with Genialis Server is supported through a public RESTful API and a WebSocket API.
Auxiliary services installed with Genialis Server include real-time platform monitoring, automated backups, error tracking in Sentry, and analytics.
Data management
The two key concepts are Data and Process. A Process represents an algorithm (a job) that transforms inputs into outputs. It serves as a blueprint for one step in an analysis. Data is an instance of a Process and represents a complete record of an executed computation. It captures the inputs (files, arguments, parameters, etc.), the algorithm used, and the outputs (files, images, numbers, etc.).
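The Process/Data distinction can be made concrete with a toy Python model. The `Process` and `Data` classes, the `execute` helper, and the `count-reads` process below are hypothetical illustrations of the concepts, not Genialis Server or Resolwe code. A Process is only a blueprint. Executing it produces a Data record that keeps the inputs, a reference to the algorithm used, and the outputs together.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Process:
    """Blueprint for one analysis step: an algorithm that maps inputs to outputs."""
    slug: str
    algorithm: Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class Data:
    """Complete record of one executed Process: inputs, algorithm used, outputs."""
    process: Process
    inputs: Dict[str, Any]
    outputs: Dict[str, Any] = field(default_factory=dict)


def execute(process: Process, inputs: Dict[str, Any]) -> Data:
    """Run a Process on the given inputs and capture the result as a Data record."""
    outputs = process.algorithm(inputs)
    return Data(process=process, inputs=dict(inputs), outputs=outputs)


# Hypothetical toy Process: count the reads in a list of sequences.
count_reads = Process(
    slug="count-reads",
    algorithm=lambda inp: {"n_reads": len(inp["reads"])},
)

record = execute(count_reads, {"reads": ["ACGT", "GGCA", "TTAG"]})
print(record.process.slug, record.outputs)  # count-reads {'n_reads': 3}
```

Running the same Process twice yields two separate Data records. This is what lets every result be traced back to exactly how it was produced.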
As analyses typically produce many Data objects, Genialis Server provides two organizational structures for managing them:
A Sample represents a biological entity. It contains user annotations and the Data objects associated with this entity. In practice, all Data objects in a Sample are derived from a single initial Data object. In a typical RNA-Seq experiment, for example, a Sample would contain the following Data objects: raw reads, preprocessed reads, an alignment (BAM file), and expressions. A Data object can belong to only one Sample.
A Collection is a group of Samples. In addition to Samples and their Data, a Collection may contain Data objects that store other analysis results. Differential expression results are one example: they are generated from multiple Samples and therefore cannot belong to a single Sample. Although Samples and Data objects can be placed in multiple Collections, this practice is discouraged.
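The organizational rules above can be sketched as a toy Python model. A Data object belongs to at most one Sample, and a Collection groups Samples and can also hold Data that spans Samples. The class names mirror the concepts in the text. The code itself, including the `add` method and the example names, is a hypothetical illustration and not the Genialis Server data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Data:
    name: str
    # A Data object belongs to at most one Sample; repr=False avoids a recursive repr.
    sample: Optional["Sample"] = field(default=None, repr=False)


@dataclass(eq=False)
class Sample:
    """A biological entity: user annotations plus the Data objects derived from it."""
    name: str
    annotations: dict = field(default_factory=dict)
    data: List[Data] = field(default_factory=list)

    def add(self, d: Data) -> None:
        """Attach a Data object, refusing one that already belongs to another Sample."""
        if d.sample is not None and d.sample is not self:
            raise ValueError(f"{d.name} already belongs to sample {d.sample.name}")
        d.sample = self
        if d not in self.data:  # with eq=False, membership is checked by identity
            self.data.append(d)


@dataclass(eq=False)
class Collection:
    """A group of Samples plus Data that spans Samples (e.g. differential expression)."""
    name: str
    samples: List[Sample] = field(default_factory=list)
    data: List[Data] = field(default_factory=list)


# Hypothetical example: two Samples and one cross-sample result.
reads = Data("raw reads")
bam = Data("alignment (BAM)")
s1 = Sample("patient-1", annotations={"tissue": "liver"})
for d in (reads, bam):
    s1.add(d)

s2 = Sample("patient-2")
deg = Data("differential expression")  # spans Samples, so it lives on the Collection
experiment = Collection("RNA-Seq study", samples=[s1, s2], data=[deg])

try:
    s2.add(bam)
except ValueError as err:
    print(err)  # alignment (BAM) already belongs to sample patient-1
```

Keeping the cross-sample result on the Collection rather than on a Sample mirrors the text's point: such results have no single biological entity to belong to.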
Resolwe workflow engine
Resolwe is an open-source workflow engine that can execute jobs on various cluster environments, such as SLURM and Kubernetes. The engine resolves dependencies between processes (jobs or tasks) and executes them on worker nodes. Jobs run in Docker containers, which ensures reproducibility across executions. Results are stored in a PostgreSQL database and a clustered file system.
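Resolving dependencies between jobs amounts to ordering a directed acyclic graph. The sketch below uses Python's standard-library `graphlib.TopologicalSorter` to group a hypothetical two-sample RNA-Seq analysis into rounds of jobs that could run concurrently. It illustrates the general idea only and is not Resolwe's scheduler. The job names and the `schedule_rounds` helper are made up for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


def schedule_rounds(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group jobs into rounds; every job in a round has all its inputs ready."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    rounds = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose upstream jobs are all done
        rounds.append(ready)  # a scheduler could dispatch these to workers in parallel
        sorter.done(*ready)  # here we simply mark them finished
    return rounds


# Hypothetical two-sample analysis: each job maps to the jobs it depends on.
dependencies = {
    "reads-A": set(),
    "reads-B": set(),
    "align-A": {"reads-A"},
    "align-B": {"reads-B"},
    "expr-A": {"align-A"},
    "expr-B": {"align-B"},
    "diffexp": {"expr-A", "expr-B"},
}

for i, batch in enumerate(schedule_rounds(dependencies), start=1):
    print(i, batch)
```

This prints four rounds: both read uploads, then both alignments, then both expression jobs, and finally the differential expression job. The final job waits for both Samples. A real engine would dispatch each ready job to a worker as soon as its inputs exist rather than in strict rounds.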
You can read more about Resolwe in the documentation or the Resolwe code repository.
Plugin: Resolwe Bioinformatics
Resolwe Bioinformatics is a collection of bioinformatics tools for the Resolwe workflow engine. Pipelines include:
RNA-Seq pipelines (poly(A) selection, total RNA, 3′ sequencing) with count-based (STAR, featureCounts) and pseudoalignment-based (Salmon) quantification methods
Genomic variant detection workflows for RNA-Seq data, using a custom (patent-pending) methodology for evaluating variant sites
Standardized DNA-Seq pipelines (WES, WGS, and targeted panels) and pipelines for ChIP-Seq, ATAC-Seq, CUT&RUN, and WGBS data analysis
All pipelines are developed in accordance with community standards and are rigorously validated and tested. The code is open source and available in the Resolwe Bio code repository. A complete list of tools is provided in the Resolwe Bio documentation. If a required tool or pipeline is not available, custom or proprietary tools can be added on request.
ReSDK: A Python-based software development kit for Genialis Expressions
ReSDK for Python enables programmatic interaction with Genialis Server. It can be used to upload and inspect biomedical datasets, add annotations, run analyses, and perform other automated tasks.
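The sketch below shows what a short ReSDK session could look like. It relies on the ReSDK calls `resdk.Resolwe(url=...)`, `Resolwe.login()`, `Resolwe.run(slug=..., input=...)`, and `res.collection.get(...)`. The helper names (`connect`, `sample_names`, `run_process`) are hypothetical. The server URL, collection slug, and any process slug or input fields you pass in are placeholders, so check the ReSDK documentation before relying on them. `resdk` is imported inside `connect` so that the other helpers can be exercised without a server.

```python
# Minimal ReSDK sketch (assumes `pip install resdk` and access to a Genialis Server).
# Helper names are hypothetical; URLs, slugs, and input fields are placeholders.


def connect(url):
    """Return an authenticated resdk.Resolwe session for the given server URL."""
    import resdk  # lazy import: the helpers below do not need resdk at import time

    res = resdk.Resolwe(url=url)
    res.login()  # prompts interactively for credentials
    return res


def sample_names(res, collection_slug):
    """List the names of all Samples in a Collection, looked up by its slug."""
    collection = res.collection.get(collection_slug)
    return [sample.name for sample in collection.samples]


def run_process(res, process_slug, inputs):
    """Run a Process on the server; ReSDK returns the resulting Data object."""
    return res.run(slug=process_slug, input=inputs)


# Example usage against a real server (placeholders, not runnable as-is):
# res = connect("https://<your-genialis-server>")
# print(sample_names(res, "my-collection"))
```

To start a new analysis step, pass a process slug taken from the Resolwe Bio documentation to `run_process`. The returned object is the Data record described in the data management section.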
Comprehensive ReSDK documentation is available, including tutorials to help users get started quickly. The source code is available in the ReSDK code repository.