Component Details
AfterQC performs quality control and preprocessing of sequencing data. An Application Orchestrator manages the pipeline and delegates work to components that read and write data, preprocess reads and assess their quality, detect and process structural anomalies such as bubbles, and generate quality control reports for the user.
Application Orchestrator
The Application Orchestrator component, primarily after.py, serves as the central control unit for the AfterQC pipeline. It is responsible for parsing command-line arguments, discovering input FASTQ files, initiating parallel processing jobs, and coordinating the overall workflow by delegating tasks to the Read Preprocessing & Quality Control and Structural Anomaly Processing components.
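The orchestration steps described above (argument parsing, FASTQ discovery, parallel dispatch) can be sketched as follows. This is a minimal illustration, not the code in after.py: the flag names, the suffix list, and the functions `parse_args`, `discover_fastq`, and `run_all` are all hypothetical, and a thread pool stands in for whatever parallel mechanism the project actually uses.

```python
import argparse
import os
from concurrent.futures import ThreadPoolExecutor

# Assumed set of FASTQ file suffixes; the real project may accept others.
FASTQ_SUFFIXES = (".fq", ".fastq", ".fq.gz", ".fastq.gz")


def parse_args(argv):
    """Parse a hypothetical, reduced set of command-line options."""
    parser = argparse.ArgumentParser(description="Orchestrator sketch")
    parser.add_argument("--input-dir", default=".", help="folder to scan for FASTQ files")
    parser.add_argument("--jobs", type=int, default=2, help="number of parallel workers")
    return parser.parse_args(argv)


def discover_fastq(input_dir):
    """Return the sorted paths of files in input_dir that look like FASTQ files."""
    names = sorted(os.listdir(input_dir))
    return [os.path.join(input_dir, n) for n in names if n.lower().endswith(FASTQ_SUFFIXES)]


def run_all(files, worker, jobs):
    """Apply worker to every file using a pool of `jobs` threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(worker, files))
```

A caller would pass the per-file processing function as `worker`, for example `run_all(discover_fastq(args.input_dir), process_one_file, args.jobs)`, where `process_one_file` is a placeholder for the delegated preprocessing step.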
Data I/O & Utilities
The Data I/O & Utilities component, encompassing fastq.py and util.py, handles FASTQ-formatted sequencing data: reading, writing, and manipulation. It also provides general-purpose helper functions shared across the AfterQC project, which keeps common logic in one place.
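To make the reading and writing role concrete, here is a minimal sketch of a four-line FASTQ reader and writer. It is not the API of fastq.py: the function names `read_fastq` and `write_fastq` and the tuple layout are assumptions chosen for illustration, and compressed input is not handled.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, separator, quality) tuples from a text handle.

    Stops at end of file or at the first empty header line.
    """
    while True:
        name = handle.readline().rstrip("\n")
        if not name:
            return
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        # A valid record starts with '@' and has one quality char per base.
        if not name.startswith("@") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record: %r" % name)
        yield name, seq, sep, qual


def write_fastq(handle, record):
    """Write one four-field record back out in FASTQ layout."""
    handle.write("\n".join(record) + "\n")
```

Working on file-like handles rather than paths keeps the sketch easy to exercise with `io.StringIO` and leaves the choice of plain or compressed files to the caller.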
Read Preprocessing & Quality Control
This component, comprising preprocesser.py, qualitycontrol.py, and barcodeprocesser.py, handles the initial stages of data preparation and the core quality assessment. It trims reads, filters them by quality, N-base content, and sequence length, processes barcodes for demultiplexing, and analyzes read quality, GC content, base content, and k-mer distribution.
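The filtering criteria and the GC-content metric named above can be illustrated with a short sketch. The thresholds (`min_length`, `max_n`, `min_mean_quality`), the Phred+33 encoding, and the function names are assumptions for illustration only; the actual defaults and filter logic in preprocesser.py and qualitycontrol.py may differ.

```python
def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty read)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def passes_filters(seq, qual, min_length=35, max_n=5, min_mean_quality=20.0):
    """Return True if a read survives length, N-content, and quality filters.

    All three thresholds are hypothetical defaults.
    """
    if len(seq) < min_length:
        return False  # too short
    if seq.upper().count("N") > max_n:
        return False  # too many ambiguous bases
    return mean_quality(qual) >= min_mean_quality
```

For example, a 40-base read with quality string `"I" * 40` (Phred 40 at every position) passes all three checks, while the same read with quality `"#" * 40` (Phred 2) is rejected by the quality filter.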
Structural Anomaly Processing
The Structural Anomaly Processing component, including bubbledetector.py, bubbleprocesser.py, circledetector.py, and debubble.py, detects and processes structural artifacts in sequencing data. It detects 'bubbles' and circular DNA/RNA structures and implements 'debubbling' algorithms that refine or remove these artifacts, thereby improving data quality.
Reporting & Visualization
The Reporting & Visualization component, primarily qcreporter.py, generates quality control reports. It takes the processed results and metrics from the Read Preprocessing & Quality Control component and presents them in a readable form, often as HTML reports with plots and summary statistics, so that sequencing data quality is easy to interpret.
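As a rough illustration of turning collected metrics into an HTML report, the sketch below renders a summary table. It is not the implementation in qcreporter.py: `render_report` and the metric names are hypothetical, and plots are omitted entirely.

```python
import html


def render_report(title, metrics):
    """Return a small HTML page with one table row per metric."""
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(html.escape(str(k)), html.escape(str(v)))
        for k, v in metrics.items()
    )
    return (
        "<html><head><title>{t}</title></head><body>\n"
        "<h1>{t}</h1>\n<table>\n{rows}\n</table>\n</body></html>\n"
    ).format(t=html.escape(title), rows=rows)
```

Escaping every value with `html.escape` keeps sample names or file paths that contain `<`, `>`, or `&` from breaking the page markup.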