CodeBoarding

Component Details

The AfterQC project performs quality control and preprocessing of sequencing data. An Application Orchestrator drives the main flow. It manages the overall pipeline and delegates work to components that read and write data, preprocess reads and assess their quality, and detect and process structural anomalies such as bubbles. Finally, it generates quality control reports for the user to interpret.

Application Orchestrator

The Application Orchestrator, implemented primarily in after.py, is the central control unit of the AfterQC pipeline. It parses command-line arguments, discovers input FASTQ files, and launches parallel processing jobs. It coordinates the overall workflow by delegating tasks to the Read Preprocessing & Quality Control and Structural Anomaly Processing components.
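The orchestration steps described above can be pictured with the following minimal sketch: parse arguments, scan a folder for FASTQ files, pair read 1 and read 2 files, and fan the jobs out to worker processes. It is not the code in after.py. The flags `--input-dir` and `--workers`, the `_R1`/`_R2` naming convention, and the helpers `discover_fastq`, `pair_reads`, and `run_job` are all hypothetical.

```python
import argparse
import os
from multiprocessing import Pool
from typing import List, Optional, Tuple

# Suffixes treated as FASTQ input (an assumption for this sketch).
FASTQ_SUFFIXES = (".fq", ".fastq", ".fq.gz", ".fastq.gz")

Job = Tuple[str, Optional[str]]  # (read1 path, read2 path or None for single-end)


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of an AfterQC-style orchestrator")
    # Both flags are hypothetical; they are not taken from after.py.
    parser.add_argument("--input-dir", default=".", help="folder scanned for FASTQ files")
    parser.add_argument("--workers", type=int, default=2, help="number of parallel jobs")
    return parser.parse_args(argv)


def discover_fastq(folder: str) -> List[str]:
    """Return sorted paths of files in `folder` whose names end with a FASTQ suffix."""
    names = sorted(n for n in os.listdir(folder) if n.endswith(FASTQ_SUFFIXES))
    return [os.path.join(folder, n) for n in names]


def pair_reads(paths: List[str]) -> List[Job]:
    """Pair *_R1* files with matching *_R2* files; anything unmatched runs single-end."""
    path_set = set(paths)
    jobs: List[Job] = []
    for p in paths:
        folder, base = os.path.split(p)
        if "_R2" in base and os.path.join(folder, base.replace("_R2", "_R1", 1)) in path_set:
            continue  # emitted together with its R1 mate
        mate = None
        if "_R1" in base:
            candidate = os.path.join(folder, base.replace("_R1", "_R2", 1))
            if candidate in path_set:
                mate = candidate
        jobs.append((p, mate))
    return jobs


def run_job(job: Job) -> str:
    """Placeholder for delegating one job to the preprocessing/QC and bubble stages."""
    read1, read2 = job
    label = os.path.basename(read1)
    if read2:
        label += " + " + os.path.basename(read2)
    return "processed " + label


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    jobs = pair_reads(discover_fastq(args.input_dir))
    # Each worker process handles one file (or file pair) independently.
    with Pool(processes=args.workers) as pool:
        for message in pool.map(run_job, jobs):
            print(message)


if __name__ == "__main__":
    main()
```

In this sketch, each file or file pair becomes an independent job. A process pool can therefore parallelize the work without any shared state between workers.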

Data I/O & Utilities

The Data I/O & Utilities component consists of fastq.py and util.py. It provides the basic functionality for handling FASTQ-formatted sequencing data: reading, writing, and manipulating records. It also offers general-purpose helper functions that are shared across AfterQC, so that common logic is written once instead of being duplicated in each component.
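To make the FASTQ handling concrete, the following is a minimal sketch of reading and writing records. It assumes the common four-line, unwrapped FASTQ layout and detects gzip compression from the file extension. `FastqRecord`, `open_fastq`, `read_fastq`, and `write_fastq` are hypothetical names, not the actual API of fastq.py.

```python
import gzip
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str    # header line, including the leading '@'
    seq: str     # base calls
    strand: str  # separator line, starting with '+'
    qual: str    # one quality character per base


def open_fastq(path: str, mode: str = "rt") -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode (chosen by extension)."""
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming exactly four lines per record."""
    while True:
        name = handle.readline()
        if not name:
            return  # end of input
        name = name.rstrip("\r\n")
        seq = handle.readline().rstrip("\r\n")
        strand = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not name.startswith("@") or not strand.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {name!r}")
        yield FastqRecord(name, seq, strand, qual)


def write_fastq(handle: TextIO, record: FastqRecord) -> None:
    """Write one record back out in four-line form."""
    handle.write(f"{record.name}\n{record.seq}\n{record.strand}\n{record.qual}\n")


if __name__ == "__main__":
    # Usage: round-trip two in-memory records through the reader and writer.
    data = "@read1\nACGT\n+\nIIII\n@read2\nGGNA\n+\n##I#\n"
    out = io.StringIO()
    for rec in read_fastq(io.StringIO(data)):
        write_fastq(out, rec)
    print(out.getvalue() == data)
```

The reader is a generator, so it streams large files one record at a time instead of loading them into memory.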

Read Preprocessing & Quality Control

This component, comprising preprocesser.py, qualitycontrol.py, and barcodeprocesser.py, handles the initial stages of data preparation and the core quality assessment. It trims reads and filters them by base quality, N-base content, and read length. It also processes barcodes for demultiplexing. Finally, it analyzes read quality, GC content, per-base content, and k-mer distribution.
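The trimming, filtering, and statistics steps listed above can be sketched as small pure functions. This sketch assumes Phred+33 quality encoding. The function names and every threshold (`min_quality`, `max_low_quality_fraction`, `max_n`, `min_length`) are hypothetical and illustrative only; they are not AfterQC's actual defaults or code.

```python
from collections import Counter
from typing import Iterable, Tuple

PHRED_OFFSET = 33  # assumes Phred+33 (Sanger / Illumina 1.8+) quality encoding


def trim(seq: str, qual: str, front: int = 0, tail: int = 0) -> Tuple[str, str]:
    """Cut `front` bases from the 5' end and `tail` bases from the 3' end."""
    end = max(front, len(seq) - tail)
    return seq[front:end], qual[front:end]


def passes_filters(
    seq: str,
    qual: str,
    min_quality: int = 15,
    max_low_quality_fraction: float = 0.4,
    max_n: int = 5,
    min_length: int = 35,
) -> bool:
    """Length, N-content, and base-quality filters; thresholds are illustrative only."""
    if len(seq) < min_length:
        return False
    if seq.upper().count("N") > max_n:
        return False
    low_quality = sum(1 for c in qual if ord(c) - PHRED_OFFSET < min_quality)
    return low_quality <= max_low_quality_fraction * len(seq)


def gc_content(seq: str) -> float:
    """Fraction of G and C among all bases (0.0 for an empty read)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def kmer_counts(seqs: Iterable[str], k: int = 4) -> Counter:
    """Count overlapping k-mers across reads, skipping k-mers that contain N."""
    counts: Counter = Counter()
    for seq in seqs:
        s = seq.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i : i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts
```

In a pipeline built this way, each read would be trimmed first and then filtered. GC content and k-mer counts would be accumulated over the reads to build the quality report.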

Structural Anomaly Processing

The Structural Anomaly Processing component, which includes bubbledetector.py, bubbleprocesser.py, circledetector.py, and debubble.py, identifies and processes specific structural artifacts in sequencing data. This involves detecting 'bubbles', which are circular artifact regions that circledetector.py helps locate. It then 'debubbles' the data by removing or refining the affected reads, which improves data quality.
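A circle can be detected from 2D coordinates, such as the positions of suspicious reads, with a generic technique. The following minimal sketch uses a RANSAC-style search: it fits the circumcircle through three randomly sampled points and keeps the circle that lies close to the most points. This is a swapped-in illustration, not necessarily the algorithm used in circledetector.py. The names `circle_from_three`, `detect_circle`, and `reads_inside`, and the parameters `iterations` and `tolerance`, are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (center_x, center_y, radius)


def circle_from_three(a: Point, b: Point, c: Point) -> Optional[Circle]:
    """Circumcircle through three points, or None if they are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None  # collinear points define no circle
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, math.hypot(ax - ux, ay - uy)


def detect_circle(
    points: Sequence[Point], iterations: int = 200, tolerance: float = 1.0, seed: int = 0
) -> Tuple[Optional[Circle], List[Point]]:
    """RANSAC-style search: keep the sampled circle with the most points near its edge."""
    if len(points) < 3:
        return None, []
    rng = random.Random(seed)  # seeded so results are reproducible
    best: Optional[Circle] = None
    best_inliers: List[Point] = []
    for _ in range(iterations):
        circle = circle_from_three(*rng.sample(list(points), 3))
        if circle is None:
            continue
        ux, uy, r = circle
        inliers = [p for p in points if abs(math.hypot(p[0] - ux, p[1] - uy) - r) <= tolerance]
        if len(inliers) > len(best_inliers):
            best, best_inliers = circle, inliers
    return best, best_inliers


def reads_inside(points: Sequence[Point], circle: Circle) -> List[Point]:
    """Points lying on or within the circle, e.g. candidates for removal."""
    ux, uy, r = circle
    return [p for p in points if math.hypot(p[0] - ux, p[1] - uy) <= r]
```

RANSAC tolerates outliers because each candidate circle comes from only three points. Scattered points that do not belong to the circle therefore lower the chance of sampling a good triple but do not distort a good fit.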