diff --git a/.codeboarding/External_Tool_Wrappers.md b/.codeboarding/External_Tool_Wrappers.md new file mode 100644 index 00000000..9a52b8ba --- /dev/null +++ b/.codeboarding/External_Tool_Wrappers.md @@ -0,0 +1,135 @@ +```mermaid + +graph LR + + PysamDispatcher["PysamDispatcher"] + + _pysam_dispatch["_pysam_dispatch"] + + SamtoolsError["SamtoolsError"] + + Samtools_Wrappers["Samtools Wrappers"] + + Bcftools_Wrappers["Bcftools Wrappers"] + + Samtools_Wrappers -- "utilizes" --> PysamDispatcher + + Bcftools_Wrappers -- "utilizes" --> PysamDispatcher + + PysamDispatcher -- "calls" --> _pysam_dispatch + + PysamDispatcher -- "raises" --> SamtoolsError + + _pysam_dispatch -- "can cause" --> SamtoolsError + +``` + + + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +This component provides a Pythonic wrapper and a dispatch mechanism for running the `samtools` and `bcftools` command-line tools, whose C sources are bundled with `pysam`. It lets users call the full functionality of these C utilities directly from Python scripts: commands run in-process rather than as subprocesses, and the wrappers take care of building the command-line argument list and capturing the output. It acts as the bridge between high-level Python application logic and the low-level C utilities. + + + +### PysamDispatcher + +This component provides the core dispatch mechanism for executing `samtools` and `bcftools` commands. It acts as a high-level Python interface that translates Python function calls into arguments suitable for the underlying C utilities, managing the execution flow, capturing output, and handling error codes.
It centralizes the logic for invoking these tools. + + + + + **Related Classes/Methods**: + + + +- `pysam.utils.PysamDispatcher` (1:1) + + + + + +### _pysam_dispatch + +This is the low-level Foreign Function Interface (FFI) layer, implemented in Cython. It calls into the compiled `samtools` and `bcftools` code and manages the interaction between Python and C, including argument passing, execution, and retrieval of the raw results. It is the direct bridge to the underlying C code. + + + + + +**Related Classes/Methods**: + + + +- `pysam.libcutils._pysam_dispatch` (1:1) + + + + + +### SamtoolsError + +This component defines a custom exception class for errors reported by the wrapped `samtools` and `bcftools` commands; `PysamDispatcher` raises it when a command returns a non-zero exit status. A dedicated exception type lets Python code catch tool failures separately from other errors and inspect the error context the tool reported. + + + + + +**Related Classes/Methods**: + + + +- `pysam.utils.SamtoolsError` (1:1) + + + + + +### Samtools Wrappers + +This module provides specific Python functions that wrap individual `samtools` commands (e.g., `view`, `sort`, `index`). These functions offer a Pythonic interface to `samtools`, abstracting away the command-line syntax and using `PysamDispatcher` internally for execution. + + + + + +**Related Classes/Methods**: + + + +- `pysam.samtools` (1:1) + + + + + +### Bcftools Wrappers + +Similar to `Samtools Wrappers`, this module provides specific Python functions that wrap individual `bcftools` commands (e.g., `call`, `view`, `norm`). These functions offer a Pythonic interface to `bcftools`, abstracting away the command-line syntax and using `PysamDispatcher` internally for execution. 
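The dispatch flow described above can be illustrated with a small pure-Python sketch. It is not `pysam`'s code: `Dispatcher`, `ToolError`, and `fake_backend` are hypothetical names, and the backend only echoes the command instead of running the bundled C tool (in `pysam` that is the job of `_pysam_dispatch`). What it shows is the pattern itself: a callable object bound to one tool and sub-command, which turns Python arguments into an argv-style list, forwards it to a backend, and raises an exception when the return code is non-zero.

```python
from typing import Callable, List, Tuple

# Hypothetical backend signature: (tool, subcommand, argv) -> (return code, stdout, stderr).
Backend = Callable[[str, str, List[str]], Tuple[int, str, str]]


class ToolError(Exception):
    """Hypothetical stand-in for an exception such as pysam's SamtoolsError."""


class Dispatcher:
    """Sketch of the dispatcher pattern: one callable object per tool sub-command."""

    def __init__(self, tool: str, subcommand: str, backend: Backend) -> None:
        self.tool = tool              # e.g. "samtools" or "bcftools"
        self.subcommand = subcommand  # e.g. "view" or "sort"
        self.backend = backend

    def __call__(self, *args: object) -> str:
        # Translate Python positional arguments into a command-line style argv list.
        argv = [str(arg) for arg in args]
        retval, stdout, stderr = self.backend(self.tool, self.subcommand, argv)
        # Surface a non-zero return code as a Python exception.
        if retval != 0:
            raise ToolError(
                f"{self.tool} {self.subcommand} returned {retval}: {stderr.strip()}"
            )
        # On success, hand the captured standard output back to the caller.
        return stdout


def fake_backend(tool: str, subcommand: str, argv: List[str]) -> Tuple[int, str, str]:
    """Hypothetical backend that echoes the command line instead of running C code."""
    if "--bad-option" in argv:
        return 1, "", "unrecognized option '--bad-option'\n"
    return 0, " ".join([tool, subcommand, *argv]), ""


view = Dispatcher("samtools", "view", fake_backend)
print(view("-c", "in.bam"))  # prints: samtools view -c in.bam
```

In `pysam` itself the corresponding call would look like `pysam.samtools.view("-c", "in.bam")`, with a failing command surfacing as `pysam.utils.SamtoolsError`; check the `pysam` documentation for the exact return values and keyword options.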
+ + + + + **Related Classes/Methods**: + + + +- `pysam.bcftools` (1:1) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Genomic_Data_Models.md b/.codeboarding/Genomic_Data_Models.md new file mode 100644 index 00000000..f3e2d38b --- /dev/null +++ b/.codeboarding/Genomic_Data_Models.md @@ -0,0 +1,89 @@ +```mermaid + +graph LR + + Genomic_Data_Models["Genomic Data Models"] + + Aligned_Segment_Model["Aligned Segment Model"] + + Variant_Record_Models["Variant Record Models"] + + Tabix_Entry_Proxies["Tabix Entry Proxies"] + + Genomic_Data_Models -- "wraps/abstracts" --> Cython_Bindings_HTSlib_Bindings_["Cython Bindings (HTSlib Bindings)"] + + File_Handling_Components["File Handling Components"] -- "produces/iterates over" --> Genomic_Data_Models + + High_Level_Python_API["High-Level Python API"] -- "consumes/utilizes" --> Genomic_Data_Models + + click Genomic_Data_Models href "https://github.com/pysam-developers/pysam/blob/master/.codeboarding//Genomic_Data_Models.md" "Details" + +``` + + + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +This page covers the classes that represent individual genomic records in `pysam`, chiefly `pysam.libcalignedsegment.AlignedSegment`, `pysam.libcbcf.VariantRecord`, and `pysam.libctabixproxies.NamedTupleProxy`. These classes live in Cython extension modules, which are compiled to C rather than shipped as plain Python source, so their internal implementation details are not covered line by line here.
The descriptions below are therefore based on the conceptual role and public API of these components within the `pysam` library, where they serve as Pythonic abstractions over low-level HTSlib data structures. + + + +### Genomic Data Models + +The "Genomic Data Models" component in `pysam` provides Pythonic abstractions over low-level HTSlib data structures, so that developers can work with genomic data without manipulating C pointers directly. It is the bridge between the raw data accessed via the Cython bindings and the higher-level Python API. + + + + + +**Related Classes/Methods**: _None_ + + + +### Aligned Segment Model + +Represents a single aligned read from SAM/BAM/CRAM files. It provides attributes and methods to access read properties like sequence, quality scores, mapping position, CIGAR string, and flags in a Pythonic way. This class is the primary data model for individual sequencing reads, abstracting the complex binary format of alignment files. + + + + + +**Related Classes/Methods**: _None_ + + + +### Variant Record Models + +`VariantRecord` and its helper classes (e.g., `VariantRecordInfo`, `VariantRecordFormat`, `VariantRecordFilter`, `VariantRecordSamples`) together represent a single variant call entry from VCF/BCF files. The helper classes provide structured access to the INFO, FORMAT, FILTER, and sample-specific fields of a record, making the diverse information in a variant call accessible and editable from Python. + + + + + +**Related Classes/Methods**: _None_ + + + +### Tabix Entry Proxies + +Base and derived classes that provide a tuple-like or namedtuple-like interface to individual entries parsed from generic tabix-indexed text files (e.g., BED, GFF, GTF, or VCF). They allow accessing fields by index or by name, depending on the specific proxy.
These proxies offer a flexible and generic way to represent structured text data from various genomic file formats, leveraging the tabix indexing capabilities. + + + + + +**Related Classes/Methods**: _None_ + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Indexing_Querying.md b/.codeboarding/Indexing_Querying.md new file mode 100644 index 00000000..ffc1bc75 --- /dev/null +++ b/.codeboarding/Indexing_Querying.md @@ -0,0 +1,237 @@ +```mermaid + +graph LR + + Alignment_File_API["Alignment File API"] + + Variant_File_API["Variant File API"] + + Tabix_File_API["Tabix File API"] + + HTSlib_Core_Bindings["HTSlib Core Bindings"] + + Aligned_Segment_Iterator_Bindings["Aligned Segment & Iterator Bindings"] + + Variant_Record_Iterator_Bindings["Variant Record & Iterator Bindings"] + + Tabix_Proxy_Bindings["Tabix Proxy Bindings"] + + HTSlib_C_Library["HTSlib C Library"] + + Alignment_File_API -- "calls methods on" --> HTSlib_Core_Bindings + + Alignment_File_API -- "returns objects from" --> Aligned_Segment_Iterator_Bindings + + Variant_File_API -- "calls methods on" --> HTSlib_Core_Bindings + + Variant_File_API -- "returns objects from" --> Variant_Record_Iterator_Bindings + + Tabix_File_API -- "calls methods on" --> HTSlib_Core_Bindings + + Tabix_File_API -- "returns objects from" --> Tabix_Proxy_Bindings + + HTSlib_Core_Bindings -- "makes FFI calls to" --> HTSlib_C_Library + + Aligned_Segment_Iterator_Bindings -- "receives data from" --> HTSlib_Core_Bindings + + Variant_Record_Iterator_Bindings -- "receives data from" --> HTSlib_Core_Bindings + + Tabix_Proxy_Bindings -- "receives data from" --> HTSlib_Core_Bindings + +``` + + + 
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +The `Indexing & Querying` component in `pysam` is responsible for enabling efficient, region-based retrieval of data from various genomic file formats (SAM/BAM, VCF/BCF, Tabix). It achieves this by managing the creation, loading, and utilization of specialized genomic indices. This component provides high-level Python interfaces for users to perform queries, which are then translated through Cython bindings into low-level calls to the highly optimized HTSlib C library. This layered architecture ensures both ease of use for Python developers and the high performance required for large-scale genomic datasets. + + + +The component operates in a layered fashion: + +1. **High-Level Python API:** Users interact with classes like `AlignmentFile`, `VariantFile`, and `TabixFile` to open and query genomic data. These classes expose intuitive methods (e.g., `fetch()`) for specifying genomic regions. + +2. **Cython Bindings:** When a query is initiated, the call is passed to the corresponding Cython modules (e.g., `libcalignmentfile.pyx`, `libcbcf.pyx`, `libctabix.pyx`). These modules act as a bridge, handling data type conversions between Python and C, managing memory, and making direct Foreign Function Interface (FFI) calls to the underlying HTSlib C functions. + +3. **HTSlib C Library:** This is the core engine where the actual indexing and querying logic resides. HTSlib provides highly optimized C functions for reading indexed genomic files, navigating to specific regions using the index, and efficiently retrieving records. 
It manages the index data structures and performs low-level file I/O. + +4. **Data Representation:** The retrieved C-level data is then converted back into Python objects (e.g., `AlignedSegment`, `VariantRecord`, or structured Tabix proxies) by the Cython bindings, making the queried data accessible and usable within Python. + + + +The primary purpose is to significantly enhance the performance of data access for large genomic files. By leveraging indices, the component allows users to quickly retrieve only the data relevant to a specific genomic region, avoiding the need to parse entire files. This is critical for bioinformatics workflows that frequently involve targeted analysis of genomic loci. + + + +### Alignment File API + +Provides the high-level Python interface for interacting with SAM/BAM/CRAM files. It includes methods for opening files, loading their indices, and performing region-based queries to retrieve alignment records; the indices themselves are usually built with the `samtools index` wrapper (`pysam.index`). It manages `AlignmentFile` objects and their associated iterators. + + + + + +**Related Classes/Methods**: + + + +- `pysam.libcalignmentfile` (1:1) + +- `pysam.Pileup` (1:1) + + + + + +### Variant File API + +Offers the high-level Python interface for handling VCF/BCF files. It supports reading, writing, and querying variant records by genomic region, relying on BCF or Tabix indices. It also exposes classes for managing BCF and Tabix indices directly. + + + + + +**Related Classes/Methods**: + + + +- `pysam.libcbcf` (1:1) + + + + + +### Tabix File API + +Provides the high-level Python interface for generic tab-separated files that are indexed with Tabix. It enables users to open, read, and perform region-based queries on such files, which are common in genomics (e.g., BED, GFF, custom annotation files).
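The speed-up of region queries comes from the index structure itself. As an illustration (not `pysam` code), the binning scheme that the SAM specification defines for BAI indices, and that the Tabix index also uses, assigns every 0-based, half-open interval to the smallest bin that fully contains it; a region query then only has to visit the bins that can overlap the region. The sketch below is a Python transcription of the specification's `reg2bin` and `reg2bins` C functions for the default scheme, which covers coordinates below 2^29.

```python
from typing import List

# (first bin number, shift) for levels 1..5 of the SAM-spec binning scheme.
# Level 0 is the single bin 0 spanning 2**29 bp; each deeper level has 8x more,
# 8x smaller bins, down to 16 kbp (2**14 bp) bins at level 5.
LEVELS = [(1, 26), (9, 23), (73, 20), (585, 17), (4681, 14)]


def reg2bin(beg: int, end: int) -> int:
    """Smallest bin containing the 0-based, half-open interval [beg, end)."""
    end -= 1  # work with the last base actually covered
    for first_bin, shift in reversed(LEVELS):  # try the finest level first
        if beg >> shift == end >> shift:
            return first_bin + (beg >> shift)
    return 0


def reg2bins(beg: int, end: int) -> List[int]:
    """All bins whose span may overlap [beg, end); a query only reads these."""
    end -= 1
    bins = [0]
    for first_bin, shift in LEVELS:
        bins.extend(range(first_bin + (beg >> shift), first_bin + (end >> shift) + 1))
    return bins


print(reg2bin(0, 16385))   # 585: crosses a 16 kbp boundary, so a 128 kbp bin
print(reg2bins(0, 16385))  # [0, 1, 9, 73, 585, 4681, 4682]
```

In a real index each bin maps to a list of file-offset chunks, and a linear index of 16 kbp windows lets HTSlib skip chunks that end before the query start; both details are omitted from this sketch.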
+ + + + + **Related Classes/Methods**: + + + +- `pysam.libctabix` (1:1) + + + + + +### HTSlib Core Bindings + +This Cython module provides the foundational, low-level bindings to the HTSlib C library. It exposes core HTSlib functions and data structures necessary for file handling, index management (loading, creating), and iterator creation for region-based queries across all file types (SAM/BAM, VCF/BCF, Tabix). + + + + + +**Related Classes/Methods**: + + + +- `pysam.libchtslib` (1:1) + + + + + +### Aligned Segment & Iterator Bindings + +These Cython modules define the `AlignedSegment` object (in `pysam.libcalignedsegment`), representing a single read in a SAM/BAM/CRAM file, and the iterator classes (e.g., `IteratorRowRegion`, `IteratorColumnRegion`, in `pysam.libcalignmentfile`) that enable efficient traversal of alignment records within specified genomic regions. They handle the conversion of C-level alignment data into Python objects. + + + + + +**Related Classes/Methods**: + + + +- `pysam.libcalignedsegment` (1:1) + +- `pysam.libcalignmentfile` (1:1) + + + + + +### Variant Record & Iterator Bindings + +This component defines the `VariantRecord` object and the iterators (`BCFIterator`, `TabixIterator`) used to traverse variant records within VCF/BCF files. It handles the conversion of C-level variant data into Python objects, including parsing the INFO, FORMAT, and per-sample fields. + + + + + +**Related Classes/Methods**: + + + +- `pysam.libcbcf` (1:1) + + + + + +### Tabix Proxy Bindings + +This module provides proxy classes (e.g., `BedProxy`, `GTFProxy`, `VCFProxy`) that wrap raw tabix-parsed lines into more structured, namedtuple-like Python objects. This makes data retrieved from generic Tabix-indexed files easier to use by providing convenient attribute access.
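To make the proxy idea concrete, here is a minimal pure-Python sketch of a namedtuple-like view over one tab-separated line. `BedLikeProxy` is a hypothetical class and much simpler than `pysam`'s Cython `BedProxy`: it keeps the raw columns, returns them as strings for index access, and converts the coordinate columns to integers for attribute access.

```python
from typing import Callable, List, Tuple


class BedLikeProxy:
    """Hypothetical tuple/namedtuple-like view over one BED-style line."""

    # Field names and converters for the first four BED columns.
    _fields: Tuple[Tuple[str, Callable[[str], object]], ...] = (
        ("contig", str),
        ("start", int),
        ("end", int),
        ("name", str),
    )

    def __init__(self, line: str) -> None:
        # Split once; columns are converted only when accessed by name.
        self._raw: List[str] = line.rstrip("\n").split("\t")

    def __len__(self) -> int:
        return len(self._raw)

    def __getitem__(self, index: int) -> str:
        # Tuple-like access returns the raw column text.
        return self._raw[index]

    def __getattr__(self, attr: str) -> object:
        # Only called when normal attribute lookup fails, i.e. for field names.
        if attr.startswith("_"):
            raise AttributeError(attr)
        for position, (field, convert) in enumerate(self._fields):
            if field == attr:
                if position >= len(self._raw):
                    raise AttributeError(f"column {attr!r} is absent from this line")
                return convert(self._raw[position])
        raise AttributeError(attr)


record = BedLikeProxy("chr1\t1000\t2000\tpeak_1\n")
print(record[1], record.start + 1)  # prints: 1000 1001
```

With `pysam` itself, the comparable entry point is passing a parser such as `pysam.asBed()` to `pysam.TabixFile` or to its `fetch()` method, which then yields proxy objects instead of plain strings; see the `pysam` documentation for the available parsers.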
+ + + + + **Related Classes/Methods**: + + + +- `pysam.libctabixproxies` (1:1) + + + + + +### HTSlib C Library + +The core C library (HTSlib) that `pysam` wraps. It contains the highly optimized algorithms and data structures for creating, loading, and querying genomic indices (BAM, Tabix, BCF) and for performing high-performance region-based data access. This is where the actual file parsing and index lookup logic resides. + + + + + +**Related Classes/Methods**: _None_ + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Pileup_Analysis.md b/.codeboarding/Pileup_Analysis.md new file mode 100644 index 00000000..44d1c619 --- /dev/null +++ b/.codeboarding/Pileup_Analysis.md @@ -0,0 +1,277 @@ +```mermaid + +graph LR + + High_Level_File_I_O_API["High-Level File I/O API"] + + Pileup_Alignment_Processing["Pileup & Alignment Processing"] + + Core_Cython_Bindings_FFI_Layer_["Core Cython Bindings (FFI Layer)"] + + HTSlib_Samtools_Bcftools_C_Libraries["HTSlib/Samtools/Bcftools C Libraries"] + + Data_Representation_Objects["Data Representation Objects"] + + Build_Installation_System["Build & Installation System"] + + Testing_Framework["Testing Framework"] + + Documentation_System["Documentation System"] + + Utility_Helper_Functions["Utility & Helper Functions"] + + High_Level_File_I_O_API -- "uses" --> Core_Cython_Bindings_FFI_Layer_ + + High_Level_File_I_O_API -- "provides" --> Pileup_Alignment_Processing + + Pileup_Alignment_Processing -- "produces" --> Data_Representation_Objects + + Pileup_Alignment_Processing -- "leverages" --> Core_Cython_Bindings_FFI_Layer_ + + Core_Cython_Bindings_FFI_Layer_ -- "wraps" --> HTSlib_Samtools_Bcftools_C_Libraries + + Data_Representation_Objects -- "are constructed by" --> Core_Cython_Bindings_FFI_Layer_ + + Build_Installation_System -- "compiles/links"
--> Core_Cython_Bindings_FFI_Layer_ + + Build_Installation_System -- "integrates" --> HTSlib_Samtools_Bcftools_C_Libraries + + Testing_Framework -- "validates" --> High_Level_File_I_O_API + + Testing_Framework -- "validates" --> Pileup_Alignment_Processing + + Documentation_System -- "describes" --> High_Level_File_I_O_API + + Documentation_System -- "describes" --> Pileup_Alignment_Processing + + Utility_Helper_Functions -- "support" --> High_Level_File_I_O_API + + Utility_Helper_Functions -- "support" --> Pileup_Alignment_Processing + +``` + + + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +This page gives an overview of the fundamental components of `pysam` that surround the `Pileup Analysis` subsystem, along with their responsibilities, source files, and interactions. These components were chosen because they represent the distinct layers and functions that a Python library wrapping high-performance C libraries for bioinformatics relies on. + + + +### High-Level File I/O API + +This component provides the primary, user-friendly Python interface for interacting with various genomic data file formats such as BAM/SAM/CRAM (for alignments), VCF/BCF (for variants), and FASTA/FASTQ (for sequences), as well as Tabix-indexed generic files. It abstracts away the complexities of file parsing and low-level data access, offering Pythonic objects and methods for common operations.
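Since this page centers on pileups, a toy model of what the pileup engine computes may help. The sketch below is pure Python and is not `pysam`'s implementation, which delegates to HTSlib. The hypothetical `toy_pileup` function takes reads as `(reference_start, cigar, sequence)` tuples, walks each CIGAR string using the SAM specification's rules for which operations consume reference and query bases, and collects the base each read contributes at every 0-based reference position, with `*` marking a deletion. It ignores base qualities, flags, read filtering, reference skips (`N`), and the insertion bookkeeping that the real engine handles.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Per the SAM specification, these CIGAR operations consume reference / query bases.
CONSUMES_REF = set("MDN=X")
CONSUMES_QUERY = set("MIS=X")


def toy_pileup(reads: List[Tuple[int, str, str]]) -> Dict[int, List[str]]:
    """Map each 0-based reference position to the bases ('*' = deletion) covering it."""
    columns: Dict[int, List[str]] = defaultdict(list)
    for ref_start, cigar, seq in reads:
        ref_pos, query_pos = ref_start, 0
        for count, op in CIGAR_OP.findall(cigar):
            length = int(count)
            if op in "M=X":  # aligned bases: one base per reference position
                for i in range(length):
                    columns[ref_pos + i].append(seq[query_pos + i])
            elif op == "D":  # deletion: the read spans the position without a base
                for i in range(length):
                    columns[ref_pos + i].append("*")
            # I, S, H, P and N add nothing to a column in this toy model.
            if op in CONSUMES_REF:
                ref_pos += length
            if op in CONSUMES_QUERY:
                query_pos += length
    return dict(sorted(columns.items()))


reads = [(0, "4M", "ACGT"), (2, "2M1D2M", "GTCA"), (1, "1S2M", "TAC")]
for pos, bases in toy_pileup(reads).items():
    print(pos, len(bases), "".join(bases))
```

With `pysam`, the corresponding information comes from iterating `AlignmentFile.pileup()`: each `PileupColumn` lists `PileupRead` objects whose `is_del` and `query_position` attributes play the roles of the `*` marker and the `seq` index used here.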
+ + + + + +**Related Classes/Methods**: + + + +- `pysam.libcalignmentfile` (1:1) + +- `pysam.libcsamfile` (1:1) + +- `pysam.libcbcf` (1:1) + +- `pysam.libctabix` (1:1) + +- `pysam.libcfaidx` (1:1) + + + + + +### Pileup & Alignment Processing + +This component is dedicated to generating and analyzing pileup data from alignment files. It encapsulates the complex logic of iterating through genomic positions, identifying aligned reads, and detecting variations like indels and substitutions. It provides Pythonic objects (`PileupColumn`, `PileupRead`) to represent pileup columns and reads for further analysis, abstracting the low-level C library interactions. + + + + + +**Related Classes/Methods**: + + + +- `pysam.libcalignmentfile:pileup` (1:1) + +- `pysam.libcalignedsegment:AlignedSegment` (1:1) + +- `pysam.libcalignedsegment:PileupColumn` (1:1) + +- `pysam.libcalignedsegment:PileupRead` (1:1) + + + + + +### Core Cython Bindings (FFI Layer) + +This is the critical Foreign Function Interface (FFI) layer, implemented in Cython (`libc*.pyx` modules). It directly binds to and wraps the underlying HTSlib, Samtools, and Bcftools C libraries. This layer is responsible for exposing low-level C functions and data structures to Python, handling memory management, and performing efficient type conversions between Python and C. + + + + + +**Related Classes/Methods**: + + + +- `pysam.libchtslib` (1:1) + +- `pysam.libcalignedsegment` (1:1) + +- `pysam.libcalignmentfile` (1:1) + +- `pysam.libcsamfile` (1:1) + +- `pysam.libcbcf` (1:1) + +- `pysam.libcfaidx` (1:1) + +- `pysam.libctabix` (1:1) + +- `pysam.libctabixproxies` (1:1) + + + + + +### HTSlib/Samtools/Bcftools C Libraries + +These are the external, foundational C libraries that perform the actual heavy lifting of genomic data manipulation. They provide the core algorithms for file parsing, alignment processing, variant calling, and indexing. 
`pysam` acts as a high-level Python wrapper around these highly optimized C implementations. + + + + + **Related Classes/Methods**: + + + +- `HTSlib/Samtools/Bcftools` (1:1) + + + + + +### Data Representation Objects + +These are Python classes that encapsulate and provide structured, Pythonic access to individual data elements parsed from genomic files. Examples include `AlignedSegment` (representing a single aligned read), `PileupColumn`, `PileupRead`, `VariantRecord`, and various proxy objects for Tabix-indexed data. They abstract the raw C data structures into more usable Python objects. + + + + + +**Related Classes/Methods**: + + + +- `pysam.libcalignedsegment:AlignedSegment` (1:1) + +- `pysam.libcalignedsegment:PileupColumn` (1:1) + +- `pysam.libcalignedsegment:PileupRead` (1:1) + +- `pysam.libcbcf:VariantRecord` (1:1) + +- `pysam.libctabixproxies:TupleProxy` (1:1) + + + + + +### Build & Installation System + +This component manages the entire process of compiling the Cython source code, linking it with HTSlib (either the bundled copy or a system-installed library) and the bundled Samtools and Bcftools sources, and packaging the `pysam` library for distribution and installation via tools like `pip`. + + + + + +**Related Classes/Methods**: + + + +- `setup.py` (1:1) + +- `setup.cfg` (1:1) + +- `pysam.config` (1:1) + + + + + +### Testing Framework + +A comprehensive suite of unit and integration tests (primarily using `pytest`) that validate the correctness, performance, and robustness of the `pysam` library across its various functionalities and interfaces. It helps catch regressions when the code changes and checks that the complex interactions between Python, Cython, and C behave reliably. + + + + + +**Related Classes/Methods**: + + + +- `tests` (1:1) + + + + + +### Documentation System + +This component is responsible for generating comprehensive user documentation for the `pysam` library, including detailed API references, practical tutorials, and illustrative examples.
It uses Sphinx to build the documentation from reStructuredText sources. + + + + + **Related Classes/Methods**: + + + +- `docs` (1:1) + + + + + +### Utility & Helper Functions + +A collection of miscellaneous Python functions and classes that provide common utilities, error handling mechanisms, and convenience wrappers. They support the library as a whole and improve the user experience, even though they are not directly part of the core file I/O or pileup logic. + + + + + +**Related Classes/Methods**: + + + +- `pysam.__init__` (1:1) + +- `pysam.utils` (1:1) + +- `pysam.version` (1:1) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/on_boarding.md b/.codeboarding/on_boarding.md new file mode 100644 index 00000000..d0c331a8 --- /dev/null +++ b/.codeboarding/on_boarding.md @@ -0,0 +1,153 @@ +```mermaid + +graph LR + + HTSlib_Bindings["HTSlib Bindings"] + + Genomic_Data_Models["Genomic Data Models"] + + Indexing_Querying["Indexing & Querying"] + + Pileup_Analysis["Pileup Analysis"] + + External_Tool_Wrappers["External Tool Wrappers"] + + HTSlib_Bindings -- "provides raw data for" --> Genomic_Data_Models + + Genomic_Data_Models -- "encapsulate data read by" --> HTSlib_Bindings + + HTSlib_Bindings -- "performs low-level indexed file operations for" --> Indexing_Querying + + Indexing_Querying -- "orchestrates indexed access via" --> HTSlib_Bindings + + HTSlib_Bindings -- "provides alignment data streams for" --> Pileup_Analysis + + Pileup_Analysis -- "consumes alignment data from" --> HTSlib_Bindings + + Indexing_Querying -- "returns instances of" --> Genomic_Data_Models + + Genomic_Data_Models -- "are the structured output of indexed queries from" --> Indexing_Querying + + Genomic_Data_Models -- "provide structured records for" --> Pileup_Analysis + + Pileup_Analysis -- "generates objects based on" -->
Genomic_Data_Models + + Indexing_Querying -- "optimizes data retrieval for" --> Pileup_Analysis + + Pileup_Analysis -- "leverages indexed access for efficiency from" --> Indexing_Querying + + click Genomic_Data_Models href "https://github.com/pysam-developers/pysam/blob/master/.codeboarding//Genomic_Data_Models.md" "Details" + + click Indexing_Querying href "https://github.com/pysam-developers/pysam/blob/master/.codeboarding//Indexing_Querying.md" "Details" + + click Pileup_Analysis href "https://github.com/pysam-developers/pysam/blob/master/.codeboarding//Pileup_Analysis.md" "Details" + + click External_Tool_Wrappers href "https://github.com/pysam-developers/pysam/blob/master/.codeboarding//External_Tool_Wrappers.md" "Details" + +``` + + + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +`pysam` gives Python programs access to genomic data by layering Pythonic components over the HTSlib C library. The HTSlib Bindings read, write, and index SAM/BAM/CRAM, VCF/BCF, FASTA/FASTQ, and Tabix-indexed files; the Genomic Data Models wrap the records they produce (aligned reads, variant calls, tabix entries) in Python objects; Indexing & Querying uses genomic indices so that only the records overlapping a requested region are retrieved; and Pileup Analysis builds per-position views of the aligned reads on top of these data streams. Alongside this main data flow, the External Tool Wrappers expose the `samtools` and `bcftools` command-line tools as Python function calls. + + + +### HTSlib Bindings + +This foundational component provides the direct, low-level Cython bindings to the HTSlib C library. It is responsible for efficient reading, writing, and indexing of common genomic file formats such as SAM/BAM/CRAM, VCF/BCF, FASTA/FASTQ, and Tabix-indexed generic text files. It acts as the primary bridge between Python's ease of use and C's computational power for large-scale genomic data operations.
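One small, self-contained example of the indexing these bindings provide is the FASTA index (`.fai`) used by `samtools faidx` and HTSlib. For each sequence it records the length, the byte offset of the first base, and the number of bases and bytes per line, so any base can be located with arithmetic instead of a scan. The sketch below is a simplified pure-Python illustration with hypothetical helper names (`build_fai`, `fetch_region`). It assumes that every line of a sequence except the last has the same length, as the format requires, and it is not how `pysam.FastaFile` is implemented.

```python
from typing import Dict, NamedTuple


class FaiEntry(NamedTuple):
    length: int      # number of bases in the sequence
    offset: int      # byte offset of the sequence's first base
    line_bases: int  # bases per line
    line_width: int  # bytes per line, including the newline


def build_fai(data: bytes) -> Dict[str, FaiEntry]:
    """Scan a FASTA file once and record the .fai fields for every sequence."""
    index: Dict[str, FaiEntry] = {}
    name = None
    pos = length = offset = line_bases = line_width = 0
    for line in data.splitlines(keepends=True):
        if line.startswith(b">"):
            if name is not None:
                index[name] = FaiEntry(length, offset, line_bases, line_width)
            name = line[1:].split()[0].decode()  # header text up to the first whitespace
            length, offset, line_bases, line_width = 0, pos + len(line), 0, 0
        else:
            bases = len(line.rstrip(b"\r\n"))
            if line_bases == 0:  # the first line of a sequence fixes the layout
                line_bases, line_width = bases, len(line)
            length += bases
        pos += len(line)
    if name is not None:
        index[name] = FaiEntry(length, offset, line_bases, line_width)
    return index


def fetch_region(data: bytes, entry: FaiEntry, start: int, end: int) -> str:
    """Return bases [start, end) by jumping straight to their byte offsets."""
    end = min(end, entry.length)
    if start >= end:
        return ""

    def byte_pos(base: int) -> int:
        # Whole lines before `base`, then the column within its line.
        full_lines, column = divmod(base, entry.line_bases)
        return entry.offset + full_lines * entry.line_width + column

    chunk = data[byte_pos(start):byte_pos(end)]
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


fasta = b">chr1 example\nACGT\nTTGG\nCC\n>chr2\nAAAA\n"
fai = build_fai(fasta)
print(fai["chr1"])                             # FaiEntry(length=10, offset=14, line_bases=4, line_width=5)
print(fetch_region(fasta, fai["chr1"], 2, 7))  # GTTTG
```

In `pysam`, `pysam.FastaFile(path).fetch(reference, start, end)` performs this kind of lookup through HTSlib's faidx support.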
+ + + + + **Related Classes/Methods**: + + + +- `pysam.libchtslib` + + + + + +### Genomic Data Models [[Expand]](./Genomic_Data_Models.md) + +This component defines Pythonic data structures and classes that represent individual genomic records parsed from the underlying HTSlib Bindings. These abstractions (e.g., aligned reads, variant calls, tabix entries) allow developers to easily access, manipulate, and interpret the biological information contained within the files without needing to interact directly with C pointers or low-level data structures. + + + + + +**Related Classes/Methods**: + + + +- `pysam.libcalignedsegment.AlignedSegment` + +- `pysam.libcbcf.VariantRecord` + +- `pysam.libctabixproxies.NamedTupleProxy` + + + + + +### Indexing & Querying [[Expand]](./Indexing_Querying.md) + +This component manages the creation, loading, and utilization of genomic indices (e.g., BAM index, Tabix index, BCF index) for efficient region-based data retrieval. It provides iterators and methods to query specific genomic regions or retrieve records based on their coordinates, significantly enhancing performance for large datasets. + + + + + +**Related Classes/Methods**: + + + +- `pysam.libcalignmentfile` + +- `pysam.libcbcf` + +- `pysam.libctabix` + + + + + +### Pileup Analysis [[Expand]](./Pileup_Analysis.md) + +This component is dedicated to generating and analyzing pileup data from alignment files. It handles the complex logic of iterating through genomic positions, identifying aligned reads, and detecting variations like indels and substitutions. It provides Pythonic objects to represent pileup columns and reads for further analysis. + + + + + +**Related Classes/Methods**: + + + +- `pysam.libcalignmentfile` + +- `pysam.libcalignedsegment` + + + + + +### External Tool Wrappers [[Expand]](./External_Tool_Wrappers.md) + +This component provides a Pythonic wrapper and a dispatch mechanism for running the `samtools` and `bcftools` command-line tools, whose C sources are bundled with `pysam`. It lets users call the full functionality of these C utilities directly from Python scripts: commands run in-process rather than as subprocesses, and the wrappers take care of building the command-line argument list.
+ + + + + +**Related Classes/Methods**: + + + +- `pysam.utils` + +- `pysam.samtools` + +- `pysam.bcftools` + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file