[Docs] Visualizing pysam's codebase #1354

Closed
wants to merge 1 commit into from
135 changes: 135 additions & 0 deletions .codeboarding/External_Tool_Wrappers.md
@@ -0,0 +1,135 @@
```mermaid

graph LR

PysamDispatcher["PysamDispatcher"]

_pysam_dispatch["_pysam_dispatch"]

SamtoolsError["SamtoolsError"]

Samtools_Wrappers["Samtools Wrappers"]

Bcftools_Wrappers["Bcftools Wrappers"]

Samtools_Wrappers -- "utilizes" --> PysamDispatcher

Bcftools_Wrappers -- "utilizes" --> PysamDispatcher

PysamDispatcher -- "calls" --> _pysam_dispatch

PysamDispatcher -- "raises" --> SamtoolsError

_pysam_dispatch -- "can cause" --> SamtoolsError

```



[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%[email protected]?style=flat-square)](mailto:[email protected])



## Details



This component provides a Pythonic wrapper and a dispatch mechanism for running the `samtools` and `bcftools` command-line tools from Python. Users can call these C-based utilities directly from their scripts without building command lines or capturing output by hand. The component is the bridge between high-level Python application logic and the low-level C utilities.



### PysamDispatcher

This component provides the core dispatch mechanism for executing `samtools` and `bcftools` commands. It translates Python function calls into arguments suitable for the underlying C utilities, manages the execution flow, captures output, and checks the returned error codes. All logic for invoking the external tools is centralized here.
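
As a rough illustration of the pattern described above, the sketch below shows how a dispatcher object might turn a Python call into an argument vector, pass it to a low-level function, and raise on a non-zero exit status. `ToyDispatcher`, `fake_dispatch`, and `ToolError` are hypothetical names invented for this example; this is not pysam's implementation.

```python
from typing import Callable, List, Tuple


class ToolError(Exception):
    """Hypothetical stand-in for the error raised when a tool fails."""


def fake_dispatch(collection: str, command: str, args: List[str]) -> Tuple[int, str, str]:
    """Hypothetical low-level layer: returns (exit status, stderr, stdout)."""
    if command == "fail":
        return 1, "unknown command", ""
    return 0, "", " ".join([collection, command, *args])


class ToyDispatcher:
    """Callable object bound to one tool and one subcommand."""

    def __init__(self, collection: str, command: str,
                 dispatch: Callable[[str, str, List[str]], Tuple[int, str, str]] = fake_dispatch):
        self.collection = collection
        self.command = command
        self._dispatch = dispatch

    def __call__(self, *args: str) -> str:
        # Positional Python arguments become the command-line argument vector.
        status, err, out = self._dispatch(
            self.collection, self.command, [str(a) for a in args]
        )
        if status != 0:
            raise ToolError(f"{self.collection} {self.command} returned {status}: {err}")
        return out


view = ToyDispatcher("samtools", "view")
result = view("-c", "example.bam")
```

Because the low-level function is injected, the same dispatcher class can serve both tools, and only the bound tool and subcommand names differ.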





**Related Classes/Methods**:



- <a href="https://github.com/pysam-developers/pysam/blob/master/pysam/utils.py#L1-L1" target="_blank" rel="noopener noreferrer">`pysam.utils.PysamDispatcher` (1:1)</a>





### _pysam_dispatch

This is the Foreign Function Interface (FFI) layer, implemented in Cython. It calls the compiled C utilities (`samtools`, `bcftools`) and manages the low-level interaction between Python and C: argument passing, execution, and retrieval of raw results. It is the direct bridge to the C code.
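
The idea of calling a tool's entry point and collecting its raw results can be sketched in plain Python. The example assumes the entry point is an ordinary function that prints its output and returns an exit status; `toy_main` and `run_in_process` are hypothetical names. The real layer is written in Cython and works with C-level entry points, so the Python-level capture shown here is only an analogy.

```python
import contextlib
import io
from typing import Callable, List, Tuple


def toy_main(argv: List[str]) -> int:
    """Hypothetical stand-in for a tool's `main`; prints and returns a status."""
    if len(argv) < 2:
        print("usage: tool <command>")
        return 1
    print("ran " + " ".join(argv[1:]))
    return 0


def run_in_process(main: Callable[[List[str]], int], argv: List[str]) -> Tuple[int, str]:
    """Call `main` in the current process and capture what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        status = main(argv)
    return status, buffer.getvalue()


status, output = run_in_process(toy_main, ["tool", "sort", "in.bam"])
```

The caller receives the exit status and the captured text as separate values, which leaves the decision about what counts as a failure to the layer above.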





**Related Classes/Methods**:



- `pysam.libcutils._pysam_dispatch` (1:1)





### SamtoolsError

This component defines a custom exception class for errors that originate from the underlying `samtools` and `bcftools` utilities or from HTSlib. Raising a dedicated exception type gives the Python user specific error context and allows tool failures to be handled separately from other exceptions.
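
A minimal sketch of why a dedicated exception type helps: the caller can catch tool failures specifically and let everything else propagate. `ToyToolError`, `run`, and `safe_run` are hypothetical names for this illustration and do not reflect the attributes of pysam's own exception class.

```python
class ToyToolError(Exception):
    """Hypothetical exception carrying context about a failed tool call."""

    def __init__(self, command: str, status: int, stderr: str):
        super().__init__(f"{command} exited with status {status}: {stderr.strip()}")
        self.command = command
        self.status = status
        self.stderr = stderr


def run(command: str, status: int, stderr: str = "") -> str:
    """Pretend to run a command; raise if the status is non-zero."""
    if status != 0:
        raise ToyToolError(command, status, stderr)
    return "ok"


def safe_run(command: str, status: int, stderr: str = "") -> str:
    try:
        return run(command, status, stderr)
    except ToyToolError as exc:
        # Only tool failures are handled here; other exceptions propagate.
        return f"failed: {exc}"
```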





**Related Classes/Methods**:



- <a href="https://github.com/pysam-developers/pysam/blob/master/pysam/utils.py#L1-L1" target="_blank" rel="noopener noreferrer">`pysam.utils.SamtoolsError` (1:1)</a>





### Samtools Wrappers

This module provides Python functions that wrap individual `samtools` commands (e.g., `view`, `sort`, `index`). Each function hides the command-line syntax behind a Python call and uses the `PysamDispatcher` internally for execution.
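
One way to picture a module of per-command wrappers is a namespace populated from a table of command names. The sketch below is an assumption about the general pattern, not a copy of `pysam.samtools`; `make_wrapper`, `COMMANDS`, and `toy_samtools` are hypothetical, and the wrappers only return the argument vector instead of executing anything.

```python
from types import SimpleNamespace


def make_wrapper(tool: str, command: str):
    """Return a function that builds the argument list for `tool command`."""

    def wrapper(*args: str) -> list:
        # A real wrapper would execute the tool; this sketch only returns argv.
        return [tool, command, *map(str, args)]

    wrapper.__name__ = command
    return wrapper


# Hypothetical command table; one wrapper is generated per entry.
COMMANDS = ("view", "sort", "index")
toy_samtools = SimpleNamespace(
    **{name: make_wrapper("samtools", name) for name in COMMANDS}
)

argv = toy_samtools.sort("-o", "sorted.bam", "in.bam")
```

Generating the wrappers from a table keeps each command's entry to a single name, which is why a second tool can reuse the same factory with a different command list.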





**Related Classes/Methods**:



- <a href="https://github.com/pysam-developers/pysam/blob/master/pysam/samtools.py#L1-L1" target="_blank" rel="noopener noreferrer">`pysam.samtools` (1:1)</a>





### Bcftools Wrappers

Like `Samtools Wrappers`, this module provides Python functions that wrap individual `bcftools` commands (e.g., `call`, `view`, `norm`). Each function hides the command-line syntax behind a Python call and uses the `PysamDispatcher` internally for execution.





**Related Classes/Methods**:



- <a href="https://github.com/pysam-developers/pysam/blob/master/pysam/bcftools.py#L1-L1" target="_blank" rel="noopener noreferrer">`pysam.bcftools` (1:1)</a>









### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
89 changes: 89 additions & 0 deletions .codeboarding/Genomic_Data_Models.md
@@ -0,0 +1,89 @@
```mermaid

graph LR

Genomic_Data_Models["Genomic Data Models"]

Aligned_Segment_Model["Aligned Segment Model"]

Variant_Record_Models["Variant Record Models"]

Tabix_Entry_Proxies["Tabix Entry Proxies"]

Genomic_Data_Models -- "wraps/abstracts" --> Cython_Bindings_HTSlib_Bindings_

File_Handling_Components -- "produces/iterates over" --> Genomic_Data_Models

High_Level_Python_API -- "consumes/utilizes" --> Genomic_Data_Models

click Genomic_Data_Models href "https://github.com/pysam-developers/pysam/blob/master/.codeboarding//Genomic_Data_Models.md" "Details"

```



[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%[email protected]?style=flat-square)](mailto:[email protected])



## Details



Source code for `pysam.libcalignedsegment.AlignedSegment`, `pysam.libcbcf.VariantRecord`, and `pysam.libctabixproxies.NamedTupleProxy` could not be retrieved with the `getPythonSourceCode` tool. This is most likely because they live in Cython modules, which are compiled to C extensions rather than shipped as pure Python source files. Their internal implementation was therefore not verified line by line. The descriptions below are based on the conceptual role and public API of these components within the `pysam` library, where they serve as Pythonic abstractions over low-level HTSlib data structures.



### Genomic Data Models [[Expand]](./Genomic_Data_Models.md)

The "Genomic Data Models" component in `pysam` provides Pythonic abstractions over low-level HTSlib data structures, so developers can work with genomic data without direct C pointer manipulation. It sits between the raw data accessed via Cython bindings and the higher-level Python API.





**Related Classes/Methods**: _None_



### Aligned Segment Model

Represents a single aligned read from SAM/BAM/CRAM files. It provides attributes and methods to access read properties like sequence, quality scores, mapping position, CIGAR string, and flags in a Pythonic way. This class is the primary data model for individual sequencing reads, abstracting the complex binary format of alignment files.
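
To make the idea of a read object concrete, here is a small stdlib-only sketch that parses one SAM text line into an object with named attributes. `ToySegment` and `cigar_operations` are hypothetical; several attribute names are chosen to resemble those of `AlignedSegment`, but nothing here uses or reproduces pysam's implementation, which reads the binary record through HTSlib.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToySegment:
    query_name: str
    flag: int
    reference_name: str
    reference_start: int  # 0-based, converted from the 1-based SAM POS column
    mapping_quality: int
    cigarstring: str
    query_sequence: str

    @classmethod
    def from_sam_line(cls, line: str) -> "ToySegment":
        f = line.rstrip("\n").split("\t")
        return cls(f[0], int(f[1]), f[2], int(f[3]) - 1, int(f[4]), f[5], f[9])

    @property
    def is_reverse(self) -> bool:
        # SAM flag bit 0x10: read is on the reverse strand.
        return bool(self.flag & 0x10)

    @property
    def is_unmapped(self) -> bool:
        # SAM flag bit 0x4: read is unmapped.
        return bool(self.flag & 0x4)

    @property
    def cigar_operations(self) -> List[Tuple[str, int]]:
        # Split e.g. "4M1I3M" into [("M", 4), ("I", 1), ("M", 3)].
        return [(op, int(n)) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", self.cigarstring)]


line = "read1\t16\tchr1\t100\t60\t4M1I3M\t*\t0\t0\tACGTACGT\tIIIIIIII"
segment = ToySegment.from_sam_line(line)
```

The properties decode flag bits and the CIGAR string on demand, so calling code never has to handle the bit masks or the string syntax itself.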





**Related Classes/Methods**: _None_



### Variant Record Models

A collection of classes (e.g., `VariantRecordInfo`, `VariantRecordFormat`, `VariantRecordFilter`, `VariantRecordSamples`) that together represent a single variant call entry from VCF/BCF files. They provide structured access to the INFO, FORMAT, FILTER, and sample-specific fields of a variant record, making that information accessible and manipulable in Python.
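
The structured fields mentioned above can be illustrated by splitting one VCF text line into its parts. This is a simplified sketch with hypothetical helper names (`parse_info`, `parse_vcf_line`); it keeps all values as strings and ignores the header-driven typing that a real VCF/BCF reader applies.

```python
from typing import Dict, Union


def parse_info(field: str) -> Dict[str, Union[str, bool]]:
    """Turn 'DP=14;DB' into {'DP': '14', 'DB': True}."""
    info: Dict[str, Union[str, bool]] = {}
    if field == ".":
        return info
    for item in field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flag entries carry no value
    return info


def parse_vcf_line(line: str) -> dict:
    """Split one VCF data line into fixed, INFO, FILTER, and per-sample parts."""
    cols = line.rstrip("\n").split("\t")
    format_keys = cols[8].split(":") if len(cols) > 8 else []
    samples = [dict(zip(format_keys, s.split(":"))) for s in cols[9:]]
    return {
        "chrom": cols[0],
        "pos": int(cols[1]),
        "ref": cols[3],
        "alts": cols[4].split(","),
        "filter": [] if cols[6] == "." else cols[6].split(";"),
        "info": parse_info(cols[7]),
        "samples": samples,
    }


record = parse_vcf_line("chr1\t1000\t.\tA\tG,T\t50\tPASS\tDP=14;DB\tGT:DP\t0/1:8\t1/1:6")
```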





**Related Classes/Methods**: _None_



### Tabix Entry Proxies

Base and derived classes that provide a tuple-like or namedtuple-like interface to individual entries parsed from generic tabix-indexed text files (e.g., BED, GFF, GTF, or VCF). They allow accessing fields by index or by name, depending on the specific proxy. These proxies offer a flexible and generic way to represent structured text data from various genomic file formats, leveraging the tabix indexing capabilities.
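
Access by index or by name can be sketched with a small proxy over one tab-separated line. `ToyRowProxy` and the BED-style field names below are hypothetical and chosen only for illustration; the actual proxy classes are implemented in Cython and are not reproduced here.

```python
from typing import Sequence


class ToyRowProxy:
    """Hypothetical proxy over one tab-separated line with optional field names."""

    def __init__(self, line: str, names: Sequence[str] = ()):
        self._fields = line.rstrip("\n").split("\t")
        self._names = {name: i for i, name in enumerate(names)}

    def __len__(self) -> int:
        return len(self._fields)

    def __getitem__(self, index: int) -> str:
        # Tuple-like access by position.
        return self._fields[index]

    def __getattr__(self, name: str) -> str:
        # Called only when normal attribute lookup fails: map name to position.
        names = self.__dict__.get("_names", {})
        if name in names and names[name] < len(self._fields):
            return self._fields[names[name]]
        raise AttributeError(name)


bed = ToyRowProxy("chr1\t100\t200\tgeneA", names=("contig", "start", "end", "name"))
```

A proxy constructed without names still supports positional access, which corresponds to the distinction between tuple-like and namedtuple-like proxies described above.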





**Related Classes/Methods**: _None_







### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)