Python API Reference
High-level API
Data Source interface
The following functions return a data source object to read from files that may be larger than memory.
Sequence formats
Create a FASTA file data source.
Create a FASTQ file data source.
Alignment formats
Create a SAM file data source.
Create a BAM file data source.
Create a CRAM file data source.
Variant call formats
Create a VCF file data source.
Create a BCF file data source.
Interval feature formats
Create a GTF file data source.
Create a GFF3 file data source.
Create a BED file data source.
UCSC BBI formats
Create a BigWig file data source.
Create a BigBed file data source.
Arrow IPC readers
The following functions convert genomic file formats to the Arrow IPC (aka Feather) format as raw bytes. Indexed files support genomic range queries.
Return Arrow IPC format from a FASTA file.
Return Arrow IPC format from a FASTQ file.
Return Arrow IPC format from a SAM file.
Return Arrow IPC format from a BAM file.
Return Arrow IPC format from a CRAM file.
Return Arrow IPC format from a VCF file.
Return Arrow IPC format from a BCF file.
Return Arrow IPC format from a GTF file.
Return Arrow IPC format from a GFF file.
Return Arrow IPC format from a BED file.
Return Arrow IPC format from a BigWig file.
Return Arrow IPC format from a BigBed file.
Low-level API
Scanners
The following classes wrap the Rust “scanner” objects, each of which reads a genomic file format as a stream of Apache Arrow RecordBatches.
A FASTA file scanner.
A FASTQ file scanner.
A SAM file scanner.
A BAM file scanner.
A CRAM file scanner.
A VCF file scanner.
A BCF file scanner.
A GTF file scanner.
A GFF file scanner.
A BED file scanner.
A BigWig file scanner.
A BigBed file scanner.
A BBI file zoom level scanner.
PyArrow adapters
The following classes provide a PyArrow Dataset interface over a stream of Arrow RecordBatches supplied by Oxbow’s low-level scanners. PyArrow Datasets allow working with large datasets that do not fit in memory.
A Fragment that emits RecordBatches from a reproducible source.
A PyArrow Dataset composed of one or more BatchReaderFragments.
Data source classes