Python API Reference

High-level API

Data Source interface

The following functions return a data source object to read from files that may be larger than memory.

Sequence formats

oxbow.from_fasta(source[, compression, ...])

Create a FASTA file data source.

oxbow.from_fastq(source[, compression, ...])

Create a FASTQ file data source.

Alignment formats

oxbow.from_sam(source[, compression, ...])

Create a SAM file data source.

oxbow.from_bam(source[, compression, ...])

Create a BAM file data source.

Variant call formats

oxbow.from_vcf(source[, compression, ...])

Create a VCF file data source.

oxbow.from_bcf(source[, compression, ...])

Create a BCF file data source.

Interval feature formats

oxbow.from_gtf(source[, compression, ...])

Create a GTF file data source.

oxbow.from_gff(source[, compression, ...])

Create a GFF3 file data source.

oxbow.from_bed(source[, bed_schema, ...])

Create a BED file data source.

UCSC BBI formats

oxbow.from_bigwig(source, *[, fields, ...])

Create a BigWig file data source.

oxbow.from_bigbed(source[, schema, fields, ...])

Create a BigBed file data source.
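from_bed takes a bed_schema argument, but this listing does not spell out its format. The sketch below assumes it follows the UCSC bedN+M convention: N standard BED columns, then M custom columns, where a bare "+" means an unspecified number of extras. The names parse_bed_spec and BedSpec are hypothetical. They show what the notation encodes and do not reproduce oxbow's own parser.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper: assumes the UCSC "bedN[+[M]]" notation, not oxbow's parser.
_BED_SPEC = re.compile(r"bed(?P<n>\d+)(?:(?P<plus>\+)(?P<m>\d*))?")


class BedSpec(NamedTuple):
    n_standard: int          # leading standard BED columns (chrom, start, end, ...)
    n_custom: Optional[int]  # trailing custom columns; None means "unspecified"


def parse_bed_spec(spec: str) -> BedSpec:
    """Parse strings such as 'bed3', 'bed6+3' or 'bed12+' into a BedSpec."""
    match = _BED_SPEC.fullmatch(spec.strip().lower())
    if match is None:
        raise ValueError(f"not a bedN[+[M]] spec: {spec!r}")
    n = int(match.group("n"))
    # The BED specification defines 3 required plus 9 optional standard columns.
    if not 3 <= n <= 12:
        raise ValueError(f"expected 3 to 12 standard columns, got {n}")
    if match.group("plus") is None:
        return BedSpec(n, 0)  # no '+': standard columns only
    m = match.group("m")
    return BedSpec(n, int(m) if m else None)  # bare '+': any number of extras


print(parse_bed_spec("bed6+3"))  # BedSpec(n_standard=6, n_custom=3)
```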

Arrow IPC readers

The following functions read a genomic file and return its contents as raw bytes in the Arrow IPC (aka Feather) format. For indexed files, the read can be restricted to a genomic range.


oxbow.read_fasta(src[, regions, index, gzi, ...])

Read a FASTA file into Arrow IPC bytes.

oxbow.read_fastq(src[, fields, compressed])

Read a FASTQ file into Arrow IPC bytes.

oxbow.read_sam(src[, region, index, fields, ...])

Read a SAM file into Arrow IPC bytes.

oxbow.read_bam(src[, region, index, fields, ...])

Read a BAM file into Arrow IPC bytes.

oxbow.read_vcf(src[, region, index, fields, ...])

Read a VCF file into Arrow IPC bytes.

oxbow.read_bcf(src[, region, index, fields, ...])

Read a BCF file into Arrow IPC bytes.

oxbow.read_gtf(src[, region, index, fields, ...])

Read a GTF file into Arrow IPC bytes.

oxbow.read_gff(src[, region, index, fields, ...])

Read a GFF file into Arrow IPC bytes.

oxbow.read_bed(src, bed_schema[, region, ...])

Read a BED file into Arrow IPC bytes.

oxbow.read_bigwig(src[, region, fields])

Read a BigWig file into Arrow IPC bytes.

oxbow.read_bigbed(src[, bed_schema, region, ...])

Read a BigBed file into Arrow IPC bytes.
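Most of these readers accept a region argument, and read_fasta accepts regions, but the string format is not defined here. The sketch below assumes the samtools-style chrom[:start[-end]] notation used across htslib-based tools. In that notation, positions are 1-based and both ends are inclusive, and commas may be used as thousands separators. The sketch also shows how such a region maps onto the 0-based, half-open coordinates that BED uses. parse_region and Region are illustrative names, not oxbow API.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical helper assuming samtools-style "chrom[:start[-end]]" strings.
_REGION = re.compile(r"(?P<chrom>[^:\s]+)(?::(?P<start>[\d,]+)(?:-(?P<end>[\d,]+))?)?")


class Region(NamedTuple):
    chrom: str
    start: Optional[int]  # 1-based, inclusive; None = start of sequence
    end: Optional[int]    # 1-based, inclusive; None = end of sequence

    def to_half_open(self) -> Tuple[str, int, Optional[int]]:
        """Convert to 0-based, half-open (BED-style) coordinates."""
        start = 0 if self.start is None else self.start - 1
        # A closed 1-based end is numerically equal to a half-open 0-based end.
        return self.chrom, start, self.end


def _to_int(text: Optional[str]) -> Optional[int]:
    return None if text is None else int(text.replace(",", ""))


def parse_region(text: str) -> Region:
    match = _REGION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    start, end = _to_int(match.group("start")), _to_int(match.group("end"))
    if start is not None and start < 1:
        raise ValueError("positions are 1-based")
    if end is not None and end < start:
        raise ValueError("region end precedes its start")
    return Region(match.group("chrom"), start, end)


print(parse_region("chr1:1-100,000").to_half_open())  # ('chr1', 0, 100000)
```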

Low-level API

Scanners

The following classes wrap the Rust “scanner” objects, which read a genomic file as a stream of Apache Arrow RecordBatches.

oxbow.core.PyFastaScanner(src[, compressed])

A FASTA file scanner.

oxbow.core.PyFastqScanner(src[, compressed])

A FASTQ file scanner.

oxbow.core.PySamScanner(src[, compressed])

A SAM file scanner.

oxbow.core.PyBamScanner(src[, compressed])

A BAM file scanner.

oxbow.core.PyVcfScanner(src[, compressed])

A VCF file scanner.

oxbow.core.PyBcfScanner(src[, compressed])

A BCF file scanner.

oxbow.core.PyGtfScanner(src[, compressed])

A GTF file scanner.

oxbow.core.PyGffScanner(src[, compressed])

A GFF file scanner.

oxbow.core.PyBedScanner(src, bed_schema[, ...])

A BED file scanner.

oxbow.core.PyBigWigScanner(src)

A BigWig file scanner.

oxbow.core.PyBigBedScanner(src[, schema])

A BigBed file scanner.

oxbow.core.PyBBIZoomScanner(src, bbi_type, ...)

A BBI file zoom level scanner.

PyArrow adapters

The following classes provide a PyArrow Dataset interface over a stream of Arrow RecordBatches supplied by Oxbow’s low-level scanners. Because PyArrow Datasets are scanned lazily, batch by batch, they can be used to work with data that does not fit in memory.

oxbow.arrow.BatchReaderFragment(...[, ...])

A Fragment that emits RecordBatches from a reproducible source.

oxbow.arrow.BatchReaderDataset(fragments[, ...])

A PyArrow Dataset composed of one or more BatchReaderFragments.

Data source classes

oxbow.core.FastaFile(source[, compressed, ...])

oxbow.core.FastqFile(source[, compressed, ...])

oxbow.core.SamFile(source[, compressed, ...])

oxbow.core.BamFile(source[, compressed, ...])

oxbow.core.VcfFile(source[, compressed, ...])

oxbow.core.BcfFile(source[, compressed, ...])

oxbow.core.GtfFile(source[, compressed, ...])

oxbow.core.GffFile(source[, compressed, ...])

oxbow.core.BedFile(source[, bed_schema, ...])

oxbow.core.BigWigFile(source, *[, fields, ...])

oxbow.core.BigBedFile(source[, schema, ...])