oxbow.core.PyBigBedScanner


class oxbow.core.PyBigBedScanner(src, schema=None, fields=None)

A BigBed file scanner.

Parameters:

src (str or file-like) – The path to the BigBed file or a file-like object.
schema (str, optional) – The BED schema used to parse BigBed records. May be a bed[m[+[n]]] string, "bedgraph", or "autosql". If not specified, the file is interpreted as BED3+: all ancillary fields are returned as a single lumped string field named "rest". If "autosql", records are parsed using the file's AutoSql definition, if one exists.
fields (list[str], optional) – Names of the fields to include in the schema.
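
The bed[m[+[n]]] notation above can be read as "bed", optionally followed by a count of standard BED fields, optionally followed by "+" and a count of extra fields. As a minimal sketch of that reading, the hypothetical helper `describe_bed_schema` below (not part of oxbow) splits a schema string into its parts. It assumes lowercase input and a literal interpretation of the brackets; oxbow's own parser may accept or reject other spellings.

```python
import re

# Literal reading of bed[m[+[n]]]: "bed", "bedM", "bedM+", or "bedM+N".
_BED_SCHEMA = re.compile(r"^bed(?:(\d+)(?:(\+)(\d+)?)?)?$")


def describe_bed_schema(schema):
    """Describe a schema string (hypothetical helper, not an oxbow API)."""
    # The two named schemas are passed through unchanged.
    if schema in ("bedgraph", "autosql"):
        return {"kind": schema}
    match = _BED_SCHEMA.match(schema)
    if match is None:
        raise ValueError(f"unrecognized schema string: {schema!r}")
    m, plus, n = match.groups()
    return {
        "kind": "bed",
        # Number of standard BED fields, if given.
        "standard_fields": int(m) if m is not None else None,
        # True when a trailing "+" announces extra fields.
        "has_extra": plus is not None,
        # Number of extra fields, if given.
        "extra_fields": int(n) if n is not None else None,
    }
```

A string that passes this check, such as "bed6+3", would then be given as the `schema` argument of `PyBigBedScanner`.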

Methods

`__init__`()
`chrom_names`()	Return the names of the reference sequences.
`chrom_sizes`()	Return the names of the reference sequences and their lengths in bp.
`field_names`()	Return the names of the fields.
`get_zoom`(zoom_level[, fields])	Return a scanner for a specific zoom level.
`read_autosql`()	Return the raw autosql schema definition.
`scan`([columns, batch_size, limit])	Scan batches of records from the file's base-level records.
`scan_query`(region[, columns, batch_size, limit])	Scan batches of records from a genomic range query.
`schema`()	Return the Arrow schema.
`zoom_levels`()	Return the zoom/reduction level resolutions.

chrom_sizes()

Return the names of the reference sequences and their lengths in bp.

get_zoom(zoom_level, fields=None)

Return a scanner for a specific zoom level.

Parameters:

zoom_level (int) – The resolution (in bp) of the zoom level to scan.
fields (list[str], optional) – Names of the fields to include in the zoom schema.

Returns:

A scanner for the specified zoom level.

Return type:

PyBBIZoomScanner
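
Because `get_zoom` takes a resolution in bp, a caller typically chooses one of the values reported by `zoom_levels()` first. The sketch below shows one possible selection rule: the coarsest resolution that does not exceed a target, falling back to the finest one. It assumes `zoom_levels()` returns an iterable of integer resolutions in bp, which this page does not state explicitly; `pick_zoom_resolution` and `open_zoom` are hypothetical helpers, not oxbow APIs.

```python
def pick_zoom_resolution(resolutions, max_bp):
    """Return the coarsest resolution <= max_bp, else the finest available."""
    ordered = sorted(resolutions)
    if not ordered:
        raise ValueError("the file has no zoom levels")
    candidates = [r for r in ordered if r <= max_bp]
    return candidates[-1] if candidates else ordered[0]


def open_zoom(scanner, max_bp, fields=None):
    """Open a zoom scanner on any object with zoom_levels() and get_zoom().

    Assumes zoom_levels() yields integer resolutions in bp.
    """
    resolution = pick_zoom_resolution(scanner.zoom_levels(), max_bp)
    return scanner.get_zoom(resolution, fields=fields)
```

The helper is duck-typed, so it works with any object that exposes the two methods documented here.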

read_autosql()

Return the raw autosql schema definition.

scan(columns=None, batch_size=1024, limit=None)

Scan batches of records from the file’s base-level records.

Parameters:

columns (list[str], optional) – Names of the columns to project.
batch_size (int, optional [default: 1024]) – The number of records to include in each batch.
limit (int, optional) – The maximum number of records to scan. If None, records are scanned until EOF.

Returns:

An iterator yielding Arrow record batches.

Return type:

arro3 RecordBatchReader (pycapsule)
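
Since `scan` yields batches rather than a whole table, records can be processed one batch at a time. The sketch below counts rows batch by batch. It assumes the returned reader can be iterated directly and that each batch exposes a `num_rows` attribute; neither is confirmed by this page, and if the reader is only available as an Arrow stream capsule it may first need to be imported by an Arrow library. `rows_per_batch` and `total_rows` are hypothetical helpers, not oxbow APIs.

```python
def rows_per_batch(reader):
    """Yield the row count of each batch as it is read.

    Assumes `reader` is iterable and each batch has `num_rows`.
    """
    for batch in reader:
        yield batch.num_rows


def total_rows(scanner, columns=None, batch_size=1024, limit=None):
    """Count the records produced by scanner.scan() without keeping batches."""
    reader = scanner.scan(columns=columns, batch_size=batch_size, limit=limit)
    return sum(rows_per_batch(reader))
```

Only one batch is held at a time, so memory use is bounded by `batch_size` rather than by the size of the file.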

scan_query(region, columns=None, batch_size=1024, limit=None)

Scan batches of records from a genomic range query.

Parameters:

region (str) – Genomic region in the format “chr:start-end”.
columns (list[str], optional) – Names of the columns to project.
batch_size (int, optional [default: 1024]) – The number of records to include in each batch.
limit (int, optional) – The maximum number of records to scan. If None, all records intersecting the query range are scanned.

Returns:

An iterator yielding Arrow record batches.

Return type:

arro3 RecordBatchReader (pycapsule)
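
The `region` argument is a single string, so numeric coordinates have to be formatted before the call. A minimal sketch of that step follows; `format_region` is a hypothetical helper, not an oxbow API. This page does not say whether coordinates are 0-based or 1-based, so the helper only checks ordering and leaves the convention to the caller.

```python
def format_region(chrom, start, end):
    """Build a "chr:start-end" region string (hypothetical helper)."""
    if start < 0 or end < start:
        raise ValueError(f"invalid interval: start={start}, end={end}")
    # No thousands separators: the documented format is plain "chr:start-end".
    return f"{chrom}:{int(start)}-{int(end)}"


def query(scanner, chrom, start, end, **kwargs):
    """Run scan_query on any object exposing that method.

    Extra keyword arguments (columns, batch_size, limit) are passed through.
    """
    return scanner.scan_query(format_region(chrom, start, end), **kwargs)
```

For example, `format_region("chr1", 1000, 2000)` produces the string "chr1:1000-2000".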

schema()

Return the Arrow schema.