oxbow.core.PyBigBedScanner
- class oxbow.core.PyBigBedScanner(src, schema='bed3+')
A BigBed file scanner.
- Parameters:
src (str or file-like) – The path to the BigBed file or a file-like object.
schema (str, optional) – The BED schema to use for parsing BigBed records. May be a bed[m[+[n]]] string, "bedgraph", or "autosql". If not specified, the BigBed file is interpreted as BED3+, where all ancillary fields are returned as a single lumped string field named “rest”. If “autosql”, the file’s AutoSql definition is used to parse the records, if it exists.
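The bracket notation for the schema string can be read as: the literal "bed", then an optional count m, then an optional "+", then an optional count n. The sketch below spells that out with a regular expression. It is illustrative only and is not oxbow's own parser; the helper name describe_schema and the reading of m and n as standard and extra field counts (the usual BED convention) are assumptions.

```python
import re

# Pattern for the bed[m[+[n]]] notation: "bed", an optional count m,
# an optional "+", and an optional count n (which requires the "+").
_BED_SCHEMA = re.compile(r"bed(?:(\d+)(?:(\+)(\d+)?)?)?")


def describe_schema(schema):
    """Summarize a schema string written in the documented notation.

    Hypothetical helper: it mirrors the notation only and makes no claim
    about which strings oxbow itself accepts or rejects.
    """
    if schema in ("bedgraph", "autosql"):
        return {"kind": schema}
    match = _BED_SCHEMA.fullmatch(schema)
    if match is None:
        raise ValueError(f"not a recognized schema string: {schema!r}")
    m, plus, n = match.groups()
    return {
        "kind": "bed",
        # Assumed meaning: number of standard BED fields.
        "standard_fields": int(m) if m else None,
        # True when a trailing "+" asks for extra fields.
        "has_extra": plus is not None,
        # Assumed meaning: number of extra fields, if given.
        "extra_fields": int(n) if n else None,
    }


# Hypothetical usage (requires oxbow and a real BigBed file at this path):
# from oxbow.core import PyBigBedScanner
# scanner = PyBigBedScanner("example.bb", schema="bed6+")
```

Under this reading, the default 'bed3+' means three standard fields followed by an unspecified number of extra fields, which matches the description of the lumped “rest” field.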
- __init__()
Methods
__init__()
chrom_names() – Return the names of the reference sequences.
chrom_sizes() – Return the names of the reference sequences and their lengths in bp.
field_names() – Return the names of the fields.
get_zoom(zoom_level) – Return a scanner for a specific zoom level.
read_autosql() – Return the raw autosql schema definition.
scan([fields, batch_size, limit]) – Scan batches of records from the file's base-level records.
scan_query(region[, fields, batch_size, limit]) – Scan batches of records from a genomic range query.
schema([fields]) – Return the Arrow schema.
zoom_levels() – Return the zoom/reduction level resolutions.
- chrom_names()
Return the names of the reference sequences.
- chrom_sizes()
Return the names of the reference sequences and their lengths in bp.
- field_names()
Return the names of the fields.
- get_zoom(zoom_level)
Return a scanner for a specific zoom level.
- Parameters:
zoom_level (int) – The resolution (in bp) of the zoom level to scan.
- Returns:
A scanner for the specified zoom level.
- Return type:
- read_autosql()
Return the raw autosql schema definition.
- Return type:
str
- scan(fields=None, batch_size=1024, limit=None)
Scan batches of records from the file’s base-level records.
- Parameters:
fields (list[str], optional) – Names of the fixed fields to project.
batch_size (int, optional [default: 1024]) – The number of records to include in each batch.
limit (int, optional) – The maximum number of records to scan. If None, records are scanned until EOF.
- Returns:
An iterator yielding Arrow record batches.
- Return type:
arro3 RecordBatchReader (pycapsule)
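Since scan() is described as returning an iterator of Arrow record batches, a consumer can loop over the batches one at a time instead of loading the whole file. The sketch below counts records that way. It assumes the returned reader can be iterated directly and that each batch exposes a num_rows attribute; the helper names count_records and count_scanned are hypothetical.

```python
def count_records(batches):
    """Sum the row counts of an iterable of Arrow record batches.

    Assumes each batch exposes a ``num_rows`` attribute.
    """
    total = 0
    for batch in batches:
        total += batch.num_rows
    return total


def count_scanned(scanner, fields=None, batch_size=1024, limit=None):
    """Count the records yielded by ``scanner.scan(...)``.

    ``scanner`` is any object with a ``scan`` method matching the
    signature documented above, e.g. a PyBigBedScanner.
    """
    reader = scanner.scan(fields=fields, batch_size=batch_size, limit=limit)
    return count_records(reader)
```

If the reader is not directly iterable in a given environment, it may first need to be imported into an Arrow library through the PyCapsule stream interface.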
- scan_query(region, fields=None, batch_size=1024, limit=None)
Scan batches of records from a genomic range query.
- Parameters:
region (str) – Genomic region in the format “chr:start-end”.
fields (list[str], optional) – Names of the fixed fields to project.
batch_size (int, optional [default: 1024]) – The number of records to include in each batch.
limit (int, optional) – The maximum number of records to scan.
- Returns:
An iterator yielding Arrow record batches.
- Return type:
arro3 RecordBatchReader (pycapsule)
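The region argument is a single string in “chr:start-end” form, so callers holding separate chromosome and coordinate values need to assemble it. A minimal sketch follows; format_region and query_region are hypothetical helpers, and the sketch takes no position on the coordinate convention (0-based or 1-based), which is not stated here.

```python
def format_region(chrom, start, end):
    """Build a "chr:start-end" region string from its parts."""
    start, end = int(start), int(end)
    if start < 0 or end < start:
        raise ValueError(f"invalid interval: {start}-{end}")
    return f"{chrom}:{start}-{end}"


def query_region(scanner, chrom, start, end, fields=None,
                 batch_size=1024, limit=None):
    """Run ``scanner.scan_query`` on a region assembled from its parts.

    ``scanner`` is any object with a ``scan_query`` method matching the
    signature documented above, e.g. a PyBigBedScanner.
    """
    region = format_region(chrom, start, end)
    return scanner.scan_query(
        region, fields=fields, batch_size=batch_size, limit=limit
    )
```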
- schema(fields=None)
Return the Arrow schema.
- Parameters:
fields (list[str], optional) – Names of the fields to project.
- Return type:
arro3 Schema (pycapsule)
- zoom_levels()
Return the zoom/reduction level resolutions.
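Because get_zoom() takes a resolution in bp and zoom_levels() reports the available resolutions, the two can be combined to pick the stored level closest to a desired resolution. The sketch below assumes zoom_levels() returns an iterable of integer resolutions in bp; nearest_zoom and open_nearest_zoom are hypothetical helpers.

```python
def nearest_zoom(resolutions, target_bp):
    """Return the available resolution closest to ``target_bp``.

    On a tie, the resolution that appears first in ``resolutions`` wins.
    """
    resolutions = list(resolutions)
    if not resolutions:
        raise ValueError("no zoom levels available")
    return min(resolutions, key=lambda r: abs(r - target_bp))


def open_nearest_zoom(scanner, target_bp):
    """Open a zoom-level scanner at the resolution nearest ``target_bp``.

    ``scanner`` is any object with ``zoom_levels`` and ``get_zoom``
    methods as documented above, e.g. a PyBigBedScanner.
    """
    resolution = nearest_zoom(scanner.zoom_levels(), target_bp)
    return scanner.get_zoom(resolution)
```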