oxbow.core.PyBedScanner#
- class oxbow.core.PyBedScanner(src, bed_schema, compressed=False)#
A BED file scanner.
- Parameters:
src (str or file-like) – The path to the BED file or a file-like object.
bed_schema (str) – The BED schema specifier, e.g., “bed6+3”.
compressed (bool, optional [default: False]) – Whether the source is BGZF-compressed. If None, it is assumed to be uncompressed.
Notes
The BED schema specifier can be one of the following (case-insensitive):
bed: Equivalent to BED6.
bed{n}: n standard fields and 0 custom fields.
bed{n}+{m}: n standard fields followed by m custom fields.
bed{n}+: n standard fields followed by an undefined number of custom fields.
While the 12 standard fields have defined types, custom fields are interpreted as text.
With bed{n}+, the custom fields are collapsed into a single field named rest.
- __init__()#
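The schema specifier forms described in the Notes can be made concrete with a small parser. This is a minimal sketch for illustration only, not oxbow's implementation: `BedSpec` and `parse_bed_spec` are hypothetical names, and the bounds check on n is an assumption based on the statement that there are 12 standard fields.

```python
import re
from typing import NamedTuple, Optional


class BedSpec(NamedTuple):
    """Hypothetical container for a parsed specifier (not part of oxbow)."""
    n_standard: int
    n_custom: Optional[int]  # None means an undefined number of custom fields


# "bed", "bed{n}", "bed{n}+{m}" or "bed{n}+", matched case-insensitively.
_SPEC_RE = re.compile(r"bed(?:(\d+)(?:\+(\d*))?)?", re.IGNORECASE)


def parse_bed_spec(spec: str) -> BedSpec:
    match = _SPEC_RE.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not a BED schema specifier: {spec!r}")
    n_text, m_text = match.groups()
    if n_text is None:
        return BedSpec(6, 0)  # bare "bed" is equivalent to BED6
    n = int(n_text)
    if not 1 <= n <= 12:  # assumption: at most the 12 standard fields
        raise ValueError(f"expected 1 to 12 standard fields, got {n}")
    if m_text is None:
        return BedSpec(n, 0)  # "bed{n}": no custom fields
    if m_text == "":
        return BedSpec(n, None)  # "bed{n}+": undefined number of custom fields
    return BedSpec(n, int(m_text))  # "bed{n}+{m}"
```

Under these assumptions, `parse_bed_spec("bed6+3")` gives 6 standard and 3 custom fields, while `parse_bed_spec("bed3+")` reports an undefined custom count (`None`), the case in which the scanner collapses the trailing columns into `rest`.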
Methods
__init__()
field_names() Return the names of the BED fields.
scan([fields, batch_size, limit]) Scan batches of records from the file.
scan_query(region[, index, fields, ...]) Scan batches of records from a genomic range query on a BGZF-encoded file.
schema([fields]) Return the Arrow schema.
- field_names()#
Return the names of the BED fields.
- scan(fields=None, batch_size=1024, limit=None)#
Scan batches of records from the file.
- Parameters:
fields (list[str], optional) – Names of the BED fields to project.
batch_size (int, optional [default: 1024]) – The number of records to include in each batch.
limit (int, optional) – The maximum number of records to scan. If None, records are scanned until EOF.
- Returns:
An iterator yielding Arrow record batches.
- Return type:
arro3 RecordBatchReader (pycapsule)
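A typical use of `scan` might look like the sketch below. It assumes that `pyarrow` is installed and recent enough to provide `RecordBatchReader.from_stream`, which consumes objects exposing the Arrow PyCapsule stream interface. `looks_bgzf` and `count_records` are hypothetical helpers, and guessing compression from the file suffix is a heuristic introduced here, not oxbow behavior.

```python
from pathlib import Path


def looks_bgzf(path: str) -> bool:
    """Heuristic (an assumption): treat .gz and .bgz files as BGZF-compressed."""
    return Path(path).suffix.lower() in {".gz", ".bgz"}


def count_records(path: str, bed_schema: str = "bed6", limit=None) -> int:
    """Count the records in a BED file by streaming Arrow record batches."""
    # Imported lazily so that looks_bgzf() stays usable without these packages.
    import pyarrow as pa
    from oxbow.core import PyBedScanner

    scanner = PyBedScanner(path, bed_schema, compressed=looks_bgzf(path))
    stream = scanner.scan(batch_size=1024, limit=limit)
    # Wrap the returned stream in a pyarrow reader and iterate batch by batch.
    reader = pa.RecordBatchReader.from_stream(stream)
    return sum(batch.num_rows for batch in reader)
```

Streaming batch by batch keeps memory use bounded by `batch_size` rather than by file size; passing `limit` stops the scan early when only a preview is needed.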
- scan_query(region, index=None, fields=None, batch_size=1024, limit=None)#
Scan batches of records from a genomic range query on a BGZF-encoded file.
This operation requires an index file.
- Parameters:
region (str) – Genomic region in the format “chr:start-end”.
index (path or file-like, optional) – The index file to use for querying the region. If None and the source was provided as a path, we will attempt to load the index from the same path with an additional extension.
fields (list[str], optional) – Names of the BED fields to project.
batch_size (int, optional [default: 1024]) – The number of records to include in each batch.
limit (int, optional) – The maximum number of records to scan.
- Returns:
An iterator yielding Arrow record batches.
- Return type:
arro3 RecordBatchReader (pycapsule)
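A range query could be wrapped as in the following sketch. `format_region` and `query_to_table` are hypothetical helpers, `pyarrow` is an assumed dependency, and the sketch makes no claim about whether the scanner treats coordinates as 0-based or 1-based. It builds the “chr:start-end” string and leaves `index=None` so that the scanner looks for the index next to the source, as described above.

```python
def format_region(chrom: str, start: int, end: int) -> str:
    """Build a region string in the "chr:start-end" form."""
    if start < 0 or end < start:
        raise ValueError(f"invalid interval: {start}-{end}")
    return f"{chrom}:{start}-{end}"


def query_to_table(path: str, region: str, bed_schema: str = "bed6", fields=None):
    """Collect all records overlapping `region` into a pyarrow Table."""
    # Imported lazily so that format_region() works without these packages.
    import pyarrow as pa
    from oxbow.core import PyBedScanner

    # Range queries operate on BGZF-encoded sources, hence compressed=True.
    scanner = PyBedScanner(path, bed_schema, compressed=True)
    stream = scanner.scan_query(region, index=None, fields=fields)
    return pa.RecordBatchReader.from_stream(stream).read_all()
```

For example, `query_to_table("peaks.bed.gz", format_region("chr1", 1000, 2000))` would read only the records in that interval, provided an index file sits alongside the hypothetical `peaks.bed.gz`.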
- schema(fields=None)#
Return the Arrow schema.
- Parameters:
fields (list[str], optional) – Names of the BED fields to project.
- Return type:
arro3 Schema (pycapsule)
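One way to combine `field_names` and `schema` is to validate a projection before requesting its Arrow schema. The sketch below is illustrative: `check_projection` and `projected_schema` are hypothetical helpers, and converting the result with `pyarrow.schema` assumes a `pyarrow` version that accepts objects implementing the Arrow PyCapsule schema interface.

```python
def check_projection(requested, available):
    """Return the requested field names, or raise if any are not available."""
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError(f"unknown BED fields: {unknown}")
    return list(requested)


def projected_schema(path: str, bed_schema: str, fields):
    """Return a pyarrow Schema restricted to `fields`."""
    # Imported lazily so that check_projection() works without these packages.
    import pyarrow as pa
    from oxbow.core import PyBedScanner

    scanner = PyBedScanner(path, bed_schema)
    fields = check_projection(fields, list(scanner.field_names()))
    return pa.schema(scanner.schema(fields=fields))
```

Checking names up front gives a readable error message in Python before the request reaches the scanner.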