oxbow.core.PyCramScanner
- class oxbow.core.PyCramScanner(src, compressed=None, fields=None, tag_defs=None, reference=None, reference_index=None, coords=None)
A CRAM file scanner.
- Parameters:
src (str or file-like) – The path to the CRAM file or a file-like object.
fields (str or list[str] or None, optional [default: "*"]) – Standard SAM fields to include. "*" for all, None to omit, or a list of field names.
tag_defs (list[tuple[str, str]], optional [default: None]) – Tag definitions for the "tags" struct column. None omits the tags column. Use the tag_defs() method to discover definitions.
coords (Literal["01", "11"], optional [default: "11"]) – Coordinate system for returning positions and interpreting query ranges. "01" for 0-based half-open, "11" for 1-based closed.
- __init__()
Methods
__init__()
chrom_names() Return the names of the reference sequences.
chrom_sizes() Return the names of the reference sequences and their lengths in bp.
field_names() Return the names of the standard SAM fields.
model() Return the string representation of the alignment model.
scan([columns, batch_size, limit]) Scan batches of records from the file.
scan_query(region[, index, columns, ...]) Scan batches of records from a genomic range.
schema() Return the Arrow schema.
tag_defs([scan_rows]) Discover tag definitions by sniffing scan_rows records.
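A minimal sketch of constructing the scanner with the documented `fields`, `tag_defs` and `coords` arguments. The helper name `open_cram_scanner` and the path `sample.cram` are hypothetical and not part of oxbow. The import is deferred so the argument check can run without oxbow installed.

```python
VALID_COORDS = ("01", "11")  # "01": 0-based half-open, "11": 1-based closed


def open_cram_scanner(src, fields="*", tag_defs=None, coords="11"):
    """Hypothetical helper: build a PyCramScanner from documented arguments."""
    if coords not in VALID_COORDS:
        raise ValueError(f"coords must be one of {VALID_COORDS}, got {coords!r}")
    # Deferred import: oxbow is only needed once a scanner is actually built.
    from oxbow.core import PyCramScanner

    return PyCramScanner(src, fields=fields, tag_defs=tag_defs, coords=coords)


# Usage (assumes a local file named sample.cram exists):
# scanner = open_cram_scanner("sample.cram", coords="01")
# print(scanner.field_names())
```

Passing the documented defaults explicitly keeps the call readable and makes the chosen coordinate system visible at the call site.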
- chrom_names()
Return the names of the reference sequences.
- chrom_sizes()
Return the names of the reference sequences and their lengths in bp.
- field_names()
Return the names of the standard SAM fields.
- model()
Return the string representation of the alignment model.
- scan(columns=None, batch_size=1024, limit=None)
Scan batches of records from the file.
- Parameters:
columns (list[str], optional) – Names of the top-level columns to project.
batch_size (int, optional [default: 1024]) – The number of records to include in each batch.
limit (int, optional) – The maximum number of records to scan. If None, records are scanned until EOF.
- Returns:
An iterator yielding Arrow record batches.
- Return type:
arro3 RecordBatchReader (pycapsule)
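As an illustration of how `columns`, `batch_size` and `limit` combine, the sketch below caps a scan at the first `n` records. The helper names `scan_head` and `to_pyarrow_table` are hypothetical. The conversion step assumes that the returned reader exports the Arrow C stream interface and that a recent pyarrow providing `RecordBatchReader.from_stream` is installed; neither is stated by this page.

```python
def scan_head(scanner, n, columns=None):
    """Hypothetical helper: start a scan capped at the first n records."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Keep batches no larger than the cap; 1024 is the documented default.
    return scanner.scan(columns=columns, batch_size=min(n, 1024), limit=n)


def to_pyarrow_table(stream):
    """Materialize a scan result (assumes an Arrow C stream exporter)."""
    import pyarrow as pa  # deferred: only needed when converting

    return pa.RecordBatchReader.from_stream(stream).read_all()


# Usage, given an existing PyCramScanner instance named scanner:
# table = to_pyarrow_table(scan_head(scanner, 100))
```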
- scan_query(region, index=None, columns=None, batch_size=1024, limit=None)
Scan batches of records from a genomic range.
This operation requires an index file.
- Parameters:
region (str) – Genomic range string in the format “chr:start-end”, “chr:[start,end]” or “chr:[start,end)”.
index (path or file-like, optional) – The index file to use for querying the region. If None and the source was provided as a path, we will attempt to load the index from the same path with an additional extension.
columns (list[str], optional) – Names of the top-level columns to project.
batch_size (int, optional [default: 1024]) – The number of records to include in each batch.
limit (int, optional) – The maximum number of records to scan. If None, all records intersecting the query range are scanned.
- Returns:
An iterator yielding Arrow record batches.
- Return type:
arro3 RecordBatchReader (pycapsule)
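The bracketed region formats listed for `region` can be assembled programmatically. The following sketch builds the closed and half-open forms; `format_region` is a hypothetical helper and `chr1` is a placeholder reference name.

```python
def format_region(chrom, start, end, end_inclusive=True):
    """Hypothetical helper: build a bracketed region string for scan_query()."""
    if start > end:
        raise ValueError("start must not exceed end")
    closer = "]" if end_inclusive else ")"
    return f"{chrom}:[{start},{end}{closer}"


closed = format_region("chr1", 1000, 2000)           # "chr1:[1000,2000]"
half_open = format_region("chr1", 1000, 2000, False)  # "chr1:[1000,2000)"

# Usage, given an existing PyCramScanner instance named scanner
# and an index file that can be located next to the source:
# reader = scanner.scan_query(closed, limit=100)
```

Spelling out the brackets makes the intended end-point convention explicit in the query string itself.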
- schema()
Return the Arrow schema.
- Return type:
arro3 Schema (pycapsule)
- tag_defs(scan_rows=1024)
Discover tag definitions by sniffing scan_rows records.
The reader stream is reset to its original position after scanning.
- Parameters:
scan_rows (int, optional [default: 1024]) – The number of records to scan.
- Returns:
A list of tag definitions, where each definition is a tuple of the tag name and the SAM tag type code.
- Return type:
list[tuple[str, str]]
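One possible workflow, sketched under assumptions: open a scanner, sniff the tag definitions, then open a second scanner that includes the "tags" column. The helper `scanner_with_discovered_tags` is hypothetical, and the sketch assumes that `tag_defs()` can be called on a scanner created without tag definitions.

```python
def scanner_with_discovered_tags(scanner_cls, src, scan_rows=1024, **kwargs):
    """Hypothetical helper: sniff tag definitions, then reopen with them."""
    probe = scanner_cls(src, **kwargs)
    # Each definition is a (tag name, SAM tag type code) tuple.
    tag_defs = probe.tag_defs(scan_rows)
    return scanner_cls(src, tag_defs=tag_defs, **kwargs), tag_defs


# Usage (assumes oxbow is installed and sample.cram exists):
# from oxbow.core import PyCramScanner
# scanner, defs = scanner_with_discovered_tags(PyCramScanner, "sample.cram")
# print(defs)
```

Passing the scanner class as an argument keeps the helper independent of how oxbow is imported.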