oxbow.core.PyFastqScanner
- class oxbow.core.PyFastqScanner(src, compressed=False, fields=None)
A FASTQ file scanner.
- Parameters:
src (str or file-like) – The path to the FASTQ file or a file-like object.
compressed (bool, optional [default: False]) – Whether the source is GZIP-compressed.
fields (list[str], optional) – Names of the fixed fields to project.
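A minimal sketch of preparing inputs for the constructor, under stated assumptions. The helpers `write_fastq` and `open_scanner` and the two sample reads are hypothetical and exist only for illustration. The `oxbow` import is deferred into `open_scanner` so that the file-writing part runs without the library installed.

```python
import gzip
import os
import tempfile

# Two made-up reads in four-line FASTQ layout (illustrative data only).
FASTQ_TEXT = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nFFFF\n"


def write_fastq(path, compressed=False):
    """Write the sample reads to `path`, GZIP-compressed if requested."""
    opener = gzip.open if compressed else open
    with opener(path, "wt") as fh:
        fh.write(FASTQ_TEXT)
    return path


def open_scanner(path, compressed=False, fields=None):
    """Construct a scanner using the signature documented above.

    Hypothetical wrapper: requires oxbow to be installed.
    """
    from oxbow.core import PyFastqScanner

    return PyFastqScanner(path, compressed=compressed, fields=fields)


tmpdir = tempfile.mkdtemp()
plain_path = write_fastq(os.path.join(tmpdir, "reads.fastq"))
gz_path = write_fastq(os.path.join(tmpdir, "reads.fastq.gz"), compressed=True)
```

With oxbow installed, `open_scanner(gz_path, compressed=True)` would be expected to return a scanner for the compressed file, and `field_names()` on that scanner lists the names that can be passed as `fields`.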
- __init__()
Methods
__init__()
field_names() – Return the names of the fixed fields.
scan([columns, batch_size, limit]) – Scan the source as record batches.
scan_byte_ranges(byte_ranges[, columns, ...]) – Scan batches of records from specified byte ranges in the file.
scan_virtual_ranges(vpos_ranges[, columns, ...]) – Scan batches of records from virtual position ranges in a BGZF file.
schema() – Return the Arrow schema.
- field_names()
Return the names of the fixed fields.
- scan(columns=None, batch_size=1024, limit=None)
Scan the source as record batches.
- Parameters:
columns (list[str], optional) – Names of the columns to project.
batch_size (int, optional [default: 1024]) – The number of records to include in each batch.
limit (int, optional) – The maximum number of records to scan. If None, records are scanned until EOF.
- Returns:
An iterator yielding Arrow record batches.
- Return type:
arro3 RecordBatchReader (pycapsule)
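As a rough illustration of consuming a scan, the sketch below counts records batch by batch. `count_records` is a hypothetical helper, not part of oxbow. It relies on the reader being iterable, as described above, and assumes each yielded batch exposes a `num_rows` attribute.

```python
def count_records(scanner, columns=None, batch_size=1024, limit=None):
    """Count the records produced by a scan, one batch at a time."""
    reader = scanner.scan(columns=columns, batch_size=batch_size, limit=limit)
    total = 0
    for batch in reader:
        # Assumption: every yielded batch has a `num_rows` attribute.
        total += batch.num_rows
    return total
```

Because batches are consumed one at a time, only a single batch needs to be held in memory, and `batch_size` controls how large that batch can get.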
- scan_byte_ranges(byte_ranges, columns=None, batch_size=1024, limit=None)
Scan batches of records from specified byte ranges in the file.
- Parameters:
byte_ranges (list[tuple[int, int]]) – List of (start, end) byte position tuples to read from.
columns (list[str], optional) – Names of the columns to project.
batch_size (int, optional [default: 1024]) – The number of records to include in each batch.
limit (int, optional) – The maximum number of records to scan.
- Return type:
arro3 RecordBatchReader (pycapsule)
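One way to build `byte_ranges` for an uncompressed file is to split it at record boundaries. The sketch below is hypothetical and rests on three assumptions this page does not confirm: records occupy exactly four lines each, ranges should start at the beginning of a record, and `end` is exclusive.

```python
def fastq_byte_ranges(data, records_per_chunk):
    """Split FASTQ bytes into (start, end) ranges of whole records.

    Assumes four-line records and half-open ranges (end exclusive).
    """
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    lines_per_chunk = 4 * records_per_chunk
    ranges = []
    start = 0          # byte offset where the current chunk begins
    pos = 0            # byte offset just past the last line consumed
    lines_in_chunk = 0
    while pos < len(data):
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
        lines_in_chunk += 1
        if lines_in_chunk == lines_per_chunk:
            ranges.append((start, pos))
            start = pos
            lines_in_chunk = 0
    if start < len(data):
        # Trailing chunk with fewer than records_per_chunk records.
        ranges.append((start, len(data)))
    return ranges
```

The resulting list has the documented `list[tuple[int, int]]` shape. Whether the scanner tolerates ranges that begin mid-record is not stated here, so aligning to record starts is the conservative choice.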
- scan_virtual_ranges(vpos_ranges, columns=None, batch_size=1024, limit=None)
Scan batches of records from virtual position ranges in a BGZF file.
- Parameters:
vpos_ranges (list[tuple[vpos, vpos]]) – List of virtual position ranges, each given as a (start, end) pair.
columns (list[str], optional) – Names of the columns to project.
batch_size (int, optional [default: 1024]) – The number of records to include in each batch.
limit (int, optional) – The maximum number of records to scan.
- Return type:
arro3 RecordBatchReader (pycapsule)
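For background, a BGZF virtual position is conventionally a 64-bit value that packs the file offset of a compressed block into the upper 48 bits and the offset within the decompressed block into the lower 16 bits. The sketch below shows that packing. `make_vpos` and `split_vpos` are hypothetical helpers, and whether `vpos_ranges` expects packed integers or some other representation is an assumption not confirmed on this page.

```python
def make_vpos(coffset, uoffset):
    """Pack a block offset and an in-block offset into one integer."""
    if not 0 <= coffset < (1 << 48):
        raise ValueError("compressed block offset must fit in 48 bits")
    if not 0 <= uoffset < (1 << 16):
        raise ValueError("in-block offset must fit in 16 bits")
    return (coffset << 16) | uoffset


def split_vpos(vpos):
    """Recover (coffset, uoffset) from a packed virtual position."""
    return vpos >> 16, vpos & 0xFFFF
```

Under that assumption, a range covering one region could be written as `(make_vpos(start_block, start_off), make_vpos(end_block, end_off))`, with the block offsets typically taken from an index.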
- schema()
Return the Arrow schema.
- Return type:
arro3 Schema (pycapsule)