oxbow.core.PyFastaScanner

class oxbow.core.PyFastaScanner(src, compressed=False, fields=None)

A FASTA file scanner.

Parameters:
  • src (str or file-like) – The path to the FASTA file or a file-like object.

  • compressed (bool, optional [default: False]) – Whether the source is BGZF-compressed.

  • fields (list[str], optional) – Names of the fixed fields to project.
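When it is unclear whether a source is BGZF-compressed, the value for `compressed` can be decided by inspecting the first bytes of the file. The sketch below is not part of oxbow; `looks_like_bgzf` and `sniff_compressed` are hypothetical helper names, and the check assumes the BGZF `BC` subfield is the first gzip extra subfield, which is the usual layout.

```python
import gzip

# Fixed prefix of every BGZF block: gzip magic (1f 8b), deflate (08),
# and the FEXTRA flag (04) announcing an extra field.
_BGZF_PREFIX = b"\x1f\x8b\x08\x04"


def looks_like_bgzf(header: bytes) -> bool:
    """Return True if `header` (the first 16 or more bytes of a file) looks like BGZF.

    Assumption: the 'BC' subfield is the first gzip extra subfield.
    """
    if len(header) < 16 or header[:4] != _BGZF_PREFIX:
        return False
    # Bytes 12-13 hold the subfield identifier, bytes 14-15 its length (2).
    return header[12:14] == b"BC" and header[14:16] == b"\x02\x00"


def sniff_compressed(path: str) -> bool:
    """Hypothetical helper: read the first 16 bytes of `path` and test for BGZF."""
    with open(path, "rb") as handle:
        return looks_like_bgzf(handle.read(16))


# The standard 28-byte BGZF end-of-file marker block.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 4243 0200 1b00 0300 00000000 00000000"
)

# Ordinary gzip output has no extra field, so it is not BGZF.
PLAIN_GZIP = gzip.compress(b">seq1\nACGT\n")
```

The result of `sniff_compressed(path)` could then be passed as the `compressed` argument. Plain gzip files are reported as not BGZF, which matches the parameter description above.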

__init__()

Methods
  • __init__()

  • field_names() – Return the names of the fixed fields.

  • scan([columns, batch_size, limit]) – Scan the source as record batches.

  • scan_query(regions[, index, gzi, columns, ...]) – Scan sequence slices as record batches from a list of genomic ranges.

  • schema() – Return the Arrow schema.

field_names()

Return the names of the fixed fields.

scan(columns=None, batch_size=1, limit=None)

Scan the source as record batches.

Parameters:
  • columns (list[str], optional) – Names of the columns to project.

  • batch_size (int, optional [default: 1]) – The number of records to include in each batch.

  • limit (int, optional) – The maximum number of records to scan. If None, records are scanned until EOF.

Return type:

arro3 RecordBatchReader (pycapsule)

Notes

Because reference sequences are often large, the default batch size is 1, so each batch holds a single sequence record.
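For files with many short sequences, a larger `batch_size` may be preferable. As a rough illustration of the trade-off described in the note, the sketch below picks a batch size from an approximate per-record size and a memory budget. The helper name `pick_batch_size` and the 64 MiB default budget are assumptions made for this example, not oxbow features.

```python
def pick_batch_size(mean_record_bytes: int,
                    target_batch_bytes: int = 64 * 1024 * 1024) -> int:
    """Suggest a batch size so that one batch stays near `target_batch_bytes`.

    Hypothetical heuristic: divide the budget by the typical record size,
    never going below one record per batch.
    """
    if mean_record_bytes <= 0:
        raise ValueError("mean_record_bytes must be positive")
    return max(1, target_batch_bytes // mean_record_bytes)


# Chromosome-scale records exceed the budget, so one record per batch.
chromosome_batch = pick_batch_size(250_000_000)

# Short records (about 1 KiB each) can be grouped into large batches.
short_read_batch = pick_batch_size(1024)
```

The returned value would be passed as `scan(batch_size=...)`. For chromosome-scale records the heuristic falls back to 1, which agrees with the default.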

scan_query(regions, index=None, gzi=None, columns=None, batch_size=1024)

Scan sequence slices as record batches from a list of genomic ranges.

Parameters:
  • regions (list[str]) – Genomic ranges in the format “chr:start-end”.

  • index (path or file-like, optional) – The FAI index file.

  • gzi (path or file-like, optional) – A GZI index file for BGZF-encoded sources.

  • columns (list[str], optional) – Names of the columns to project.

  • batch_size (int, optional [default: 1024]) – The number of records to include in each batch.

Return type:

arro3 RecordBatchReader (pycapsule)
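The arguments for `scan_query` can be prepared with a few lines of plain Python. The sketch below builds region strings in the “chr:start-end” format and derives index file names using the samtools faidx naming convention (`<fasta>.fai` and `<fasta>.gzi`). The helper names are hypothetical, and whether oxbow locates these index files on its own when `index` and `gzi` are omitted is not stated here.

```python
from typing import Iterable, List, Tuple


def format_region(name: str, start: int, end: int) -> str:
    """Render one interval as a 'chr:start-end' string.

    Coordinates are passed through unchanged; this reference does not state
    whether they are 0-based or 1-based.
    """
    if start > end:
        raise ValueError(f"start ({start}) is greater than end ({end})")
    return f"{name}:{start}-{end}"


def format_regions(intervals: Iterable[Tuple[str, int, int]]) -> List[str]:
    """Render (name, start, end) tuples as a list of region strings."""
    return [format_region(name, start, end) for name, start, end in intervals]


def sidecar_index_paths(fasta_path: str) -> Tuple[str, str]:
    """Return (fai_path, gzi_path) following the samtools faidx convention."""
    return fasta_path + ".fai", fasta_path + ".gzi"


regions = format_regions([("chr1", 1000, 2000), ("chrX", 5, 50)])
fai_path, gzi_path = sidecar_index_paths("ref.fa.gz")
```

With a scanner constructed for `ref.fa.gz`, these values would be supplied as `scan_query(regions, index=fai_path, gzi=gzi_path)`.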

schema()

Return the Arrow schema.

Return type:

arro3 Schema (pycapsule)