oxbow.core.PyBcfScanner

class oxbow.core.PyBcfScanner(src, compressed=True)

A BCF file scanner.

Parameters:
  • src (str or file-like) – The path to the BCF file or a file-like object.

  • compressed (bool, optional [default: True]) – Whether the source is BGZF-compressed. If None, it is assumed to be compressed.
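
The compressed flag has to match the file on disk. The following is a minimal sketch of one way to choose it, assuming that a BGZF file begins with the gzip magic bytes, the deflate method byte, and the FEXTRA flag (1f 8b 08 04), as laid out in the SAM/BAM specification. The helper name looks_bgzf is hypothetical and not part of oxbow, and the file name in the commented usage is a placeholder.

```python
def looks_bgzf(path):
    """Return True if the file starts with a BGZF-style gzip header.

    Hypothetical helper, not part of oxbow. It inspects only the first
    four bytes (gzip magic, deflate method, FEXTRA flag), so it is a
    heuristic rather than a full validation of the BGZF container.
    """
    with open(path, "rb") as handle:
        return handle.read(4) == b"\x1f\x8b\x08\x04"


# Usage (needs oxbow and a real BCF file, so it is left commented out):
# from oxbow.core import PyBcfScanner
# path = "variants.bcf"  # placeholder path
# scanner = PyBcfScanner(path, compressed=looks_bgzf(path))
```

The check only applies when src is a path; for a file-like object the caller would need to know the encoding in advance.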

__init__()

Methods

__init__()

chrom_names()

Return the names of the reference sequences.

chrom_sizes()

Return the names of the reference sequences and their lengths in bp.

field_names()

Return the names of the fixed fields.

genotype_field_defs()

Return the definitions of the FORMAT fields.

genotype_field_names()

Return the names of the FORMAT fields.

info_field_defs()

Return the definitions of the INFO fields.

info_field_names()

Return the names of the INFO fields.

sample_names()

Return the names of the samples.

scan([fields, info_fields, genotype_fields, ...])

Scan batches of records from the file.

scan_query(region[, index, fields, ...])

Scan batches of records from a genomic range query on a BGZF-encoded file.

schema([fields, info_fields, ...])

Return the Arrow schema.

chrom_names()

Return the names of the reference sequences.

chrom_sizes()

Return the names of the reference sequences and their lengths in bp.

field_names()

Return the names of the fixed fields.

genotype_field_defs()

Return the definitions of the FORMAT fields.

genotype_field_names()

Return the names of the FORMAT fields.

info_field_defs()

Return the definitions of the INFO fields.

info_field_names()

Return the names of the INFO fields.

sample_names()

Return the names of the samples.
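
The accessor methods above can be gathered into a single summary of the file header. This is a minimal sketch: summarize_header is a hypothetical helper, not part of oxbow, and it passes each return value through unchanged because the exact return types are not stated in this reference.

```python
def summarize_header(scanner):
    """Collect header metadata from a scanner into one dictionary.

    Works with any object that provides the documented accessor methods.
    Hypothetical helper; values are returned exactly as the scanner
    produced them.
    """
    accessors = (
        "chrom_names",
        "chrom_sizes",
        "field_names",
        "info_field_names",
        "genotype_field_names",
        "sample_names",
    )
    # Call each accessor once and key the result by the method name.
    return {name: getattr(scanner, name)() for name in accessors}
```

Calling summarize_header(scanner)["sample_names"] before a scan is a convenient way to check which values are valid for the samples argument.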

scan(fields=None, info_fields=None, genotype_fields=None, samples=None, genotype_by=None, batch_size=1024, limit=None)

Scan batches of records from the file.

Parameters:
  • fields (list[str], optional) – Names of the fixed fields to project.

  • info_fields (list[str], optional) – Names of the INFO fields to project.

  • genotype_fields (list[str], optional) – Names of the sample-specific genotype fields to project.

  • samples (list[str], optional) – Names of the samples to include in the genotype fields.

  • genotype_by (Literal["sample", "field"], optional [default: "sample"]) – How to project the genotype fields. If “sample”, the columns correspond to the samples. If “field”, the columns correspond to the genotype fields.

  • batch_size (int, optional [default: 1024]) – The number of records to include in each batch.

  • limit (int, optional) – The maximum number of records to scan. If None, records are scanned until EOF.

Returns:

An iterator yielding Arrow record batches.

Return type:

arro3 RecordBatchReader (pycapsule)
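
The parameters above can be checked before they reach the scanner. The sketch below is a hypothetical wrapper, not part of oxbow: it validates genotype_by, batch_size, and limit against the documented constraints and then forwards everything to scan() by keyword. The error messages and the decision to reject non-positive batch sizes are assumptions made for illustration.

```python
def scan_projected(scanner, fields=None, info_fields=None,
                   genotype_fields=None, samples=None,
                   genotype_by="sample", batch_size=1024, limit=None):
    """Validate arguments, then call scanner.scan() with keywords.

    Hypothetical helper. The scanner can be any object whose scan()
    method accepts the documented keyword arguments.
    """
    if genotype_by not in ("sample", "field"):
        raise ValueError("genotype_by must be 'sample' or 'field'")
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    if limit is not None and limit < 0:
        raise ValueError("limit must be non-negative or None")
    return scanner.scan(
        fields=fields,
        info_fields=info_fields,
        genotype_fields=genotype_fields,
        samples=samples,
        genotype_by=genotype_by,
        batch_size=batch_size,
        limit=limit,
    )
```

Assuming the returned reader exposes the Arrow PyCapsule stream interface, a recent pyarrow release should be able to consume it with pyarrow.RecordBatchReader.from_stream(reader); that step is left out of the sketch because it depends on the installed pyarrow version.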

scan_query(region, index=None, fields=None, info_fields=None, genotype_fields=None, samples=None, genotype_by=None, batch_size=1024, limit=None)

Scan batches of records from a genomic range query on a BGZF-encoded file.

This operation requires an index file.

Parameters:
  • region (str) – Genomic region in the format “chr:start-end”.

  • index (path or file-like, optional) – The index file to use for querying the region. If None and the source was provided as a path, the index is looked up at the same path with an additional extension.

  • fields (list[str], optional) – Names of the fixed fields to project.

  • info_fields (list[str], optional) – Names of the INFO fields to project.

  • genotype_fields (list[str], optional) – Names of the sample-specific genotype fields to project.

  • samples (list[str], optional) – Names of the samples to include in the genotype fields.

  • genotype_by (Literal["sample", "field"], optional [default: "sample"]) – How to project the genotype fields. If “sample”, the columns correspond to the samples. If “field”, the columns correspond to the genotype fields.

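  • batch_size (int, optional [default: 1024]) – The number of records to include in each batch.

  • limit (int, optional) – The maximum number of records to scan. If None, records are scanned until the end of the queried region.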

Returns:

An iterator yielding Arrow record batches.

Return type:

arro3 RecordBatchReader (pycapsule)
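
The region argument is a plain string, so it can be assembled from separate coordinates. A minimal sketch follows, with hypothetical helper names format_region and scan_region that are not part of oxbow. The coordinate convention (0-based or 1-based, open or closed end) is not stated in this reference, so the helper passes the numbers through as given.

```python
def format_region(chrom, start, end):
    """Build a region string in the documented "chr:start-end" form.

    Hypothetical helper. Coordinates are not converted in any way.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    return f"{chrom}:{start}-{end}"


def scan_region(scanner, chrom, start, end, index=None, **projection):
    """Run scan_query() on a region given as separate coordinates.

    Extra keyword arguments (fields, info_fields, ...) are forwarded
    unchanged to scanner.scan_query().
    """
    region = format_region(chrom, start, end)
    return scanner.scan_query(region, index=index, **projection)
```

For example, scan_region(scanner, "chr1", 10000, 20000, fields=["chrom", "pos"]) issues the query "chr1:10000-20000" with a two-column projection of the fixed fields.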

schema(fields=None, info_fields=None, genotype_fields=None, samples=None, genotype_by=None)

Return the Arrow schema.

Parameters:
  • fields (list[str], optional) – Names of the fixed fields to project.

  • info_fields (list[str], optional) – Names of the INFO fields to project.

  • genotype_fields (list[str], optional) – Names of the sample-specific genotype fields to project.

  • samples (list[str], optional) – Names of the samples to include in the genotype fields.

  • genotype_by (Literal["sample", "field"], optional [default: "sample"]) – How to project the genotype fields. If “sample”, the columns correspond to the samples. If “field”, the columns correspond to the genotype fields.

Return type:

arro3 Schema (pycapsule)
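
Because schema() and scan() share the same projection parameters, it can help to define the projection once and reuse it for both calls. The sketch below is one possible arrangement: Projection and schema_and_reader are hypothetical names, not part of oxbow, and the expectation that the schema describes the batches produced by a scan with identical arguments is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Projection:
    """Column selection shared by schema() and scan() (hypothetical)."""

    fields: Optional[List[str]] = None
    info_fields: Optional[List[str]] = None
    genotype_fields: Optional[List[str]] = None
    samples: Optional[List[str]] = None
    genotype_by: str = "sample"

    def as_kwargs(self):
        # asdict() returns a fresh dict keyed by the parameter names.
        return asdict(self)


def schema_and_reader(scanner, projection, batch_size=1024, limit=None):
    """Request the schema and a batch reader with one projection."""
    kwargs = projection.as_kwargs()
    schema = scanner.schema(**kwargs)
    reader = scanner.scan(**kwargs, batch_size=batch_size, limit=limit)
    return schema, reader
```

Keeping the projection in one object avoids the two calls drifting apart when a column is added or removed.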