oxbow.core.PyBcfScanner
- class oxbow.core.PyBcfScanner(src, compressed=True)
A BCF file scanner.
- Parameters:
src (str or file-like) – The path to the BCF file or a file-like object.
compressed (bool, optional [default: True]) – Whether the source is BGZF-compressed. If None, it is assumed to be compressed.
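The `compressed` flag must match how the file is actually stored, so it can help to check the leading bytes before constructing the scanner. The following is a minimal sketch using a hypothetical helper, `detect_bgzf`, which is not part of oxbow. It relies on the BGZF header layout (gzip magic `1f 8b`, deflate method `08`, FEXTRA flag `04`, and a `BC` extra subfield at byte offset 12) and on the `BCF` magic string that begins an uncompressed BCF file.

```python
def detect_bgzf(header: bytes) -> bool:
    """Return True for a BGZF header, False for raw BCF; raise otherwise.

    Hypothetical helper, not part of oxbow. It checks magic bytes only.
    """
    # BGZF: gzip magic (1f 8b), deflate (08), FEXTRA flag (04),
    # followed later by the 'BC' extra subfield identifier at offsets 12-13.
    if header[:4] == b"\x1f\x8b\x08\x04" and header[12:14] == b"BC":
        return True
    # An uncompressed BCF stream begins with "BCF" plus two version bytes.
    if header[:3] == b"BCF":
        return False
    raise ValueError("not a BGZF-compressed or raw BCF stream")


def detect_bgzf_file(path: str) -> bool:
    """Read the first 18 bytes (one BGZF header) of a file and classify it."""
    with open(path, "rb") as f:
        return detect_bgzf(f.read(18))
```

A scanner could then be opened as `PyBcfScanner(path, compressed=detect_bgzf_file(path))`, where `path` is a placeholder for your file. The check inspects only the header and does not validate the rest of the file.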
- __init__()
Methods
__init__()
chrom_names() – Return the names of the reference sequences.
chrom_sizes() – Return the names of the reference sequences and their lengths in bp.
field_names() – Return the names of the fixed fields.
genotype_field_defs() – Return the definitions of the FORMAT fields.
genotype_field_names() – Return the names of the FORMAT fields.
info_field_defs() – Return the definitions of the INFO fields.
info_field_names() – Return the names of the INFO fields.
sample_names() – Return the names of the samples.
scan([fields, info_fields, genotype_fields, ...]) – Scan batches of records from the file.
scan_query(region[, index, fields, ...]) – Scan batches of records from a genomic range query on a BGZF-encoded file.
schema([fields, info_fields, ...]) – Return the Arrow schema.
- chrom_names()
Return the names of the reference sequences.
- chrom_sizes()
Return the names of the reference sequences and their lengths in bp.
- field_names()
Return the names of the fixed fields.
- genotype_field_defs()
Return the definitions of the FORMAT fields.
- genotype_field_names()
Return the names of the FORMAT fields.
- info_field_defs()
Return the definitions of the INFO fields.
- info_field_names()
Return the names of the INFO fields.
- sample_names()
Return the names of the samples.
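The header accessors above can be used to validate a projection before scanning. The sketch below defines two hypothetical helpers, `check_samples` and `chrom_length`, which are not part of oxbow. It assumes `sample_names()` returns an iterable of strings and that `chrom_sizes()` returns either a mapping or (name, length) pairs; this reference does not state the exact return types, so `dict()` is used because it accepts both.

```python
def check_samples(scanner, wanted):
    """Raise KeyError if any requested sample is absent from the header.

    Hypothetical helper; `scanner` is any object with sample_names().
    """
    available = set(scanner.sample_names())
    missing = [s for s in wanted if s not in available]
    if missing:
        raise KeyError(f"samples not in header: {missing}")
    return list(wanted)


def chrom_length(scanner, chrom):
    """Look up a reference sequence length in bp (hypothetical helper).

    dict() accepts either a mapping or an iterable of (name, length)
    pairs, so this works under both assumptions about chrom_sizes().
    """
    return dict(scanner.chrom_sizes())[chrom]
```

For example, `scanner.scan(samples=check_samples(scanner, ["NA12878"]))` (with `NA12878` as a placeholder sample name) fails early with an explicit message rather than depending on however the scanner handles an unknown sample.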
- scan(fields=None, info_fields=None, genotype_fields=None, samples=None, genotype_by=None, batch_size=1024, limit=None)
Scan batches of records from the file.
- Parameters:
fields (list[str], optional) – Names of the fixed fields to project.
info_fields (list[str], optional) – Names of the INFO fields to project.
genotype_fields (list[str], optional) – Names of the sample-specific genotype fields to project.
samples (list[str], optional) – Names of the samples to include in the genotype fields.
genotype_by (Literal["sample", "field"], optional [default: "sample"]) – How to project the genotype fields. If “sample”, the columns correspond to the samples. If “field”, the columns correspond to the genotype fields.
batch_size (int, optional [default: 1024]) – The number of records to include in each batch.
limit (int, optional) – The maximum number of records to scan. If None, records are scanned until EOF.
- Returns:
An iterator yielding Arrow record batches.
- Return type:
arro3 RecordBatchReader (pycapsule)
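To reason about memory use, it helps to see how `batch_size` and `limit` combine. The sketch below models the row counts of successive batches under two assumptions not stated in this reference: every batch is full except possibly the last, and `limit` simply truncates the record stream. `expected_batch_sizes` is a hypothetical model, not oxbow code.

```python
from typing import List, Optional


def expected_batch_sizes(n_records: int, batch_size: int = 1024,
                         limit: Optional[int] = None) -> List[int]:
    """Model the row counts of successive batches (assumed behavior)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # limit=None means "scan until EOF", i.e. all n_records.
    total = n_records if limit is None else min(n_records, limit)
    full, rest = divmod(total, batch_size)
    # Full batches first, then one partial batch if anything remains.
    return [batch_size] * full + ([rest] if rest else [])
```

The reader returned by `scan()` exposes the Arrow PyCapsule stream interface, so it can be handed to any consumer of that protocol, for example `pyarrow.RecordBatchReader.from_stream(reader)` in recent pyarrow versions.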
- scan_query(region, index=None, fields=None, info_fields=None, genotype_fields=None, samples=None, genotype_by=None, batch_size=1024, limit=None)
Scan batches of records from a genomic range query on a BGZF-encoded file.
This operation requires an index file.
- Parameters:
region (str) – Genomic region in the format “chr:start-end”.
index (path or file-like, optional) – The index file to use for querying the region. If None and the source was provided as a path, the scanner attempts to load the index from the source path with an index file extension appended.
fields (list[str], optional) – Names of the fixed fields to project.
info_fields (list[str], optional) – Names of the INFO fields to project.
genotype_fields (list[str], optional) – Names of the sample-specific genotype fields to project.
samples (list[str], optional) – Names of the samples to include in the genotype fields.
genotype_by (Literal["sample", "field"], optional [default: "sample"]) – How to project the genotype fields. If “sample”, the columns correspond to the samples. If “field”, the columns correspond to the genotype fields.
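batch_size (int, optional [default: 1024]) – The number of records to include in each batch.
limit (int, optional) – The maximum number of records to scan. If None, all records in the queried region are scanned.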
- Returns:
An iterator yielding Arrow record batches.
- Return type:
arro3 RecordBatchReader (pycapsule)
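Region strings for `scan_query` can be assembled and checked with small helpers. The sketch below defines hypothetical `format_region` and `parse_region` functions, which are not part of oxbow. They do not convert coordinates, since this reference does not state whether `start` and `end` are 0- or 1-based or whether `end` is inclusive. `parse_region` splits on the last colon so that contig names which themselves contain colons (such as some HLA alt contigs) survive a round trip through the helper.

```python
from typing import Tuple


def format_region(chrom: str, start: int, end: int) -> str:
    """Build a "chr:start-end" string (hypothetical helper)."""
    if start > end:
        raise ValueError("start must not exceed end")
    return f"{chrom}:{start}-{end}"


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split a "chr:start-end" string into its parts (hypothetical helper)."""
    # rpartition splits on the LAST colon, keeping colons inside contig names.
    chrom, _, span = region.rpartition(":")
    start, _, end = span.partition("-")
    if not chrom or not start or not end:
        raise ValueError(f"expected 'chr:start-end', got {region!r}")
    return chrom, int(start), int(end)
```

A query could then look like `scanner.scan_query(format_region("chr1", 10_000, 20_000), index="calls.bcf.csi")`, where the index path is a placeholder.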
- schema(fields=None, info_fields=None, genotype_fields=None, samples=None, genotype_by=None)
Return the Arrow schema.
- Parameters:
fields (list[str], optional) – Names of the fixed fields to project.
info_fields (list[str], optional) – Names of the INFO fields to project.
genotype_fields (list[str], optional) – Names of the sample-specific genotype fields to project.
samples (list[str], optional) – Names of the samples to include in the genotype fields.
genotype_by (Literal["sample", "field"], optional [default: "sample"]) – How to project the genotype fields. If “sample”, the columns correspond to the samples. If “field”, the columns correspond to the genotype fields.
- Return type:
arro3 Schema (pycapsule)
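The two `genotype_by` layouts hold the same values nested in opposite orders: "sample" groups each sample's FORMAT fields together, while "field" groups each FORMAT field's per-sample values together. The sketch below uses plain dictionaries as a stand-in for the Arrow struct columns (an illustrative assumption; the exact column types are not given here) and pivots one record from the sample-major to the field-major layout with a hypothetical `sample_to_field` function.

```python
from typing import Any, Dict

# Outer key -> inner key -> value, e.g. {sample: {field: value}}.
Nested = Dict[str, Dict[str, Any]]


def sample_to_field(by_sample: Nested) -> Nested:
    """Pivot {sample: {field: value}} into {field: {sample: value}}."""
    by_field: Nested = {}
    for sample, fields in by_sample.items():
        for field, value in fields.items():
            by_field.setdefault(field, {})[sample] = value
    return by_field
```

Because the pivot only swaps the two key levels, applying it twice returns the original record. Comparing `schema(genotype_by="sample")` with `schema(genotype_by="field")` on a real file shows the corresponding Arrow layouts directly.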