oxbow.from_bcf
- oxbow.from_bcf(source: str | Path | Callable[[], IO[bytes] | str], compression: Literal['bgzf', None] = 'bgzf', *, fields: list[str] | None = None, info_fields: list[str] | None = None, samples: list[str] | None = None, genotype_fields: list[str] | None = None, genotype_by: Literal['sample', 'field'] = 'sample', regions: str | list[str] | None = None, index: str | Path | Callable[[], IO[bytes] | str] | None = None, batch_size: int = 131072) → BcfFile
Create a BCF file data source.
- Parameters:
source (str, pathlib.Path, or Callable) – The URI or path to the BCF file, or a callable that opens the file as a file-like object.
compression (Literal["bgzf", None], default: "bgzf") – Compression of the source bytestream. By default, BCF sources are assumed to be BGZF-compressed. If None, the source is assumed to be uncompressed. For custom decoding, provide a callable source instead.
fields (list[str], optional) – Specific fixed fields to project. By default, all fixed fields are included.
info_fields (list[str], optional [default: None]) – INFO fields to project. These will be nested under an “info” column. If None, all INFO fields declared in the header are included. To omit all INFO fields, set info_fields=[].
samples (list[str], optional [default: None]) – A subset of samples to include in the genotype output. If None, all samples declared in the header are included. To omit all sample genotype data, set samples=[].
genotype_fields (list[str], optional [default: None]) – Genotype (aka “FORMAT”) fields to project for each sample. If None, all FORMAT fields declared in the header are included.
genotype_by (Literal["sample", "field"], optional [default: "sample"]) – Determines how genotype-specific data is organized. If “sample”, each sample is provided as a separate column with nested FORMAT fields. If “field”, each FORMAT field is provided as a separate column with nested sample name fields.
regions (str | list[str], optional) – One or more genomic regions to query. Only applicable if an associated index file is available.
index (str, optional) – An optional index file associated with the BCF file. If source is a URI or path, is BGZF-compressed, and the index file shares the same name with a “.csi” extension, the index file is automatically detected.
batch_size (int, optional [default: 131072]) – The number of records to read in each batch.
- Returns:
A data source object representing the BCF file.
- Return type:
BcfFile
Notes
The Binary Call Format (BCF) is a binary representation of the Variant Call Format (VCF), designed for efficient storage and processing of genomic variant data. It is commonly used in large-scale sequencing projects.
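A typical call combines the projection and query parameters listed under Parameters. The following is a minimal sketch, not taken from the oxbow documentation: the helper names bcf_query_kwargs and open_bcf, the INFO keys "DP" and "AF", the FORMAT key "GT", and the region string format are illustrative assumptions that must be adapted to the header of the file being read.

```python
def bcf_query_kwargs(samples, region=None):
    """Build keyword arguments for oxbow.from_bcf (hypothetical helper)."""
    kwargs = {
        # Placeholder INFO keys; use keys declared in your file's header.
        "info_fields": ["DP", "AF"],
        # An empty list omits all sample genotype data.
        "samples": list(samples),
        # Placeholder FORMAT key.
        "genotype_fields": ["GT"],
        # One column per FORMAT field, with nested sample names.
        "genotype_by": "field",
    }
    if region is not None:
        # Region queries only apply when an index file is available.
        kwargs["regions"] = region
    return kwargs


def open_bcf(path, **kwargs):
    """Create the BCF data source (hypothetical wrapper; oxbow is imported lazily)."""
    import oxbow

    return oxbow.from_bcf(path, **kwargs)
```

With oxbow installed and an indexed file at hand, open_bcf("calls.bcf", **bcf_query_kwargs(["NA12878"], "chr1:1-1000000")) would request one sample's genotypes for a single region. The file name, sample name, and region here are hypothetical.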
See also
from_vcf – Create a VCF file data source.
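The compression parameter mentions passing a callable source for custom decoding. A minimal sketch of that pattern follows. It assumes the file can be decoded with Python's standard gzip module (BGZF is a series of concatenated gzip members) and that oxbow accepts the resulting file object when compression=None. The helper names are hypothetical, and the behavior against oxbow itself has not been verified here.

```python
import gzip


def make_gzip_opener(path):
    """Return a zero-argument callable that opens `path` as a decompressed byte stream."""
    def opener():
        return gzip.open(path, "rb")
    return opener


def open_bcf_decoded(path):
    """Hand pre-decoded bytes to oxbow (hypothetical wrapper; oxbow is imported lazily)."""
    import oxbow

    # compression=None signals that the stream is already uncompressed.
    return oxbow.from_bcf(make_gzip_opener(path), compression=None)
```

Since automatic index detection is described only for URI or path sources, a region query on a callable source would presumably also need the index argument.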