oxbow.from_cram


oxbow.from_cram(source: str | Path | Callable[[], IO[bytes] | str], *, fields: Literal['*'] | list[str] | None = '*', tag_defs: list[tuple[str, str]] | None = None, regions: str | list[str] | None = None, index: str | Path | Callable[[], IO[bytes] | str] | None = None, reference: str | Path | Callable[[], IO[bytes] | str] | None = None, reference_index: str | Path | Callable[[], IO[bytes] | str] | None = None, batch_size: int = 131072) → CramFile

Create a CRAM file data source.

Changed in version 0.7.0: The tag_scan_rows parameter was removed and tag definitions are no longer discovered by default. The tag_defs parameter now defaults to omitting tag definitions (None). To perform tag discovery, use the with_tags() method on the returned data source, which accepts a scan_rows parameter to control how many records are scanned.
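The migration described above can be sketched as follows. This is a minimal illustration, not the library's prescribed usage: the helper name open_cram_with_tag_discovery and the scan_rows value of 1000 are hypothetical choices, and the sketch assumes with_tags() returns the data source to use afterwards.

```python
def open_cram_with_tag_discovery(path, scan_rows=1000):
    """Open a CRAM file, then ask the data source to discover its tags.

    `scan_rows=1000` is an illustrative value, not a documented default.
    """
    # Imported lazily so that defining this helper does not require oxbow.
    import oxbow as ox

    # Since 0.7.0, tag_defs defaults to None, so no tags are projected here.
    source = ox.from_cram(path)

    # Tag discovery is now an explicit step on the returned data source.
    # Assumption: with_tags() returns the data source to use from here on.
    return source.with_tags(scan_rows=scan_rows)
```

Code written against earlier versions that passed tag_scan_rows to from_cram would move that value to the scan_rows argument of with_tags().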

Parameters:
  • source (str, pathlib.Path, or Callable) – The URI or path to the CRAM file, or a callable that opens the file as a file-like object.

  • fields (list[str] or "*", optional [default: "*"]) – Standard SAM fields to include. By default, all standard fields are included.

  • tag_defs (list[tuple[str, str]], optional [default: None]) – Definitions for tags to project. These will be nested in a “tags” column. If None, tag definitions are omitted. To discover tag definitions, use the with_tags() method on the returned data source.

  • regions (str | list[str], optional) – One or more genomic regions to query. Only applicable if an associated index file is available.

  • index (str, pathlib.Path, or Callable, optional) – An optional index file associated with the CRAM file. If source is a URI or path and the index file shares the same name with a “.crai” extension, the index file is automatically detected.

  • reference (str, pathlib.Path, or Callable, optional) – The URI or path to the FASTA reference file used for CRAM encoding, or a callable that opens the file as a file-like object. Required if the CRAM file does not contain an embedded reference.

  • reference_index (str, pathlib.Path, or Callable, optional) – The URI or path to the FASTA reference index file, or a callable that opens the file as a file-like object. If reference is provided as a URI or path and the index file shares the same name with a “.fai” extension, the index file is automatically detected.

  • batch_size (int, optional [default: 131072]) – The number of records to read in each batch.
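To show how the parameters above fit together, here is a hedged sketch that assembles keyword arguments for from_cram. The helper names (sidecar, from_cram_options, open_cram) are hypothetical. The sketch assumes that "shares the same name" means the extension is appended to the full file name (for example, sample.cram.crai), that region strings follow the common "chr1:1-100000" form, and that the second element of each tag_defs tuple is a SAM type code such as "i". None of these details is confirmed by the text above.

```python
from pathlib import Path


def sidecar(path, extension):
    """Return the path formed by appending `extension` to the full file name.

    Assumption: sidecar files are named like "sample.cram.crai".
    """
    return Path(str(path) + extension)


def from_cram_options(cram_path, reference_path=None, regions=None, tag_defs=None):
    """Build keyword arguments for oxbow.from_cram from local file paths."""
    options = {}

    if tag_defs is not None:
        # e.g. [("NM", "i")]; the type-code convention is an assumption.
        options["tag_defs"] = tag_defs

    if reference_path is not None:
        options["reference"] = str(reference_path)
        fai = sidecar(reference_path, ".fai")
        if fai.exists():
            # Passing the index explicitly is optional when it can be detected.
            options["reference_index"] = str(fai)

    if regions is not None:
        # Region queries only apply when an index file is available.
        crai = sidecar(cram_path, ".crai")
        if not crai.exists():
            raise FileNotFoundError(f"no index found at {crai}")
        options["index"] = str(crai)
        options["regions"] = regions

    return options


def open_cram(cram_path, **kwargs):
    """Open the CRAM file with the options assembled above."""
    import oxbow as ox  # lazy import: only needed when a file is opened

    return ox.from_cram(str(cram_path), **from_cram_options(cram_path, **kwargs))
```

Checking for the index before passing regions makes a missing sidecar file fail early with a clear message. If the library's auto-detection is sufficient for your layout, the explicit index and reference_index entries can be dropped.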

Returns:

A data source object representing the CRAM file.

Return type:

CramFile

Notes

CRAM is a compressed binary format for storing sequence alignments.

See also

from_sam – Create a SAM file data source.

from_bam – Create a BAM file data source.