oxbow.from_cram
- oxbow.from_cram(source: str | Path | Callable[[], IO[bytes] | str], *, fields: list[str] | None = None, tag_defs: list[tuple[str, str]] | None = None, tag_scan_rows: int = 1024, regions: str | list[str] | None = None, index: str | Path | Callable[[], IO[bytes] | str] | None = None, reference: str | Path | Callable[[], IO[bytes] | str] | None = None, reference_index: str | Path | Callable[[], IO[bytes] | str] | None = None, batch_size: int = 131072) -> CramFile
Create a CRAM file data source.
- Parameters:
source (str, pathlib.Path, or Callable) – The URI or path to the CRAM file, or a callable that opens the file as a file-like object.
fields (list[str], optional) – Specific fixed fields to project. By default, all fixed fields are included.
tag_defs (list[tuple[str, str]], optional [default: None]) – Definitions for variable tag fields to project. These will be nested in a “tags” column. If None, tag definitions are discovered by scanning records in the file, which is controlled by the tag_scan_rows parameter. To omit tags entirely, set tag_defs=[].
tag_scan_rows (int, optional [default: 1024]) – Number of rows to scan for tag definitions.
regions (str | list[str], optional) – One or more genomic regions to query. Only applicable if an associated index file is available.
index (str, pathlib.Path, or Callable, optional) – An optional index file associated with the CRAM file. If source is a URI or path and the index file shares the same name with a “.crai” extension, the index file is automatically detected.
reference (str, pathlib.Path, or Callable, optional) – The URI or path to the FASTA reference file used for CRAM encoding, or a callable that opens the file as a file-like object. Required if the CRAM file does not contain an embedded reference.
reference_index (str, pathlib.Path, or Callable, optional) – The URI or path to the FASTA reference index file, or a callable that opens the file as a file-like object. If reference is provided as a URI or path and the index file shares the same name with a “.fai” extension, the index file is automatically detected.
batch_size (int, optional [default: 131072]) – The number of records to read in each batch.
- Returns:
A data source object representing the CRAM file.
- Return type:
CramFile
Notes
CRAM is a compressed binary format for storing sequence alignments.
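Examples
The parameters above can be combined as in the following minimal sketch. It passes the index, reference, and reference index explicitly rather than relying on automatic detection. The helper names (sidecar_path, build_cram_kwargs, open_cram) and any file names are hypothetical, and the helper assumes that "shares the same name" means the suffix is appended to the full file name (for example sample.cram.crai); adjust it if your files are named differently. The accepted syntax of region strings is not described here, so treat any value passed as regions as an assumption.

```python
from pathlib import Path


def sidecar_path(path, suffix):
    """Return `path` with `suffix` appended (assumed naming convention)."""
    return Path(str(path) + suffix)


def build_cram_kwargs(cram_path, reference_path, regions=None, tag_defs=None):
    """Collect keyword arguments for oxbow.from_cram with explicit index paths."""
    kwargs = {
        "index": sidecar_path(cram_path, ".crai"),
        "reference": Path(reference_path),
        "reference_index": sidecar_path(reference_path, ".fai"),
    }
    if regions is not None:
        # Region queries only apply when an index file is available.
        kwargs["regions"] = regions
    if tag_defs is not None:
        # An empty list omits the "tags" column entirely;
        # None (the default) lets oxbow scan records for tag definitions.
        kwargs["tag_defs"] = tag_defs
    return kwargs


def open_cram(cram_path, reference_path, **options):
    """Create the CRAM data source; oxbow is imported only when this is called."""
    import oxbow

    kwargs = build_cram_kwargs(cram_path, reference_path, **options)
    return oxbow.from_cram(Path(cram_path), **kwargs)
```

Calling open_cram("sample.cram", "ref.fa", tag_defs=[]) would then return a CramFile without the "tags" column, provided the hypothetical files exist and oxbow is installed.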