oxbow.from_sam
- oxbow.from_sam(source: str | Path | Callable[[], IO[bytes] | str], compression: Literal['infer', 'bgzf', 'gzip', None] = 'infer', *, fields: list[str] | None = None, tag_defs: list[tuple[str, str]] | None = None, tag_scan_rows: int = 1024, regions: str | list[str] | None = None, index: str | Path | Callable[[], IO[bytes] | str] | None = None, batch_size: int = 131072) → SamFile
Create a SAM file data source.
- Parameters:
source (str, pathlib.Path, or Callable) – The URI or path to the SAM file, or a callable that opens the file as a file-like object.
compression (Literal["infer", "bgzf", "gzip", None], default: "infer") – Compression of the source bytestream. If “infer” and source is a URI or path, the file’s compression is guessed based on the extension, where “.gz” or “.bgz” is interpreted as BGZF. Pass “gzip” to decode regular GZIP. If None, the source bytestream is assumed to be uncompressed. For more customized decoding, provide a callable source instead.
fields (list[str], optional) – Specific fixed fields to project. By default, all fixed fields are included.
tag_defs (list[tuple[str, str]], optional [default: None]) – Definitions for variable tag fields to project. These will be nested in a “tags” column. If None, tag definitions are discovered by scanning records in the file, which is controlled by the tag_scan_rows parameter. To omit tags entirely, set tag_defs=[].
tag_scan_rows (int, optional [default: 1024]) – Number of rows to scan for tag definitions.
regions (str | list[str], optional) – One or more genomic regions to query. Only applicable if an associated index file is available.
index (str, pathlib.Path, or Callable, optional) – An optional index file associated with the SAM file. If source is a URI or path, is BGZF-compressed, and the index file shares the same name with a “.tbi” or “.csi” extension, the index file is automatically detected.
batch_size (int, optional [default: 131072]) – The number of records to read in each batch.
- Returns:
A data source object representing the SAM file.
- Return type:
SamFile
Notes
Sequence Alignment Map (SAM) is a widely used text-based format for storing biological sequences aligned to a reference sequence.
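The callable form of source can be illustrated with a minimal sketch. It assumes oxbow is installed and that a callable returning already-decompressed bytes pairs with compression=None, as the parameter descriptions suggest. The sample record, the temporary file path, and the helper names open_decoded_sam and load_sam are made up for illustration and are not part of oxbow.

```python
import gzip
import tempfile
from pathlib import Path

# A tiny made-up SAM file: two header lines and one alignment record.
SAM_BYTES = (
    b"@HD\tVN:1.6\tSO:coordinate\n"
    b"@SQ\tSN:chr1\tLN:1000\n"
    b"r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\n"
)

# Store it with plain gzip in a temporary directory (hypothetical demo path).
sam_path = Path(tempfile.mkdtemp()) / "example.sam.gz"
sam_path.write_bytes(gzip.compress(SAM_BYTES))


def open_decoded_sam():
    """Callable source: open the file and return decompressed bytes."""
    return gzip.open(sam_path, "rb")


def load_sam():
    """Build the data source; requires oxbow to be installed."""
    import oxbow as ox

    # compression=None: the callable already yields uncompressed bytes
    # (an assumption based on the parameter description above).
    # tag_defs=[]: omit the "tags" column entirely.
    return ox.from_sam(open_decoded_sam, compression=None, tag_defs=[])
```

For a file that is simply gzip-compressed, passing a path with compression="gzip" is the documented route. The callable form is intended for decoding that the built-in options do not cover; gzip stands in for such a decoder here only to keep the sketch short.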