oxbow.from_fasta

oxbow.from_fasta(source: str | Path | Callable[[], IO[bytes] | str], compression: Literal['infer', 'bgzf', 'gzip', None] = 'infer', *, fields: Literal['*'] | list[str] | None = '*', coords: Literal['01', '11'] = '11', regions: str | list[str] | None = None, index: str | Path | Callable[[], IO[bytes] | str] | None = None, gzi: str | Path | Callable[[], IO[bytes] | str] | None = None, batch_size: int = 1) -> FastaFile

Create a FASTA file data source.

Parameters:
  • source (str, pathlib.Path, or Callable) – The URI or path to the FASTA file, or a callable that opens the file as a file-like object.

  • compression (Literal["infer", "bgzf", "gzip", None], default: "infer") – Compression of the source bytestream. If “infer” and source is a URI or path, the file’s compression is guessed based on the extension, where “.gz” or “.bgz” is interpreted as BGZF. Pass “gzip” to decode regular GZIP. If None, the source bytestream is assumed to be uncompressed. For more customized decoding, provide a callable source instead.

  • fields ("*" or list[str], optional) – Specific fields to project. By default ("*"), all fields are included.

  • regions (str or list[str], optional) – One or more genomic ranges. Each range is sliced from its sequence and returned as an output record. Only applicable if an associated index file is available.

  • index (str, pathlib.Path, or Callable, optional) – An optional FAI index file associated with the FASTA file. If source is a URI or path and the index file shares the same name with a “.fai” extension, the index file is automatically detected. If the FASTA file is BGZF-compressed, a GZI index file is also required.

  • gzi (str, pathlib.Path, or Callable, optional) – An optional GZI index file associated with a BGZF-compressed FASTA file. This is required in addition to the FAI index file for random access.

  • batch_size (int, default: 1) – The number of records to read in each batch. Because FASTA sequences can be very long, the default batch size is 1, which yields one sequence record at a time.
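
The compression and index rules described above can be sketched as a small helper that assembles keyword arguments for from_fasta. This is only an illustration under stated assumptions, not oxbow's implementation: guess_compression, build_fasta_kwargs, and open_fasta are hypothetical names, the ".gzi" sidecar naming follows the samtools convention and is assumed here, and the region string in the usage note is an assumed example format. The oxbow import is deferred so the helpers can be used without the library installed.

```python
from pathlib import Path


def guess_compression(path):
    """Mirror the documented "infer" rule: ".gz" or ".bgz" is treated as BGZF."""
    suffix = Path(path).suffix
    return "bgzf" if suffix in {".gz", ".bgz"} else None


def build_fasta_kwargs(path, regions=None):
    """Assemble keyword arguments for oxbow.from_fasta (hypothetical helper)."""
    kwargs = {"compression": guess_compression(path), "batch_size": 1}
    if regions is not None:
        # Region queries need an FAI index. The documented sidecar name is
        # the FASTA file name with ".fai" appended.
        kwargs["regions"] = regions
        kwargs["index"] = Path(str(path) + ".fai")
        if kwargs["compression"] == "bgzf":
            # Assumption: the GZI sidecar is "<fasta>.gzi" (samtools convention).
            kwargs["gzi"] = Path(str(path) + ".gzi")
    return kwargs


def open_fasta(path, regions=None):
    """Create the data source. Requires oxbow to be installed."""
    import oxbow  # deferred so the helpers above work without oxbow

    return oxbow.from_fasta(path, **build_fasta_kwargs(path, regions))
```

For example, build_fasta_kwargs("genome.fa.bgz", regions=["chr1:1-100"]) would pass both an FAI and a GZI path, while build_fasta_kwargs("genome.fa") passes neither and leaves index detection to from_fasta.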

Returns:

A data source object representing the FASTA file.

Return type:

FastaFile
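
For customized decoding, source may be a callable that opens the file as a file-like object. The following is a minimal sketch, assuming the callable takes no arguments and returns a binary stream (as the Callable[[], IO[bytes] | str] annotation suggests) and that compression=None is appropriate when the callable already yields decompressed bytes. make_gzip_opener and open_decoded_fasta are hypothetical names, and the tiny FASTA file exists only for the demonstration.

```python
import gzip
import tempfile
from pathlib import Path


def make_gzip_opener(path):
    """Return a zero-argument callable that yields decompressed FASTA bytes."""
    def opener():
        return gzip.open(path, "rb")
    return opener


def open_decoded_fasta(path):
    """Pass the opener to oxbow.from_fasta. Requires oxbow to be installed."""
    import oxbow  # deferred import

    # Assumption: the stream is already decompressed, so no further decoding.
    return oxbow.from_fasta(make_gzip_opener(path), compression=None)


# Demonstrate the opener on a tiny gzip-compressed FASTA file.
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = Path(tmp) / "example.fa.gz"
    with gzip.open(fasta_path, "wb") as handle:
        handle.write(b">seq1\nACGT\n")
    with make_gzip_opener(fasta_path)() as stream:
        first_line = stream.readline()
```

Because the callable is invoked each time the source needs to be opened, it should create a fresh stream on every call rather than return a shared, already-consumed handle.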

See also

from_fastq

Create a FASTQ file data source.