oxbow.from_fasta
- oxbow.from_fasta(source: str | Path | Callable[[], IO[bytes] | str], compression: Literal['infer', 'bgzf', 'gzip', None] = 'infer', *, fields: Literal['*'] | list[str] | None = '*', coords: Literal['01', '11'] = '11', regions: str | list[str] | None = None, index: str | Path | Callable[[], IO[bytes] | str] | None = None, gzi: str | Path | Callable[[], IO[bytes] | str] | None = None, batch_size: int = 1) -> FastaFile
Create a FASTA file data source.
- Parameters:
source (str, pathlib.Path, or Callable) – The URI or path to the FASTA file, or a callable that opens the file as a file-like object.
compression (Literal["infer", "bgzf", "gzip", None], default: "infer") – Compression of the source bytestream. If “infer” and source is a URI or path, the file’s compression is guessed based on the extension, where “.gz” or “.bgz” is interpreted as BGZF. Pass “gzip” to decode regular GZIP. If None, the source bytestream is assumed to be uncompressed. For more customized decoding, provide a callable source instead.
fields (list[str], optional) – Specific fields to project. By default, all fields are included.
regions (str or list[str], optional) – One or more genomic ranges. Each range is sliced out of its sequence and returned as an output record. Only applicable if an associated index file is available.
index (str, pathlib.Path, or Callable, optional) – An optional FAI index file associated with the FASTA file. If source is a URI or path and the index file shares the same name with a “.fai” extension, the index file is automatically detected. If the FASTA file is BGZF-compressed, a GZI index file is also required.
gzi (str, pathlib.Path, or Callable, optional) – An optional GZI index file associated with a BGZF-compressed FASTA file. This is required in addition to the FAI index file for random access.
batch_size (int, optional [default: 1]) – The number of records to read in each batch. Since sequences for FASTA files can be very long, the default batch size is set to 1 to generate one sequence record at a time.
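Because “infer” treats every “.gz” file as BGZF, it can help to check which kind of gzip stream a file really holds before choosing a value for compression. The following is a minimal sketch using only the standard library. The helper name sniff_compression is hypothetical and not part of oxbow, and the check assumes the BGZF "BC" subfield comes first in the gzip extra field, which is how BGZF writers normally lay it out.

```python
import gzip

# The 28-byte empty BGZF block that terminates a well-formed BGZF file
# (defined in the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def sniff_compression(head: bytes):
    """Guess a value for `compression` from the first bytes of a file.

    Hypothetical helper for illustration; not an oxbow function.
    """
    if head[:2] != b"\x1f\x8b":
        return None  # no gzip magic number: treat as uncompressed
    # BGZF sets the FEXTRA flag (bit 2 of byte 3) and stores a "BC"
    # subfield at the start of the extra field (bytes 12-13).
    has_extra = len(head) >= 14 and bool(head[3] & 0x04)
    if has_extra and head[12:14] == b"BC":
        return "bgzf"
    return "gzip"


plain = b">seq1\nACGT\n"
kind_plain = sniff_compression(plain)                # uncompressed bytes
kind_gzip = sniff_compression(gzip.compress(plain))  # regular gzip stream
kind_bgzf = sniff_compression(BGZF_EOF)              # a BGZF block
```

In practice the first 16 or so bytes of the file would be passed to the helper, and the result used as the compression argument.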
- Returns:
A data source object representing the FASTA file.
- Return type:
FastaFile
See also
from_fastq – Create a FASTQ file data source.
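Examples

The regions parameter depends on an FAI index because the index records where each sequence starts in the file and how its lines are wrapped, which lets a reader seek straight to a subsequence. The sketch below illustrates that arithmetic with the standard FAI columns (LENGTH, OFFSET, LINEBASES, LINEWIDTH) on an uncompressed file. It is not oxbow's implementation: the functions fai_byte_offset and fetch are hypothetical, the index entry is written by hand, and the 1-based inclusive range is simply the convention assumed for this example.

```python
import os
import tempfile


def fai_byte_offset(pos0, offset, linebases, linewidth):
    """Byte offset in the file of 0-based sequence position `pos0`."""
    return offset + (pos0 // linebases) * linewidth + pos0 % linebases


def fetch(fasta_path, fai_entry, start, end):
    """Return bases start..end (1-based, inclusive) of one sequence.

    Hypothetical helper; assumes an uncompressed file with "\n" line ends.
    """
    length, offset, linebases, linewidth = fai_entry
    end = min(end, length)  # clamp the range to the sequence length
    first = fai_byte_offset(start - 1, offset, linebases, linewidth)
    last = fai_byte_offset(end - 1, offset, linebases, linewidth)
    with open(fasta_path, "rb") as f:
        f.seek(first)
        raw = f.read(last - first + 1)
    # The raw span still contains the line breaks between wrapped lines.
    return raw.replace(b"\n", b"").decode("ascii")


# Tiny demo file: one 10-base sequence wrapped at 4 bases per line.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.fa")
    with open(path, "wb") as f:
        f.write(b">chr1\nACGT\nTTGG\nCA\n")
    # Hand-written FAI values for chr1: 10 bases, sequence starts at
    # byte 6, 4 bases per line, 5 bytes per line including the newline.
    entry = (10, 6, 4, 5)
    subseq = fetch(path, entry, 3, 6)  # bases 3 through 6 of ACGTTTGGCA
```

For a BGZF-compressed file the same offsets refer to the uncompressed stream, which is why a GZI index is needed in addition to the FAI index to map them onto compressed blocks.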