oxbow.from_gff

oxbow.from_gff(source: str | Path | Callable[[], IO[bytes] | str], compression: Literal['infer', 'bgzf', 'gzip', None] = 'infer', *, fields: Literal['*'] | list[str] | None = '*', attribute_defs: list[tuple[str, str]] | None = None, coords: Literal['01', '11'] = '11', regions: str | list[str] | None = None, index: str | Path | Callable[[], IO[bytes] | str] | None = None, batch_size: int = 131072) → GffFile

Create a GFF3 file data source.

Changed in version 0.7.0: The attribute_scan_rows parameter was removed, and attribute definitions are no longer discovered by default. The attribute_defs parameter now defaults to None, which omits attribute definitions. To discover attributes, call the with_attributes() method on the returned data source; its scan_rows parameter controls how many records are scanned.
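Attribute discovery, now opt-in through with_attributes() and its scan_rows parameter, amounts to reading the ninth (attributes) column of the first few records and collecting the key names. The following is a rough pure-Python approximation, not oxbow's implementation: discover_attribute_defs is a hypothetical helper, and the "String"/"Array" type labels (GFF3 separates multiple values with commas) are placeholders that may not match the labels oxbow uses in attribute_defs.

```python
from typing import Iterable


def discover_attribute_defs(lines: Iterable[str], scan_rows: int) -> list[tuple[str, str]]:
    """Collect (name, type) pairs from the attributes column of the first scan_rows records.

    Illustrative only: "String" and "Array" are placeholder type labels.
    """
    defs: dict[str, str] = {}  # dicts keep insertion order, so first-seen order is preserved
    scanned = 0
    for line in lines:
        if scanned >= scan_rows:
            break
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence data follows; no more feature records
        if not line or line.startswith("#"):
            continue  # directives, comments, and blank lines are not records
        scanned += 1
        columns = line.split("\t")
        if len(columns) < 9 or columns[8] == ".":
            continue  # this record carries no attributes
        for pair in columns[8].split(";"):
            if "=" not in pair:
                continue  # e.g. an empty string left by a trailing ";"
            key, value = pair.split("=", 1)
            kind = "Array" if "," in value else "String"
            # Once a key has been seen with multiple values, keep treating it as an array.
            if defs.get(key) != "Array":
                defs[key] = kind
    return list(defs.items())
```

Scanning more rows finds attributes that only appear later in the file, at the cost of reading more data up front; that is the trade-off scan_rows exposes.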

Parameters:
  • source (str, pathlib.Path, or Callable) – The URI or path to the GFF file, or a callable that opens the file as a file-like object.

  • compression (Literal["infer", "bgzf", "gzip", None], default: "infer") – Compression of the source bytestream. If “infer” and source is a URI or path, the file’s compression is guessed based on the extension, where “.gz” or “.bgz” is interpreted as BGZF. Pass “gzip” to decode regular GZIP. If None, the source bytestream is assumed to be uncompressed. For more customized decoding, provide a callable source instead.

  • fields (list[str] or "*", optional [default: "*"]) – Specific fixed fields to project. By default, all fixed fields are included.

  • attribute_defs (list[tuple[str, str]], optional [default: None]) – Definitions for variable attribute fields to project. These will be nested in an “attributes” column. If None, attribute definitions are omitted. To discover attribute definitions, use the with_attributes() method on the returned data source.

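  • coords (Literal["01", "11"], default: "11") – Coordinate system for feature positions: “01” for 0-based, half-open intervals, or “11” for 1-based, fully closed intervals (the native GFF convention).

  • regions (str | list[str], optional) – One or more genomic regions to query. Only applicable if an associated index file is available.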

  • index (str, pathlib.Path, or Callable, optional) – An optional index file associated with the GFF file. If source is a URI or path, the file is BGZF-compressed, and an index file with the same name plus a “.tbi” or “.csi” extension is present (for example, “genes.gff3.gz.tbi”), the index is detected automatically.

  • batch_size (int, optional [default: 131072]) – The number of records to read in each batch.
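The documented “infer” rule for compression, and the callable-source option for decoding it does not cover, can be sketched in plain Python. Both helpers are hypothetical, choosing bz2 as the “customized” codec is an illustrative assumption, and the commented from_gff call uses only the source and compression parameters documented above.

```python
import bz2
from pathlib import Path
from typing import IO, Callable, Optional


def infer_compression(source) -> Optional[str]:
    """Mimic the documented "infer" rule: ".gz" or ".bgz" means BGZF, anything else is uncompressed."""
    return "bgzf" if Path(source).suffix in (".gz", ".bgz") else None


def make_bz2_source(path) -> Callable[[], IO[bytes]]:
    """Return a zero-argument callable that opens a fresh, already-decompressed byte stream.

    bz2 is not one of the supported compression values, so decoding happens here
    and the data source should be told the stream is uncompressed.
    """
    return lambda: bz2.open(path, "rb")


# Hypothetical usage with oxbow (not executed here):
# import oxbow as ox
# gff = ox.from_gff(make_bz2_source("genes.gff3.bz2"), compression=None)
```

Because “.gz” is inferred as BGZF, a plain GZIP file with that extension still needs compression="gzip" passed explicitly.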
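To make the two coords values concrete: assuming “11” denotes 1-based, fully closed intervals (how GFF3 stores start and end) and “01” denotes 0-based, half-open intervals, converting between them only shifts the start. The helper names are hypothetical and not part of oxbow.

```python
def closed_to_half_open(start: int, end: int) -> tuple[int, int]:
    """Convert a 1-based, fully closed interval ("11") to 0-based, half-open ("01")."""
    return start - 1, end  # the end position is numerically unchanged


def half_open_to_closed(start: int, end: int) -> tuple[int, int]:
    """Convert a 0-based, half-open interval ("01") back to 1-based, fully closed ("11")."""
    return start + 1, end


# A GFF feature covering bases 1..100 becomes start=0, end=100 in half-open form;
# its length is end - start in half-open form and end - start + 1 in closed form.
```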
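The automatic index detection described for the index parameter can be approximated as a filename check. This is a sketch, not oxbow's logic: find_index is a hypothetical helper, it only handles local paths, and checking “.tbi” before “.csi” is an assumed order.

```python
from pathlib import Path
from typing import Optional, Union


def find_index(source: Union[str, Path], compression: Optional[str]) -> Optional[Path]:
    """Return a sibling .tbi or .csi index for a BGZF-compressed source, if one exists."""
    if compression != "bgzf":
        return None  # detection only applies to BGZF-compressed sources
    source = Path(source)
    # Assumed lookup order: tabix (.tbi) first, then CSI (.csi).
    for ext in (".tbi", ".csi"):
        candidate = source.with_name(source.name + ext)  # e.g. genes.gff3.gz.tbi
        if candidate.exists():
            return candidate
    return None
```

When an index lives elsewhere or is named differently, passing it through the index parameter sidesteps detection entirely.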

Returns:

A data source object representing the GFF file.

Return type:

GffFile

See also

from_gtf

Create a GTF file data source.

from_bed

Create a BED file data source.

from_bigbed

Create a BigBed file data source.

from_bigwig

Create a BigWig file data source.