oxbow.core.BigWigFile
- class oxbow.core.BigWigFile(source: str | Callable[[], IO[bytes] | str], *, fields: Literal['*'] | list[str] | None = '*', coords: Literal['01', '11'] = '01', regions: str | list[str] | None = None, batch_size: int = 131072)
- __init__(source: str | Callable[[], IO[bytes] | str], *, fields: Literal['*'] | list[str] | None = '*', coords: Literal['01', '11'] = '01', regions: str | list[str] | None = None, batch_size: int = 131072)
Methods
__init__(source, *[, fields, coords, ...])
batches() – Generate record batches from the data source.
dataset() – Convert the data source into a dataset.
dd([find_divisions]) – Convert the data source to a Dask DataFrame.
fragments() – Get fragments of the data source.
pd() – Convert the dataset to a Pandas DataFrame.
pl([lazy]) – Convert the data source to a Polars DataFrame or LazyFrame.
regions(regions) – Query one or more genomic ranges within the data source.
scanner() – Create a low-level scanner for the data source.
to_dask([find_divisions]) – Convert the data source to a Dask DataFrame.
to_duckdb(conn) – Convert the data source into a DuckDB Relation.
to_ipc() – Serialize the data source as an Arrow IPC stream.
to_pandas() – Convert the dataset to a Pandas DataFrame.
to_polars([lazy]) – Convert the data source to a Polars DataFrame or LazyFrame.
zoom(resolution, *[, fields, regions, ...]) – Create a data source for a BBI file zoom level.
Attributes
chrom_names – List of reference sequence names.
chrom_sizes – List of reference sequence names and their lengths in bp.
columns – The top-level column names of the projection.
schema – The arrow schema of the projection.
zoom_levels – List of zoom levels available.
- batches() → Generator
Generate record batches from the data source.
- Yields:
RecordBatch – A record batch from the data source.
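Because batches() is a generator, batches can be consumed lazily instead of materializing the whole file. The sketch below is illustrative only: `head_batches` and `count_rows` are hypothetical helpers, not part of oxbow, and they assume each yielded batch exposes a `num_rows` attribute, as a pyarrow RecordBatch does.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def head_batches(source: Any, n: int) -> Iterator[Any]:
    """Yield at most n record batches from source.batches().

    `source` is assumed to be any object with a batches() generator
    method, such as a BigWigFile.
    """
    return islice(source.batches(), n)


def count_rows(batches: Iterable[Any]) -> int:
    """Sum the row counts of the given batches.

    Assumes each batch has a `num_rows` attribute (true for
    pyarrow.RecordBatch; an assumption for anything else).
    """
    return sum(batch.num_rows for batch in batches)
```

With a data source `bw`, `count_rows(head_batches(bw, 2))` would count the rows in the first two batches only, without reading the rest.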
- property chrom_names: list[str]
List of reference sequence names.
- property chrom_sizes: list[tuple[str, int]]
List of reference sequence names and their lengths in bp.
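One possible use of the (name, length) pairs is to build range strings that span whole reference sequences. The helper below is a hypothetical sketch, not an oxbow function. It emits the bracket-style "chr:[start,end)" form, which the regions() Notes describe as explicitly 0-based and end-exclusive, so the result does not depend on the coords setting.

```python
from typing import Iterable, Optional, Sequence, Tuple


def whole_chrom_regions(
    chrom_sizes: Iterable[Tuple[str, int]],
    names: Optional[Sequence[str]] = None,
) -> list:
    """Build "chr:[0,length)" strings from (name, length) pairs.

    If `names` is given, only those sequences are included, in the
    order given; an unknown name raises KeyError.
    """
    sizes = dict(chrom_sizes)
    if names is None:
        names = list(sizes)  # keep the original order
    return [f"{name}:[0,{sizes[name]})" for name in names]
```

The resulting list has the same shape as the `regions` argument (a list of range strings), so it could be passed along as-is.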
- property columns: list[str]
The top-level column names of the projection.
- dataset() → BatchReaderDataset
Convert the data source into a dataset.
A dataset is a collection of fragments that can be processed as a single logical entity.
- Returns:
A dataset representation of the data source.
- Return type:
BatchReaderDataset
- dd(find_divisions=False)
Convert the data source to a Dask DataFrame.
- Parameters:
find_divisions (bool, optional) – If True, find divisions for the Dask DataFrame. Defaults to False.
- Returns:
A Dask DataFrame representation of the data source.
- Return type:
dask.dataframe.DataFrame
- fragments() → list[BatchReaderFragment]
Get fragments of the data source.
Fragments represent parts of the data source that can be processed independently.
- Returns:
A list of fragments representing parts of the data source.
- Return type:
list of BatchReaderFragment
- pd()
Convert the dataset to a Pandas DataFrame.
- Returns:
A Pandas DataFrame representation of the dataset.
- Return type:
pandas.DataFrame
- pl(lazy=False)
Convert the data source to a Polars DataFrame or LazyFrame.
- Parameters:
lazy (bool, optional [default: False]) – If True, return a LazyFrame; otherwise return a DataFrame.
- Returns:
A polars representation of the data source.
- Return type:
polars.DataFrame | polars.LazyFrame
- regions(regions: str | list[str]) → Self
Query one or more genomic ranges within the data source.
This method creates a new instance of the data source with the same parameters, overriding the regions to select from the data source.
- Parameters:
regions (str | list[str]) – The regions to select from the data source. This can be a single region or a list of regions.
- Return type:
DataSource
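The Notes for this method list three range-string formats. As a rough illustration of how they relate to one another, the sketch below normalizes each of them to a 0-based, end-exclusive interval. This is not oxbow's parser: `Interval` and `parse_region` are hypothetical names, and the sketch assumes that coords="01" means 0-based end-exclusive and coords="11" means 1-based end-inclusive. It also ignores sequence names containing ":" and numbers written with thousands separators.

```python
import re
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


# Bracket-style: "chr:[start,end]" or "chr:[start,end)"
_BRACKET = re.compile(
    r"^(?P<chrom>[^:]+):\[(?P<start>\d+),(?P<end>\d+)(?P<close>[\])])$"
)
# UCSC-style: "chr:start-end"
_UCSC = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region: str, coords: str = "01") -> Interval:
    """Normalize a range string to a 0-based, end-exclusive Interval."""
    m = _BRACKET.match(region)
    if m:
        # The closing bracket states the convention explicitly.
        one_based = m["close"] == "]"
    else:
        m = _UCSC.match(region)
        if m is None:
            raise ValueError(f"unrecognized region string: {region!r}")
        # UCSC-style follows the data source's coordinate system
        # (assumed meaning of the coords values, see lead-in).
        one_based = coords == "11"
    start, end = int(m["start"]), int(m["end"])
    if one_based:
        # 1-based inclusive [s, e] equals 0-based exclusive [s - 1, e)
        start -= 1
    return Interval(m["chrom"], start, end)
```

Under these assumptions, "chr1:[101,200]" and "chr1:[100,200)" describe the same 100 bases, which is why the bracket forms are useful when the coordinate system of the source is not known to the caller.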
Notes
Genomic range strings can be in the following formats:
UCSC-style "chr:start-end": interpreted using the coordinate system of the data source.
Bracket-style "chr:[start,end]": explicitly 1-based, end-inclusive.
Bracket-style "chr:[start,end)": explicitly 0-based, end-exclusive.
- scanner() → Any
Create a low-level scanner for the data source.
- property schema: Schema
The arrow schema of the projection.
- to_dask(find_divisions=False)
Convert the data source to a Dask DataFrame.
- Parameters:
find_divisions (bool, optional) – If True, find divisions for the Dask DataFrame. Defaults to False.
- Returns:
A Dask DataFrame representation of the data source.
- Return type:
dask.dataframe.DataFrame
- to_duckdb(conn)
Convert the data source into a DuckDB Relation.
- Parameters:
conn (duckdb.DuckDBPyConnection) – The DuckDB connection.
- Returns:
A DuckDB Relation representation of the data source.
- Return type:
duckdb.DuckDBPyRelation
- to_ipc() → bytes
Serialize the data source as an Arrow IPC stream.
- Returns:
The serialized data source in Arrow IPC format.
- Return type:
bytes
- to_pandas()
Convert the dataset to a Pandas DataFrame.
- Returns:
A Pandas DataFrame representation of the dataset.
- Return type:
pandas.DataFrame
- to_polars(lazy=False)
Convert the data source to a Polars DataFrame or LazyFrame.
- Parameters:
lazy (bool, optional [default: False]) – If True, return a LazyFrame; otherwise return a DataFrame.
- Returns:
A polars representation of the data source.
- Return type:
polars.DataFrame | polars.LazyFrame
- zoom(resolution: int, *, fields: Literal['*'] | list[str] | None = '*', regions: str | list[str] | None = None, batch_size: int = 131072) → BbiZoom
Create a data source for a BBI file zoom level.
- Parameters:
resolution (int) – The resolution / reduction level for zoomed data, in bp.
- Returns:
A data source representing a BBI file zoom level.
- Return type:
BbiZoom
- property zoom_levels
List of zoom levels available.
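When a caller wants roughly one value per N bp, one plausible policy is to pick the coarsest available zoom level that does not exceed N and pass it to zoom(). The helper below is a hypothetical sketch of that policy, not an oxbow function. It assumes zoom_levels can be treated as an iterable of integer resolutions in bp, which this page does not state explicitly.

```python
from typing import Iterable, Optional


def pick_resolution(zoom_levels: Iterable[int], target_bp: int) -> Optional[int]:
    """Return the coarsest level that is <= target_bp, or None.

    None means every available zoom level is coarser than requested,
    in which case reading the unzoomed data is the only way to get
    the desired detail.
    """
    candidates = [res for res in zoom_levels if res <= target_bp]
    return max(candidates) if candidates else None
```

A non-None result can then be passed as the `resolution` argument of zoom().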