dataversefs¶
A read-only fsspec filesystem backend for the Borealis Dataverse platform.
dataversefs lets Python data science libraries — Xarray, Pandas, Dask — access
Dataverse-hosted datasets directly via dataverse:// URIs. No full-file downloads
required.
Features¶
- Zarr-native: mount a Dataverse dataset as a filesystem root so Zarr's internal paths resolve correctly
- Byte-range fetching: only the chunks Xarray/Dask actually need are downloaded, via HTTP `Range` requests
- S3 redirect caching: the Borealis 303 → S3 pre-signed URL redirect is resolved once per file and cached, so chunk reads go straight to S3
- Dask-compatible: `AsyncFileSystem` base with `__getstate__`/`__setstate__` for safe pickling across Dask workers
- No credentials required for public datasets
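The byte-range fetching and redirect caching described in the feature list can be illustrated with a small standalone sketch. This is not dataversefs's actual implementation: `RedirectCache`, `range_header`, and the `resolve` callback are hypothetical names, and the resolver here is a stand-in for whatever follows the 303 redirect to the pre-signed S3 URL.

```python
from typing import Callable, Dict, Tuple


def range_header(start: int, end: int) -> Dict[str, str]:
    """Build an HTTP Range header for the half-open byte interval [start, end)."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    # HTTP byte ranges are inclusive on both ends, hence end - 1.
    return {"Range": f"bytes={start}-{end - 1}"}


class RedirectCache:
    """Resolve each file's redirect target once and reuse it afterwards."""

    def __init__(self, resolve: Callable[[str], str]) -> None:
        # Hypothetical callback: follows the redirect and returns the final URL.
        self._resolve = resolve
        self._targets: Dict[str, str] = {}

    def target(self, path: str) -> str:
        # Only the first request for a path pays for the redirect round trip.
        if path not in self._targets:
            self._targets[path] = self._resolve(path)
        return self._targets[path]

    def plan_read(self, path: str, start: int, end: int) -> Tuple[str, Dict[str, str]]:
        """Return the URL and headers needed to fetch bytes [start, end) of path."""
        return self.target(path), range_header(start, end)


# Fake resolver that records how often it is called (no network involved).
calls = []


def fake_resolve(path: str) -> str:
    calls.append(path)
    return f"https://s3.example.invalid/{path}?signature=placeholder"


cache = RedirectCache(fake_resolve)
url_a, headers_a = cache.plan_read("example.zarr/temperature/0.0", 0, 1024)
url_b, headers_b = cache.plan_read("example.zarr/temperature/0.0", 1024, 2048)
```

Both reads share one resolved URL, so `fake_resolve` runs only once. Pre-signed URLs expire, so a production cache would also need some form of expiry or retry handling, which this sketch leaves out.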
Quick Start¶
Install the package:
pip install dataversefs
Mount a dataset and list files:
import fsspec
fs = fsspec.filesystem(
    "dataverse",
    host="borealisdata.ca",
    pid="doi:10.5683/SP3/7HF3IC",  # real demo dataset
    token="your-api-token",        # omit for public datasets
)
fs.ls("") # list root of the dataset
fs.ls("dual_heading.zarr") # inspect a Zarr store inside
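Directory-style listing implies that a flat list of file paths is turned into a tree, which the Explanation section refers to as the flat-to-hierarchical design. A minimal sketch of that idea follows, assuming the input is simply a list of slash-separated paths. The function name `build_tree` and the file names are made up for illustration and do not reflect the demo dataset's real contents or the library's internals.

```python
from typing import Dict, Iterable, List, Set


def build_tree(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Map each directory ("" is the root) to its sorted immediate children."""
    children: Dict[str, Set[str]] = {"": set()}
    for path in paths:
        cleaned = path.strip("/")
        if not cleaned:
            continue  # ignore empty entries
        parent = ""
        for part in cleaned.split("/"):
            child = f"{parent}/{part}" if parent else part
            # Register the child under its parent; intermediate directories
            # appear implicitly because they are parents of deeper entries.
            children.setdefault(parent, set()).add(child)
            parent = child
    return {directory: sorted(names) for directory, names in children.items()}


# Hypothetical flat listing, as a file-oriented API might return it.
flat = [
    "example.zarr/.zgroup",
    "example.zarr/temperature/0.0",
    "README.txt",
]
tree = build_tree(flat)
```

Here `tree[""]` plays the role of `fs.ls("")`, and `tree["example.zarr"]` the role of listing the store directory.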
Open a Zarr store with Xarray in one line:
import xarray as xr
ds = xr.open_zarr(
    "dataverse://dual_heading.zarr",
    storage_options={
        "host": "borealisdata.ca",
        "pid": "doi:10.5683/SP3/7HF3IC",
    },
    consolidated=False,
)
print(ds)
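When the same dataset is opened from several places, it can help to build `storage_options` in one spot and to include the token only when one is available. The helper below is a hypothetical convenience, not part of dataversefs. It uses only the `host`, `pid`, and `token` keys shown above, and `open_store` imports Xarray lazily so the helper itself has no extra dependencies.

```python
from typing import Dict, Optional


def dataverse_options(host: str, pid: str, token: Optional[str] = None) -> Dict[str, str]:
    """Build storage_options for a dataverse:// URI, omitting token if unset."""
    options = {"host": host, "pid": pid}
    if token:
        # Only private datasets need a token; public ones work without it.
        options["token"] = token
    return options


def open_store(path: str, options: Dict[str, str]):
    """Open a Zarr store inside the dataset (requires xarray and dataversefs)."""
    import xarray as xr  # imported lazily so the helper above stays dependency-free

    return xr.open_zarr(
        f"dataverse://{path}",
        storage_options=options,
        consolidated=False,
    )


public = dataverse_options("borealisdata.ca", "doi:10.5683/SP3/7HF3IC")
private = dataverse_options("borealisdata.ca", "doi:10.5683/SP3/7HF3IC", token="your-api-token")
```

Calling `open_store("dual_heading.zarr", public)` should then be equivalent to the one-line example above.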
Documentation Overview¶
| Section | What you'll find |
|---|---|
| Tutorials | Linear walk-throughs from installation to first Zarr read |
| How-to Guides | Task-focused recipes: authentication, Xarray, Pandas |
| Reference | Full API: constructor, methods, URI scheme |
| Explanation | Design decisions: flat-to-hierarchical tree, S3 redirect caching, async/Dask |
| Roadmap | Planned and considered future features |
Installation¶
pip install dataversefs
# or with uv
uv add dataversefs
Requires Python 3.12+.