dataversefs

A read-only fsspec filesystem backend for the Borealis Dataverse platform.

dataversefs lets Python data science libraries such as Xarray, Pandas, and Dask read Dataverse-hosted datasets directly through dataverse:// URIs, without downloading whole files first.

Features

  • Zarr-native — mount a Dataverse dataset as a filesystem root so Zarr's internal paths resolve correctly
  • Byte-range fetching — only the chunks Xarray/Dask actually need are downloaded, via HTTP Range requests
  • S3 redirect caching — the Borealis 303 → S3 pre-signed URL redirect is resolved once per file and cached, so chunk reads go straight to S3
  • Dask-compatible AsyncFileSystem base, with __getstate__/__setstate__ for safe pickling across Dask workers
  • No credentials required for public datasets
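The byte-range and redirect-caching features can be illustrated with the standard library alone. The sketch below is not dataversefs's implementation. `range_header`, `RedirectCache` and the injected `resolve_redirect` callable are hypothetical names, and the code makes no network requests. It only shows how a half-open byte interval maps onto an HTTP Range header, and how a redirect can be resolved once per file and then reused.

```python
from dataclasses import dataclass, field
from typing import Callable


def range_header(start: int, end: int) -> dict[str, str]:
    """Build an HTTP Range header for the half-open byte interval [start, end).

    HTTP byte ranges are inclusive at both ends, so the last byte is end - 1.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return {"Range": f"bytes={start}-{end - 1}"}


@dataclass
class RedirectCache:
    """Resolve each file's redirect target once, then reuse it for every chunk read.

    `resolve_redirect` is a hypothetical callable standing in for the request
    that follows the Borealis 303 response and returns the pre-signed S3 URL.
    """

    resolve_redirect: Callable[[str], str]
    _urls: dict[str, str] = field(default_factory=dict, init=False, repr=False)

    def url_for(self, file_id: str) -> str:
        # Only the first lookup for a file calls the resolver; later chunk
        # reads for the same file reuse the cached S3 URL.
        if file_id not in self._urls:
            self._urls[file_id] = self.resolve_redirect(file_id)
        return self._urls[file_id]
```

A chunk read would combine the two pieces, for example by sending `range_header(0, 4096)` to `cache.url_for(file_id)`. Pre-signed S3 URLs expire, so a long-running implementation would also need to refresh or evict cached entries. This sketch does not handle that.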

Quick Start

Install the package:

pip install dataversefs

Mount a dataset and list files:

import fsspec

fs = fsspec.filesystem(
    "dataverse",
    host="borealisdata.ca",
    pid="doi:10.5683/SP3/7HF3IC",   # real demo dataset
    token="your-api-token",           # omit for public datasets
)

fs.ls("")          # list root of the dataset
fs.ls("dual_heading.zarr")   # inspect a Zarr store inside
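For a call like `fs.ls("dual_heading.zarr")` to work, the flat file list that Dataverse stores has to be turned into a directory tree. In that list, each file carries an optional folder path. The sketch below shows one way to do this. It assumes each entry is a simplified Dataverse file record with a `directoryLabel` and a `dataFile.filename`. `build_tree` and `ls` are hypothetical helpers, not part of the package.

```python
def build_tree(files: list[dict]) -> dict[str, set[str]]:
    """Map each directory path ("" is the dataset root) to its direct children.

    Assumed input: simplified Dataverse file records such as
    {"directoryLabel": "store.zarr/temp", "dataFile": {"filename": "0.0"}}.
    """
    tree: dict[str, set[str]] = {"": set()}
    for entry in files:
        folder = (entry.get("directoryLabel") or "").strip("/")
        name = entry["dataFile"]["filename"]
        parts = (folder.split("/") if folder else []) + [name]
        # Register every path component as a child of the path before it,
        # so intermediate folders exist even if no file sits directly in them.
        for depth, part in enumerate(parts):
            parent = "/".join(parts[:depth])
            tree.setdefault(parent, set()).add(part)
    return tree


def ls(tree: dict[str, set[str]], path: str = "") -> list[str]:
    """Return sorted child paths of `path` (names only, like ls with detail=False)."""
    path = path.strip("/")
    prefix = f"{path}/" if path else ""
    return sorted(prefix + child for child in tree[path])
```

In this sketch, directories are the keys of the tree and files are leaves that never appear as keys. This lets a check such as `path in tree` answer "is this a directory?".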

Open a Zarr store with Xarray in a single call:

import xarray as xr

ds = xr.open_zarr(
    "dataverse://dual_heading.zarr",
    storage_options={
        "host": "borealisdata.ca",
        "pid": "doi:10.5683/SP3/7HF3IC",
    },
    consolidated=False,
)
print(ds)

Documentation Overview

Section        | What you'll find
-------------- | ----------------------------------------------------------------------------
Tutorials      | Linear walk-throughs from installation to a first Zarr read
How-to Guides  | Task-focused recipes: authentication, Xarray, Pandas
Reference      | Full API: constructor, methods, URI scheme
Explanation    | Design decisions: flat-to-hierarchical tree, S3 redirect caching, async/Dask
Roadmap        | Planned and considered future features

Installation

pip install dataversefs
# or with uv
uv add dataversefs

Requires Python 3.12+.