# Getting Started
This tutorial walks you from a fresh environment to reading data from a Borealis Dataverse dataset using dataversefs.
## Prerequisites
- Python 3.12+
- A Borealis account (optional — public datasets work without one)
## 1. Install

```bash
pip install dataversefs
```

or, if you use uv:

```bash
uv add dataversefs
```
Verify the entry point is registered:
```python
import fsspec

print(fsspec.filesystem("dataverse"))  # should not raise ImportError
```
## 2. Get an API Token (optional)
Public datasets work without a token. For restricted datasets:
- Log in to borealisdata.ca
- Go to Account → API Token
- Copy the token
Store it safely — never hard-code it in scripts. A .env file works well:
```ini
# .env
DATAVERSE_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```
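To pick the token up at runtime without hard-coding it, you can parse the `.env` file yourself. A minimal stdlib-only sketch (the `python-dotenv` package is a more featureful alternative; the `load_dotenv` helper below is our own, not part of dataversefs):

```python
import os
from pathlib import Path

def load_dotenv(path: str = ".env") -> None:
    """Minimal .env loader: KEY=VALUE lines, '#' comments and blanks ignored."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Don't clobber variables already set in the real environment
        os.environ.setdefault(key.strip(), value.strip())

load_dotenv()
token = os.environ.get("DATAVERSE_TOKEN")  # None is fine for public datasets
```

You can then pass `token=token` when mounting the filesystem.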
## 3. Mount the Filesystem
```python
import fsspec

fs = fsspec.filesystem(
    "dataverse",
    host="borealisdata.ca",
    pid="doi:10.5683/SP3/7HF3IC",  # demo dataset with 465 files
    token="your-token-here",       # omit for public datasets
)
```
The filesystem is scoped to the dataset identified by `pid`. All paths you pass to
`fs.ls()`, `fs.open()`, etc. are relative to that dataset's root.
## 4. List Files
```python
# Top-level contents
entries = fs.ls("")
for e in entries:
    print(e["name"], e["type"])
```
You should see output like:

```text
README.md            file
dual_heading.zarr    directory
dual_heading_2.zarr  directory
...
```
Drill into a subdirectory:
```python
fs.ls("dual_heading.zarr")
```
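Since dataversefs plugs into fsspec, the generic helpers from fsspec's `AbstractFileSystem` interface, such as `fs.find()` (recursive listing) and `fs.glob()` (pattern matching), should work as well. A sketch against fsspec's built-in in-memory backend so it runs offline (file names here are made up):

```python
import fsspec

# In-memory stand-in for the dataverse filesystem; the same calls
# apply to any fsspec-compatible backend.
fs = fsspec.filesystem("memory")
fs.makedirs("demo/dual_heading.zarr", exist_ok=True)
fs.pipe("demo/dual_heading.zarr/.zattrs", b"{}")
fs.pipe("demo/README.md", b"# hello")

print(fs.find("demo"))       # every file, recursively
print(fs.glob("demo/*.md"))  # files matching a pattern
```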
## 5. Read a File
```python
content = fs.cat("README.md")  # returns bytes
print(content.decode())
```
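`fs.cat()` loads the whole file into memory at once. For larger files, `fs.open()` gives you a file-like object you can stream from instead. The pattern is the same for any fsspec backend; it is shown here against the in-memory backend so it runs offline (the file name is made up):

```python
import fsspec

# In-memory stand-in so the example runs offline; with dataversefs the
# same calls fetch data over HTTP instead.
fs = fsspec.filesystem("memory")
fs.pipe("big.csv", b"a,b\n1,2\n3,4\n")

# Stream the file instead of loading it all with fs.cat()
with fs.open("big.csv", "rb") as f:
    header = f.readline()  # read just the first line

# Text mode works too; fsspec wraps the stream for decoding
with fs.open("big.csv", "rt") as f:
    lines = [line.rstrip() for line in f]

print(header, lines)
```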
## 6. Open a Zarr Store with Xarray
```python
import xarray as xr

ds = xr.open_zarr(
    "dataverse://dual_heading.zarr",
    storage_options={
        "host": "borealisdata.ca",
        "pid": "doi:10.5683/SP3/7HF3IC",
    },
    consolidated=False,
)
print(ds)
```
!!! note
    Use `consolidated=False` unless you have generated a consolidated Zarr
    metadata file. Borealis datasets typically do not include `.zmetadata`.
Variables in `ds` are backed by Dask arrays, so computation is lazy until you call
`.compute()`. The first `.compute()` call triggers HTTP Range requests to fetch the
actual data and may take a few seconds depending on dataset size and your connection.
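To see the lazy/eager distinction without a network connection, here is a self-contained sketch using a synthetic chunked array (the variable and dimension names are invented; your dataset's will differ, and its chunks will be fetched over HTTP rather than from memory):

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a Zarr-backed variable: chunking with dask
# makes operations lazy, just like data opened via open_zarr.
arr = xr.DataArray(np.arange(12.0).reshape(3, 4), dims=("y", "x")).chunk({"y": 1})

subset = arr.isel(y=0)    # lazy: selects the first row, no data read yet
task = subset.mean()      # still lazy: only builds a task graph
result = task.compute()   # now the work actually runs
print(float(result))      # → 1.5
```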