# Open a Zarr Store with Xarray

## Overview
Zarr stores on Borealis appear as directories whose paths follow the Zarr spec
(`.zattrs`, `.zgroup`, `var/0.0`, etc.). `dataversefs` exposes them as a virtual
filesystem so Xarray can traverse the tree without downloading the entire dataset
up front.
## Basic Usage

Create a filesystem object first, then pass `fs.get_mapper()` to `xr.open_zarr`:
```python
import fsspec
import xarray as xr

fs = fsspec.filesystem(
    "dataverse",
    host="borealisdata.ca",
    pid="doi:10.5683/SP3/7HF3IC",
)
store = fs.get_mapper("dual_heading.zarr")
ds = xr.open_zarr(store, consolidated=False)
print(ds)
```
!!! important
    Always pass `consolidated=False`. Borealis datasets do not include a
    consolidated `.zmetadata` file by default, so Zarr must query each
    metadata key individually.
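If you are not sure whether a particular store was written with consolidated metadata, a minimal sketch like the following can probe the mapper for the `.zmetadata` key before opening. The helper name is illustrative, not part of `dataversefs`:

```python
def choose_consolidated(store) -> bool:
    """Return True only if the mapper exposes a consolidated .zmetadata key."""
    try:
        return ".zmetadata" in store
    except Exception:
        # Some mappers raise on membership tests instead of returning False
        return False

# usage sketch: ds = xr.open_zarr(store, consolidated=choose_consolidated(store))
```

Note that the membership test itself costs one round trip, so for Borealis stores it is simpler to just pass `consolidated=False` directly.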
!!! note "Works in Jupyter, scripts, and Dask"
    File downloads use synchronous requests in a thread executor, so they are
    safe to call from any event loop (Jupyter's, zarr's internal loop, fsspec's
    background loop, or a Dask worker). No special workarounds are needed on
    Windows or any other platform.
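The pattern the note describes can be illustrated with plain stdlib code: the blocking download runs in a thread pool, so an `await` inside any event loop never blocks that loop. The `blocking_fetch` stand-in below is hypothetical and not part of `dataversefs`:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def blocking_fetch(key: str) -> str:
    # Stand-in for a synchronous HTTP download of one chunk
    return f"bytes-for-{key}"


async def fetch_in_executor(key: str) -> str:
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as pool:
        # The event loop stays responsive while the thread does the I/O
        return await loop.run_in_executor(pool, blocking_fetch, key)


print(asyncio.run(fetch_in_executor("var/0.0")))  # bytes-for-var/0.0
```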
## With Authentication
```python
import os

import fsspec
import xarray as xr
from dotenv import load_dotenv

load_dotenv()

fs = fsspec.filesystem(
    "dataverse",
    host="borealisdata.ca",
    pid="doi:10.5683/SP3/7HF3IC",
    token=os.environ["DATAVERSE_API_TOKEN"],
)
store = fs.get_mapper("dual_heading.zarr")
ds = xr.open_zarr(store, consolidated=False)
```
## Lazy Loading and Dask
Variables in the returned dataset are backed by Dask arrays:
```python
print(ds.heading)                   # shows a Dask array; no data loaded yet
mean = ds.heading.mean().compute()  # triggers actual network reads
```
Only the chunks needed for the computation are fetched, using HTTP Range requests directly against the Borealis S3 backend.
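To see why only some chunks are fetched, here is a small, self-contained sketch of the chunk-index arithmetic Zarr effectively performs along one dimension. The function is illustrative, not a `dataversefs` or `zarr` API:

```python
def chunks_touched(start: int, stop: int, chunk_len: int) -> list[int]:
    """Indices of the chunks a half-open slice [start, stop) needs."""
    first = start // chunk_len
    last = (stop - 1) // chunk_len
    return list(range(first, last + 1))


# With 1000-element chunks, rows 2500..3100 need only chunks 2 and 3,
# so two Range requests go out instead of a full-variable download.
print(chunks_touched(2500, 3100, 1000))  # [2, 3]
```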
## Common Issues
**`FileNotFoundError` on `.zmetadata`**: Set `consolidated=False`. Zarr tries to read
`.zmetadata` first; if the key does not exist and `consolidated` is not `False`, it
raises an error.

**Slow first open**: The routing table is built by fetching the full dataset JSON from
Dataverse. Even for datasets with hundreds of files this is a single API call, but it
may take a second or two.

**Pre-signed URL expiry**: Borealis pre-signed S3 URLs expire (typically after 1 hour).
For very long-running Dask jobs, create a new filesystem instance to refresh the URL
cache.
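One way to handle expiry in a long job is to rebuild the mapper once it passes a chosen age. The wrapper below is a minimal sketch under that assumption; it is not part of `dataversefs`, and a production version would need the full `MutableMapping` interface that zarr expects:

```python
import time


class RefreshingStore:
    """Rebuild the wrapped mapper whenever it is older than max_age seconds."""

    def __init__(self, make_mapper, max_age: float = 3000.0):
        self._make = make_mapper  # factory, e.g. one that calls fs.get_mapper(...)
        self._max_age = max_age
        self._mapper = None
        self._built_at = 0.0

    def _current(self):
        if self._mapper is None or time.monotonic() - self._built_at > self._max_age:
            self._mapper = self._make()  # fresh filesystem -> fresh pre-signed URLs
            self._built_at = time.monotonic()
        return self._mapper

    def __getitem__(self, key):
        return self._current()[key]

    def __contains__(self, key):
        return key in self._current()
```

The factory should construct a new filesystem instance each time it is called, since the URL cache lives on the filesystem object.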