Download Files to Local Disk
DataverseFileSystem inherits get() and get_file() from fsspec, which write
remote files to your local filesystem. Use these when you need a permanent local copy
or want to pre-fetch data before an offline session.
Single file
import fsspec
fs = fsspec.filesystem(
    "dataverse",
    host="borealisdata.ca",
    pid="doi:10.5683/SP3/EXAMPLE",
)
# Download one file to the current directory
fs.get("data/values.csv", "values.csv")
# Or use get_file() for a single explicit path
fs.get_file("data/values.csv", "local/data/values.csv")
get_file() copies exactly one remote file to one local path; get() also accepts a list of paths or a glob pattern.
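When pre-fetching before an offline session, it can help to skip files that are already on disk. The following is a minimal sketch, not part of dataversefs: `fetch_once` and the `.part` suffix are hypothetical names, and `fs` is assumed to be any fsspec-style filesystem exposing `get_file(rpath, lpath)`.

```python
import os


def fetch_once(fs, remote_path, local_path):
    """Download remote_path to local_path unless a local copy already exists.

    Returns True if a download happened, False if the local copy was reused.
    `fs` is assumed to expose get_file(rpath, lpath), as fsspec filesystems do.
    """
    if os.path.exists(local_path):
        return False

    parent = os.path.dirname(local_path)
    if parent:
        os.makedirs(parent, exist_ok=True)

    # Write to a temporary name first so an interrupted transfer
    # never leaves a truncated file at local_path.
    tmp_path = local_path + ".part"
    fs.get_file(remote_path, tmp_path)
    os.replace(tmp_path, local_path)
    return True
```

Calling `fetch_once(fs, "data/values.csv", "local/data/values.csv")` twice should trigger only one transfer; the second call finds the local copy and returns False.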
Multiple files
Pass a list of paths to download several files at once:
fs.get(["data/a.csv", "data/b.csv"], "local/")
The files are written to local/a.csv and local/b.csv.
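If one missing or unreachable file should not abort the whole batch, the paths can also be fetched one at a time. This is a sketch under assumptions: `get_each` is a hypothetical helper, `fs` is any object with an fsspec-style `get_file()`, and catching `OSError` is a guess at how failures surface; a given backend may raise other exception types.

```python
import os
import posixpath


def get_each(fs, remote_paths, local_dir):
    """Download paths one by one; return (downloaded, failed) lists.

    downloaded holds local paths; failed holds (remote_path, exception) pairs.
    """
    os.makedirs(local_dir, exist_ok=True)
    downloaded, failed = [], []
    for rpath in remote_paths:
        # Same flat layout as the list form of get(): keep only the file name.
        lpath = os.path.join(local_dir, posixpath.basename(rpath))
        try:
            fs.get_file(rpath, lpath)
        except OSError as exc:  # assumption: includes FileNotFoundError
            failed.append((rpath, exc))
        else:
            downloaded.append(lpath)
    return downloaded, failed
```

The trade-off is that files are transferred sequentially, whereas a single `fs.get([...])` call leaves the scheduling to the filesystem implementation.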
Mirror a directory
Set recursive=True to copy an entire virtual directory tree:
fs.get("results/", "local/results/", recursive=True)
dataversefs walks the virtual directory tree reconstructed from the dataset's
directoryLabel metadata. The local directory structure mirrors the remote one.
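After a recursive download it can be useful to confirm that every remote file has a local counterpart. The sketch below is illustrative only: `missing_from_mirror` is a hypothetical helper, and it assumes `fs.find()` returns file paths in the same form as the paths passed to `get()` (for example `results/run1/out.csv`).

```python
import os


def missing_from_mirror(fs, remote_dir, local_dir):
    """Return remote files under remote_dir that have no file in local_dir."""
    root = remote_dir.strip("/")
    missing = []
    for rpath in fs.find(remote_dir):
        rel = rpath.strip("/")
        # Drop the mirrored directory's own name to get the relative path.
        if root and rel.startswith(root + "/"):
            rel = rel[len(root) + 1:]
        local_path = os.path.join(local_dir, *rel.split("/"))
        if not os.path.isfile(local_path):
            missing.append(rpath)
    return missing
```

An empty result from `missing_from_mirror(fs, "results/", "local/results/")` suggests the mirror is complete; it does not compare file sizes or checksums.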
Glob-selected files
Use fs.glob() to build a list, then pass it to get():
csv_paths = fs.glob("**/*.csv")
fs.get(csv_paths, "local/csvs/")
All matched files land flat in local/csvs/, whatever directory they came from. To keep the
directory structure, mirror a directory with recursive=True instead.
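Because a flat download keeps only each file's name, two matches such as `a/data.csv` and `b/data.csv` would target the same local path. A small pre-check can catch this before calling get(). This is a sketch: `flat_destinations` is a hypothetical helper that only computes the expected mapping and performs no download.

```python
import posixpath
from collections import defaultdict


def flat_destinations(remote_paths, local_dir):
    """Map each remote path to local_dir/<file name>; raise on name clashes."""
    by_name = defaultdict(list)
    for rpath in remote_paths:
        by_name[posixpath.basename(rpath)].append(rpath)

    clashes = {name: paths for name, paths in by_name.items() if len(paths) > 1}
    if clashes:
        raise ValueError(f"file names collide in a flat download: {clashes}")

    return {
        rpath: posixpath.join(local_dir, posixpath.basename(rpath))
        for rpath in remote_paths
    }
```

Running `flat_destinations(csv_paths, "local/csvs/")` before `fs.get(csv_paths, "local/csvs/")` either returns the expected layout or raises a ValueError naming the clashing paths.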
Streaming vs. downloading
| Method | Behavior | Use when |
|---|---|---|
| fs.open() / fs.cat_file() | Streams bytes on demand via HTTP Range requests | You access data once, in a pipeline (Xarray, Pandas) |
| fs.get() / fs.get_file() | Downloads the full file to disk | You'll access data multiple times, need a local copy, or work offline |
For iterative analysis of large files, consider the file cache instead: it transparently downloads each file on first access and serves it from disk thereafter.
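The two access styles can also be combined by hand: read from a local copy when one exists and stream otherwise. The following is a minimal sketch of that idea, not the file cache itself; `open_preferring_local` is a hypothetical helper, and `fs` is assumed to expose an fsspec-style `open(path, mode)`.

```python
import os


def open_preferring_local(fs, remote_path, local_path):
    """Open the local copy if it exists; otherwise stream from the remote.

    Returns a binary file-like object in both cases.
    """
    if os.path.isfile(local_path):
        return open(local_path, "rb")
    return fs.open(remote_path, "rb")
```

Used as `with open_preferring_local(fs, "data/values.csv", "values.csv") as f:`, the same reading code works online and, once the file has been fetched with get_file(), offline.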