
Download Files to Local Disk

DataverseFileSystem inherits get() and get_file() from fsspec; both write remote files to your local filesystem. Use them when you need a permanent local copy or want to pre-fetch data before an offline session.


Single file

import fsspec

fs = fsspec.filesystem(
    "dataverse",
    host="borealisdata.ca",
    pid="doi:10.5683/SP3/EXAMPLE",
)

# Download one file to the current directory
fs.get("data/values.csv", "values.csv")

# Or use get_file() for a single explicit path
fs.get_file("data/values.csv", "local/data/values.csv")

get_file() copies exactly one file to an explicit local path; get() also accepts lists of paths and glob patterns.
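Conceptually, a single-file download opens the remote file and copies it to disk in fixed-size chunks, so memory use stays bounded regardless of file size. The sketch below illustrates that pattern with the standard library only. `download_file` and `chunk_size` are hypothetical names (not part of dataversefs or fsspec), and an in-memory `io.BytesIO` stands in for the file object that `fs.open(path, "rb")` would return.

```python
import io
import os
import tempfile


def download_file(remote, lpath, chunk_size=1 << 20):
    """Copy a readable binary file object to lpath in chunks (hypothetical helper).

    `remote` stands in for the object returned by fs.open(path, "rb").
    Parent directories of lpath are created if they do not exist.
    Returns the number of bytes written.
    """
    parent = os.path.dirname(lpath)
    if parent:
        os.makedirs(parent, exist_ok=True)
    written = 0
    with open(lpath, "wb") as out:
        while True:
            chunk = remote.read(chunk_size)  # read at most chunk_size bytes
            if not chunk:                    # empty bytes means end of file
                break
            out.write(chunk)
            written += len(chunk)
    return written


# Usage: an in-memory buffer plays the role of the remote file.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "local", "data", "values.csv")
    n = download_file(io.BytesIO(b"x,y\n1,2\n"), target, chunk_size=4)
    with open(target, "rb") as f:
        print(n, f.read())
```

A small `chunk_size` is used in the usage example only to force several loop iterations; a real download would use a much larger block.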


Multiple files

Pass a list of paths to download several files at once:

fs.get(["data/a.csv", "data/b.csv"], "local/")

The files are written to local/a.csv and local/b.csv.
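When the source is a list, the destination is treated as a directory and only each file's final path component is kept. The hypothetical helper below reproduces that mapping with `posixpath`, so you can predict where files will land before downloading; it is an illustration of the behavior described above, not fsspec's own code.

```python
import posixpath


def flat_targets(remote_paths, local_dir):
    """Map each remote path to local_dir/<basename> (hypothetical helper)."""
    return {
        path: posixpath.join(local_dir, posixpath.basename(path.rstrip("/")))
        for path in remote_paths
    }


print(flat_targets(["data/a.csv", "data/b.csv"], "local/"))
# → {'data/a.csv': 'local/a.csv', 'data/b.csv': 'local/b.csv'}
```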


Mirror a directory

Set recursive=True to copy an entire virtual directory tree:

fs.get("results/", "local/results/", recursive=True)

dataversefs walks the virtual directory tree reconstructed from the dataset's directoryLabel metadata. The local directory structure mirrors the remote one.
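Dataverse records each file's folder as a `directoryLabel` string next to its `label` (file name) rather than as real directories. A minimal sketch of how such a flat list could be turned into file paths plus the set of implied directories follows. The record layout is simplified from Dataverse file metadata, and `build_tree` is a hypothetical name; this is not dataversefs's implementation.

```python
import posixpath


def build_tree(file_records):
    """Return (file paths, implied directories) from Dataverse-style file records.

    Each record is assumed to be a dict with a "label" (file name) and an
    optional "directoryLabel" (slash-separated folder path).
    """
    files = []
    dirs = set()
    for rec in file_records:
        folder = (rec.get("directoryLabel") or "").strip("/")
        path = posixpath.join(folder, rec["label"]) if folder else rec["label"]
        files.append(path)
        # Every ancestor folder of the file becomes a virtual directory.
        parts = folder.split("/") if folder else []
        for i in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:i]))
    return sorted(files), sorted(dirs)


records = [
    {"label": "summary.csv", "directoryLabel": "results"},
    {"label": "run1.nc", "directoryLabel": "results/raw"},
    {"label": "README.md"},
]
files, dirs = build_tree(records)
print(files)  # → ['README.md', 'results/raw/run1.nc', 'results/summary.csv']
print(dirs)   # → ['results', 'results/raw']
```

Mirroring a directory then amounts to joining each file path under that directory onto the local destination.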


Glob-selected files

Use fs.glob() to build a list, then pass it to get():

csv_paths = fs.glob("**/*.csv")
fs.get(csv_paths, "local/csvs/")

All matched files land flat in local/csvs/, named by file name only. To preserve the remote folder structure instead, mirror a directory with recursive=True.
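Because a list download keeps only each file's base name, two matches with the same name in different folders (for example `2023/values.csv` and `2024/values.csv`) map to the same local path, and the later copy overwrites the earlier one. A quick pre-flight check on the list returned by `fs.glob()` can catch this before any download starts; `find_collisions` below is a hypothetical helper, sketched with the standard library.

```python
import posixpath
from collections import defaultdict


def find_collisions(remote_paths):
    """Group remote paths that share a base name (hypothetical helper).

    Returns {basename: [paths...]} for every base name used more than once.
    """
    by_name = defaultdict(list)
    for path in remote_paths:
        by_name[posixpath.basename(path)].append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


csv_paths = ["2023/values.csv", "2024/values.csv", "meta/stations.csv"]
clashes = find_collisions(csv_paths)
if clashes:
    print("Name collisions:", clashes)
    # → Name collisions: {'values.csv': ['2023/values.csv', '2024/values.csv']}
```

If the check reports collisions, a recursive mirror keeps the files apart.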


Streaming vs. downloading

| Method | Behavior | Use when |
| --- | --- | --- |
| fs.open() / fs.cat_file() | Streams bytes on demand via HTTP Range requests | You access the data once, in a pipeline (Xarray, Pandas) |
| fs.get() / fs.get_file() | Downloads the full file to disk | You'll access the data multiple times, need a local copy, or work offline |
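Streaming relies on HTTP `Range` requests, which let a client ask for just a slice of a file. The sketch below shows the header arithmetic for a Python-style half-open byte range `[start, end)`; HTTP byte ranges include their last byte, hence the `end - 1`. `range_header` is a hypothetical helper, and slicing a local `bytes` object stands in for the server's response.

```python
def range_header(start, end):
    """Build an HTTP Range header for the half-open byte range [start, end)."""
    if not 0 <= start < end:
        raise ValueError("need 0 <= start < end")
    # HTTP ranges are inclusive of the last byte; Python slices are not.
    return {"Range": f"bytes={start}-{end - 1}"}


payload = bytes(range(256))          # stand-in for the remote file's content
hdr = range_header(16, 32)
print(hdr)                           # → {'Range': 'bytes=16-31'}
first, last = map(int, hdr["Range"].removeprefix("bytes=").split("-"))
chunk = payload[first:last + 1]      # what a server would return for that range
print(len(chunk), chunk[0], chunk[-1])  # → 16 16 31
```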

For iterative analysis of large files, consider the file cache instead: it downloads each file transparently on first access and serves it from disk thereafter.
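The idea behind a file cache is read-through caching: fetch from the remote on the first read, keep a local copy, and read that copy on later accesses. The sketch below illustrates only the concept. `ReadThroughCache` and its `fetch` callable are hypothetical and are not fsspec's caching classes, which additionally track cache metadata and optional expiry.

```python
import hashlib
import os
import tempfile


class ReadThroughCache:
    """Minimal read-through disk cache (hypothetical, for illustration)."""

    def __init__(self, fetch, cache_dir):
        self.fetch = fetch          # callable: remote path -> bytes
        self.cache_dir = cache_dir
        self.misses = 0             # number of times the remote was contacted

    def _local_path(self, path):
        # Hash the remote path so nested names map to one flat, safe file name.
        return os.path.join(self.cache_dir, hashlib.sha256(path.encode()).hexdigest())

    def read(self, path):
        local = self._local_path(path)
        if not os.path.exists(local):
            self.misses += 1
            data = self.fetch(path)
            with open(local, "wb") as f:
                f.write(data)
        with open(local, "rb") as f:
            return f.read()


remote = {"data/values.csv": b"x,y\n1,2\n"}  # stand-in for the dataset
with tempfile.TemporaryDirectory() as tmp:
    cache = ReadThroughCache(remote.__getitem__, tmp)
    for _ in range(3):
        cache.read("data/values.csv")
    print(cache.misses)  # → 1
```

Only the first of the three reads contacts the stand-in remote; the other two are served from the local copy.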