Read a CSV with Pandas

Basic Usage

import fsspec
import pandas as pd

fs = fsspec.filesystem(
    "dataverse",
    host="borealisdata.ca",
    pid="doi:10.5683/SP3/7HF3IC",
)

with fs.open("data/values.csv", "rb") as f:
    df = pd.read_csv(f)

print(df.head())

fs.open() returns a file-like object that streams bytes on demand. Pandas reads from it without downloading the full file to disk first.
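
Because the object is file-like, Pandas can also consume it in pieces instead of building one large DataFrame. The sketch below is illustrative only: it uses an in-memory io.BytesIO buffer with made-up rows as a stand-in for the handle returned by fs.open(), and the column names id and value are assumptions, not columns of the dataset above.

```python
import io

import pandas as pd

# Stand-in for the object returned by fs.open(...): any binary file-like
# object works here. The rows are invented for illustration.
raw = io.BytesIO(b"id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n")

rows = 0
total = 0
# With chunksize set, read_csv yields DataFrames of at most that many rows.
for chunk in pd.read_csv(raw, chunksize=2):
    rows += len(chunk)
    total += int(chunk["value"].sum())

print(rows, total)
```

With a real dataset, replace the buffer with the object from fs.open(...). Peak memory should then scale with the chunk size rather than with the size of the whole file.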

With Authentication

import os

import fsspec
import pandas as pd
from dotenv import load_dotenv

# Load variables such as DATAVERSE_TOKEN from a local .env file.
load_dotenv()

fs = fsspec.filesystem(
    "dataverse",
    host="borealisdata.ca",
    pid="doi:10.5683/SP3/YOURDOI",
    token=os.environ["DATAVERSE_TOKEN"],
)

with fs.open("results/output.csv", "rb") as f:
    df = pd.read_csv(f)
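
The same script can serve both public and restricted datasets if the token is passed only when it is set. The helper below is a hypothetical sketch, not part of any library: dataverse_options and its environ parameter are invented for illustration, while the host, pid, and token keys are the ones used in the examples above.

```python
import os


def dataverse_options(host, pid, env_var="DATAVERSE_TOKEN", environ=None):
    """Build keyword arguments for the filesystem (hypothetical helper).

    The token is included only when the environment variable is set and
    non-empty, so public datasets work without any configuration.
    """
    environ = os.environ if environ is None else environ
    options = {"host": host, "pid": pid}
    token = environ.get(env_var)
    if token:
        options["token"] = token
    return options


# Explicit environ mappings are passed here so the example is reproducible.
public = dataverse_options("borealisdata.ca", "doi:10.5683/SP3/7HF3IC", environ={})
private = dataverse_options(
    "borealisdata.ca",
    "doi:10.5683/SP3/YOURDOI",
    environ={"DATAVERSE_TOKEN": "example-token"},
)
print(public)
print(sorted(private))
```

The resulting dictionary can be unpacked into the constructor, as in fsspec.filesystem("dataverse", **options), or passed as storage_options to pd.read_csv.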

Using the dataverse:// URI Directly

Pandas can also open a dataverse:// URI directly. It delegates the I/O to fsspec and forwards storage_options to the filesystem constructor:

import pandas as pd

df = pd.read_csv(
    "dataverse://data/values.csv",
    storage_options={
        "host": "borealisdata.ca",
        "pid": "doi:10.5683/SP3/7HF3IC",
    },
)

Note

Direct URI support depends on your Pandas version and requires fsspec to be installed. Pandas 1.5+ passes storage_options through to fsspec. If you get an error, use the explicit fs.open() form shown above.

Reading Multiple Files

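import pandas as pd

# Reuses the fs object created in Basic Usage.
# Each entry from fs.ls() is indexed by "name"; keep only the CSV paths.
paths = [e["name"] for e in fs.ls("data") if e["name"].endswith(".csv")]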

frames = []
for path in paths:
    with fs.open(path, "rb") as f:
        frames.append(pd.read_csv(f))

combined = pd.concat(frames, ignore_index=True)
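
After concatenation it is no longer obvious which file a row came from. One possible approach, sketched below, is to tag each frame with its path before combining. The function concat_with_source, the opener parameter, and the source_file column are hypothetical names, and the in-memory buffers stand in for files on the remote filesystem.

```python
import io

import pandas as pd


def concat_with_source(paths, opener):
    """Read each CSV and record its path in a source_file column.

    opener is any callable that takes a path and returns a binary
    file-like object usable as a context manager.
    """
    frames = []
    for path in paths:
        with opener(path) as f:
            frame = pd.read_csv(f)
        frame["source_file"] = path
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Invented sample data standing in for remote files.
fake_files = {
    "data/a.csv": b"x\n1\n2\n",
    "data/b.csv": b"x\n3\n",
}
combined = concat_with_source(
    sorted(fake_files), lambda path: io.BytesIO(fake_files[path])
)
print(combined)
```

With the filesystem from Basic Usage, the call would look like concat_with_source(paths, lambda p: fs.open(p, "rb")).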