Read a CSV with Pandas
Basic Usage
import fsspec
import pandas as pd
fs = fsspec.filesystem(
"dataverse",
host="borealisdata.ca",
pid="doi:10.5683/SP3/7HF3IC",
)
with fs.open("data/values.csv", "rb") as f:
    df = pd.read_csv(f)
print(df.head())
fs.open() returns a file-like object that streams bytes on demand. Pandas reads from it without downloading the full file to disk first.
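Because the handle is an ordinary file-like object, Pandas can also consume it in pieces through the chunksize argument of read_csv, which keeps memory use bounded for large files. The following is a minimal sketch, not part of the library: count_rows_in_chunks is a hypothetical helper, and an in-memory io.BytesIO stands in for the handle that fs.open() would return.

```python
import io

import pandas as pd


def count_rows_in_chunks(f, chunksize=2):
    """Read a CSV from a binary file-like object in chunks and count its rows."""
    total = 0
    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows
    for chunk in pd.read_csv(f, chunksize=chunksize):
        total += len(chunk)
    return total


# Stand-in (assumption) for the handle returned by fs.open("data/values.csv", "rb")
sample = io.BytesIO(b"id,value\n1,10\n2,20\n3,30\n")
row_count = count_rows_in_chunks(sample)
```

With a real dataset, the same helper could be called inside the with fs.open(...) block, passing f in place of the in-memory sample.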
With Authentication
import os
import fsspec
import pandas as pd
from dotenv import load_dotenv
load_dotenv()
fs = fsspec.filesystem(
"dataverse",
host="borealisdata.ca",
pid="doi:10.5683/SP3/YOURDOI",
token=os.environ["DATAVERSE_TOKEN"],
)
with fs.open("results/output.csv") as f:
    df = pd.read_csv(f)
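Indexing os.environ["DATAVERSE_TOKEN"] raises a KeyError when the variable is not set. One way to share a single code path between public and restricted datasets is to build the filesystem arguments conditionally. The sketch below assumes the host, pid, and token keyword names used in the examples on this page. dataverse_options and its environ parameter are hypothetical and exist only for illustration.

```python
import os


def dataverse_options(host, pid, env_var="DATAVERSE_TOKEN", environ=None):
    """Build keyword arguments for fsspec.filesystem("dataverse", ...)."""
    environ = os.environ if environ is None else environ
    options = {"host": host, "pid": pid}
    token = environ.get(env_var)
    # Include the token only when it is set, so public datasets still work
    if token:
        options["token"] = token
    return options


# Explicit environments are passed here so the example does not depend on the shell
public_options = dataverse_options(
    "borealisdata.ca", "doi:10.5683/SP3/7HF3IC", environ={}
)
private_options = dataverse_options(
    "borealisdata.ca",
    "doi:10.5683/SP3/7HF3IC",
    environ={"DATAVERSE_TOKEN": "example-token"},
)
```

The resulting dictionary can then be unpacked into the constructor, for example fsspec.filesystem("dataverse", **public_options).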
Using the dataverse:// URI Directly
Pandas can also open a dataverse:// URI directly. It hands the URI to fsspec and forwards storage_options to the filesystem:
import pandas as pd
df = pd.read_csv(
"dataverse://data/values.csv",
storage_options={
"host": "borealisdata.ca",
"pid": "doi:10.5683/SP3/7HF3IC",
},
)
Note
Direct URI support depends on your Pandas version. Pandas 1.5+ passes
storage_options through to fsspec. If you get an error, use the explicit
fs.open() form shown above.
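Reading Multiple Files
import pandas as pd
# Reuses the fs object created in Basic Usage.
# detail=True makes ls() return dicts, so each entry has a "name" key.
paths = [e["name"] for e in fs.ls("data", detail=True) if e["name"].endswith(".csv")]
frames = []
for path in paths:
    with fs.open(path, "rb") as f:
        frames.append(pd.read_csv(f))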
combined = pd.concat(frames, ignore_index=True)
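After concatenation the rows no longer show which file they came from, and pd.concat raises a ValueError when the list of frames is empty. The sketch below handles both cases. read_and_tag and the source_file column name are hypothetical, and fake_open is an in-memory stand-in for fs.open so the example runs without network access.

```python
import io

import pandas as pd


def read_and_tag(open_file, paths):
    """Read each CSV via open_file(path, "rb") and record its source path."""
    frames = []
    for path in paths:
        with open_file(path, "rb") as f:
            # assign() adds a column holding the originating file path
            frames.append(pd.read_csv(f).assign(source_file=path))
    if not frames:
        # Avoid the ValueError that pd.concat raises on an empty list
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# In-memory stand-ins (assumption) for files in the dataset
_fake_files = {
    "data/a.csv": b"id,value\n1,10\n",
    "data/b.csv": b"id,value\n2,20\n3,30\n",
}


def fake_open(path, mode="rb"):
    return io.BytesIO(_fake_files[path])


tagged = read_and_tag(fake_open, sorted(_fake_files))
```

Against a real dataset, the call would be read_and_tag(fs.open, paths), with paths built from fs.ls() as shown above.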