FakeQuake Seismic Data via dataversefs¶
This notebook demonstrates how to access the FakeQuake Cascadia Subduction Zone earthquake simulation dataset hosted on Borealis Dataverse using dataversefs.
Dataset: doi:10.5683/SP3/CZECEG
Host: borealisdata.ca
Contents: 112 simulated Cascadia rupture scenarios — rupture files, MiniSEED
waveforms, station metadata, and fault model files.
Dataset Layout¶
02_FinalDataPackage/
CascadiaRuptureFiles/ ← rupture .log + .rupt files (zipped)
CascadiaWaveForms/ ← MiniSEED waveforms by network/instrument (zipped)
03_StationInfo/ ← station CSVs and .gflist files
04_ModelInfo/ ← velocity model, fault geometry
05_Figures/ ← slip pattern PNGs per event
Prerequisites¶
uv add dataversefs obspy pandas matplotlib pillow python-dotenv
Create a .env file with your API token:
DATAVERSE_API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
import io
import os
import zipfile
import fsspec
import matplotlib.pyplot as plt
import obspy
import pandas as pd
from dotenv import load_dotenv
from fsspec.implementations.zip import ZipFileSystem
from PIL import Image
import dataversefs # noqa: F401 — registers 'dataverse' fsspec protocol
load_dotenv()
HOST = "borealisdata.ca"
PID = "doi:10.5683/SP3/CZECEG"
TOKEN = os.environ.get("DATAVERSE_API_TOKEN")
print(f"Host : {HOST}")
print(f"PID : {PID}")
print(f"Token: {'set' if TOKEN else 'not set'}")
Host : borealisdata.ca
PID  : doi:10.5683/SP3/CZECEG
Token: set
Enable Logging¶
dataversefs uses loguru for structured logging.
The library is silent by default; enable it by calling logger.enable("dataversefs").
| Level | What you see |
|---|---|
| `INFO` | Routing table build — number of files/dirs and elapsed time |
| `DEBUG` | Every HEAD redirect resolution and GET Range request with timing |
import sys
from loguru import logger
logger.remove()
logger.add(
sys.stdout,
level="INFO",
filter="dataversefs",
format="{time:HH:mm:ss.SSS} | {level:<7} | {message}",
colorize=False,
)
logger.enable("dataversefs")
print("dataversefs logging enabled at INFO level.")
dataversefs logging enabled at INFO level.
Mount the Filesystem¶
The filesystem is scoped to the FakeQuake dataset. All paths below are relative to the dataset root.
fs = fsspec.filesystem(
"dataverse",
host=HOST,
pid=PID,
token=TOKEN,
skip_instance_cache=True,
)
root_entries = fs.ls("", detail=True)
print(f"{len(root_entries)} entries at dataset root:")
for e in root_entries:
print(f" [{e['type']:9s}] {e['name']}")
18:17:08.686 | INFO    | Building routing table for doi:10.5683/SP3/CZECEG
18:17:09.544 | INFO    | Routing table built for doi:10.5683/SP3/CZECEG — 146 files, 15 dirs (0.848s)
4 entries at dataset root:
  [directory] 05_Figures
  [directory] 04_ModelInfo
  [directory] 02_FinalDataPackage
  [directory] 03_StationInfo
Browse the File Structure¶
Use glob() to survey each top-level directory.
# Station metadata and model files
print("03_StationInfo/:")
for f in fs.glob("03_StationInfo/*"):
info = fs.info(f)
size = f"{info['size']:>10,} B" if info["type"] == "file" else ""
print(f" {size} {f}")
print()
# Figures
figures = fs.glob("05_Figures/*.png")
print(f"05_Figures/: {len(figures)} slip pattern PNGs (cascadia-000000.png … cascadia-000111.png)")
print()
# Waveform archives
print("02_FinalDataPackage/ zip archives:")
for f in fs.glob("02_FinalDataPackage/**/*.zip"):
info = fs.info(f)
size_gb = info["size"] / 1e9
print(f" {size_gb:5.2f} GB {f}")
03_StationInfo/:
5,002 B 03_StationInfo/ONC_Seismometers&GNSS.csv
14,037 B 03_StationInfo/PNSN_Seismometers.tab
256 B 03_StationInfo/onc_offshore.gflist
1,092 B 03_StationInfo/onc_onshore.gflist
5,872 B 03_StationInfo/pnsn.gflist
05_Figures/: 113 slip pattern PNGs (cascadia-000000.png … cascadia-000111.png)
02_FinalDataPackage/ zip archives:
0.01 GB 02_FinalDataPackage/CascadiaRuptureFiles/CascadiaRuptureFiles.zip
0.14 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-ONC-Offshore-StrongMotion/Cas-ONC-Off_Noise.zip
0.30 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-ONC-Offshore-StrongMotion/Cas-ONC-Off_Signal.zip
0.53 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-ONC-Offshore-StrongMotion/Cas-ONC-Off_SignalwithNoise.zip
0.05 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-ONC-Onshore-GNSS/Cas-ONC-On-GNSS_Noise.zip
0.04 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-ONC-Onshore-GNSS/Cas-ONC-On-GNSS_Signal.zip
0.05 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-ONC-Onshore-GNSS/Cas-ONC-On-GNSS_SignalwithNoise.zip
4.77 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-ONC-Onshore-StrongMotion/Cas-ONC-On-SM_Noise.zip
1.71 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-ONC-Onshore-StrongMotion/Cas-ONC-On-SM_Signal.zip
2.81 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-ONC-Onshore-StrongMotion/Cas-ONC-On_SM_SignalwithNoise/Cas-ONC-On-SM_SignalwithNoise_1.zip
2.82 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-ONC-Onshore-StrongMotion/Cas-ONC-On_SM_SignalwithNoise/Cas-ONC-On-SM_SignalwithNoise_2.zip
4.38 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-PNSN-Onshore-StrongMotion/Cas-PNSN_Noise/Cas-PNSN_Noise_1.zip
4.15 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-PNSN-Onshore-StrongMotion/Cas-PNSN_Noise/Cas-PNSN_Noise_2.zip
4.15 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-PNSN-Onshore-StrongMotion/Cas-PNSN_Noise/Cas-PNSN_Noise_3.zip
4.38 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-PNSN-Onshore-StrongMotion/Cas-PNSN_Noise/Cas-PNSN_Noise_4.zip
4.38 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-PNSN-Onshore-StrongMotion/Cas-PNSN_Noise/Cas-PNSN_Noise_5.zip
4.38 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-PNSN-Onshore-StrongMotion/Cas-PNSN_Noise/Cas-PNSN_Noise_6.zip
4.81 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-PNSN-Onshore-StrongMotion/Cas-PNSN_Signal/Cas-PNSN_Signal_1.zip
4.83 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-PNSN-Onshore-StrongMotion/Cas-PNSN_Signal/Cas-PNSN_Signal_2.zip
5.30 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-PNSN-Onshore-StrongMotion/Cas-PNSN_SignalwithNoise/Cas-PNSN_SignalwithNoise_1.zip
5.07 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-PNSN-Onshore-StrongMotion/Cas-PNSN_SignalwithNoise/Cas-PNSN_SignalwithNoise_2.zip
5.06 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-PNSN-Onshore-StrongMotion/Cas-PNSN_SignalwithNoise/Cas-PNSN_SignalwithNoise_3.zip
5.06 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-PNSN-Onshore-StrongMotion/Cas-PNSN_SignalwithNoise/Cas-PNSN_SignalwithNoise_4.zip
5.28 GB 02_FinalDataPackage/CascadiaWaveForms/Cas-PNSN-Onshore-StrongMotion/Cas-PNSN_SignalwithNoise/Cas-PNSN_SignalwithNoise_5.zip
Station Metadata¶
Station information is available as plain CSV and tab-separated files — directly readable with pandas, no zip download required.
# ONC stations (offshore + onshore, GNSS + strong-motion)
onc = pd.read_csv(fs.open("03_StationInfo/ONC_Seismometers&GNSS.csv"))
print(f"ONC stations: {len(onc)} rows")
onc.head()
ONC stations: 38 rows
| | network code | stationCode.locationCode | description | lon | lat | depth | channel code east | channel code north | channel code vertical | channel code east.1 | channel code north.1 | channel code vertical.1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | onshore | NaN | NaN | NaN | NaN | strong-motion | NaN | NaN | GNSS | NaN | NaN |
| 1 | OW | AL2H | Albert Head is a community located in Metchosi... | -123.487390 | 48.38981 | -29.0 | HNE | HNN | HNZ | LYE | LYN | LYZ |
| 2 | OW | BAMF | Bamfield is a community located on the west co... | -125.135265 | 48.83544 | -8.0 | HNE | HNN | HNZ | LYE | LYN | LYZ |
| 3 | OW | BCOV | Beaver Cove is a small coastal community on No... | -126.842640 | 50.54427 | -36.0 | HNE | HNN | HNZ | LYE | LYN | LYZ |
| 4 | CN | BPEB | Brooks Pensinsula is on the west coast of Vanc... | -127.771880 | 50.15662 | -730.0 | HNE | HNN | HNZ | LYE | LYN | LYZ |
# PNSN broadband and strong-motion stations (tab-separated)
pnsn = pd.read_csv(fs.open("03_StationInfo/PNSN_Seismometers.tab"), sep="\t")
print(f"PNSN stations: {len(pnsn)} rows")
pnsn.head()
PNSN stations: 156 rows
| | network code | stationCode.locationCode | description | lon | lat | depth | channel code east | channel code north | channel code vertical |
|---|---|---|---|---|---|---|---|---|---|
| 0 | UO | ALSE | Alsea, OR, USA | -123.590401 | 44.381802 | -95.00 | HNE | HNN | HNZ |
| 1 | UO | BEER | Swisshome, OR, USA | -123.847298 | 44.107700 | -117.00 | HNE | HNN | HNZ |
| 2 | UO | BENT | Myrtle Point, OR, USA | -124.272697 | 42.958801 | -657.20 | HNE | HNN | HNZ |
| 3 | UO | BLEU | Tillamook, OR, USA | -123.794701 | 45.422699 | -15.55 | HNE | HNN | HNZ |
| 4 | UO | CARP | Carpenterville, OR, USA | -124.344704 | 42.230400 | -613.30 | HNE | HNN | HNZ |
Slip Pattern Figures¶
Each of the 112 simulated rupture events has a corresponding slip pattern PNG. These are directly accessible — no zip extraction required.
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, event_num in zip(axes, [0, 50, 111]):
png_path = f"05_Figures/cascadia-{event_num:06d}.png"
png_bytes = fs.cat(png_path)
img = Image.open(io.BytesIO(png_bytes))
ax.imshow(img)
ax.axis("off")
ax.set_title(f"Event {event_num:03d}")
fig.suptitle("Cascadia Rupture Slip Patterns", fontsize=14)
plt.tight_layout()
plt.show()
Rupture Files (Small Zip, In-Memory)¶
The kinematic rupture parameters for all 112 events are bundled in a single
5 MB zip — small enough to load entirely into memory with fs.cat(), then
extract individual entries using Python's zipfile module.
RUPT_ZIP = "02_FinalDataPackage/CascadiaRuptureFiles/CascadiaRuptureFiles.zip"
zip_bytes = fs.cat(RUPT_ZIP)
print(f"Downloaded {len(zip_bytes):,} bytes")
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
all_entries = zf.namelist()
log_files = sorted(f for f in all_entries if f.endswith(".log"))
rupt_files = sorted(f for f in all_entries if f.endswith(".rupt"))
print(f"{len(log_files)} .log files, {len(rupt_files)} .rupt files")
print()
# Print the event summary for event 000
print(f"--- {log_files[0]} ---")
print(zf.read(log_files[0]).decode())
Downloaded 5,192,342 bytes
112 .log files, 112 .rupt files

--- CascadiaRuptureFiles/cascadia-000000.log ---
Scenario calculated at 2022-03-30 20:02:06 GMT
Project name: Cascadia
Run name: cascadia
Run number: 000000
Velocity model: cascadia.mod
No. of KL modes: 72
Hurst exponent: 0.4
Corr. length used Lstrike: 21.39 km
Corr. length used Ldip: 7.66 km
Slip std. dev.: 0.900 km
Maximum length Lmax: 12.77 km
Maximum width Wmax: 2.52 km
Effective length Leff: 10.86 km
Effective width Weff: 2.14 km
Target magnitude: Mw 6.8000
Actual magnitude: Mw 6.7573
Hypocenter (lon,lat,z[km]): (236.355030,41.524822,30.29)
Hypocenter time: 2023-03-23T00:00:00.000000Z
Centroid: (lon,lat,z[km]): (236.377573,41.471190,30.55)
Source time function type: dreger
Average Rise Time (s): 1.82
Average Rupture Velocity (km/s): 3.20
MiniSEED Waveforms via Layered ZipFileSystem¶
The waveform archives are large (100 MB – 5 GB). Downloading an entire archive just to access a handful of traces would be wasteful.
A better approach: layer fsspec's ZipFileSystem on top of a dataversefs file
object. Because DataverseFile is seekable (it translates seek() calls into
HTTP Range requests), Python's zipfile engine can:
- Fetch the ZIP central directory from the end of the file (~1–2 Range requests)
- Seek directly to each requested entry and fetch only its compressed bytes
This transfers only the central directory + the entries you actually open — not the full archive.
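The "~1–2 tail requests" figure can be checked offline: when a zip has no archive comment, the End of Central Directory (EOCD) record occupies exactly the final 22 bytes, so a single tail read is enough to locate and size the central directory. A minimal standard-library sketch (entry names are made up):

```python
import io
import struct
import zipfile

# Build a small in-memory zip to stand in for a remote archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for i in range(3):
        zf.writestr(f"trace-{i:03d}.mseed", bytes(1_024))

data = buf.getvalue()

# With no archive comment, the End of Central Directory (EOCD) record
# is exactly the last 22 bytes — one tail Range request recovers it.
sig, _, _, _, n_entries, cd_size, cd_offset, _ = struct.unpack(
    "<IHHHHIIH", data[-22:]
)
assert sig == 0x06054B50  # EOCD signature, "PK\x05\x06"
print(f"{n_entries} entries; central directory: {cd_size} B at offset {cd_offset}")
```

Once the EOCD is parsed, one more ranged read at `cd_offset` fetches the full entry listing — which is exactly how a seekable remote file keeps zip browsing cheap.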
!!! note "When to download the full archive instead"
For bulk access — processing all 112 events or all stations — the per-entry
overhead of many small Range requests adds up. In that case,
fs.get(zip_path, "local.zip") is more efficient.
See the Work with Zip Archives guide.
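When the trade-off is unclear, a back-of-envelope estimate helps. The helper below is a hypothetical rule of thumb, not part of dataversefs: it assumes a fixed per-request overhead and streams only when the bytes actually fetched stay well below the full archive size.

```python
def prefer_streaming(archive_size: int, entries_needed: int,
                     avg_entry_size: int, overhead: int = 64 * 1024) -> bool:
    """Hypothetical heuristic: stream via Range requests only when the
    bytes fetched (entries plus a fixed per-request overhead) stay well
    below the cost of downloading the whole archive."""
    streamed = entries_needed * (avg_entry_size + overhead)
    return streamed < archive_size / 2

# Five ~200 kB traces from a 301 MB archive: stream.
print(prefer_streaming(301_000_000, 5, 200_000))      # True
# All 1680 traces: download the zip once instead.
print(prefer_streaming(301_000_000, 1680, 200_000))   # False
```

The 64 kB overhead and the one-half threshold are illustrative knobs; tune them to your connection's latency and bandwidth.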
# Use the ONC offshore strong-motion signal archive (301 MB)
WAVE_ZIP = (
"02_FinalDataPackage/CascadiaWaveForms/"
"Cas-ONC-Offshore-StrongMotion/Cas-ONC-Off_Signal.zip"
)
# Layer ZipFileSystem on top of a seekable dataversefs file object.
# zipfile reads the central directory via Range requests; no full download.
zip_fo = fs.open(WAVE_ZIP, "rb")
zip_fs = ZipFileSystem(zip_fo)
# Discover the internal structure
top_entries = zip_fs.ls("", detail=False)
print(f"Zip root: {len(top_entries)} entries")
for e in top_entries[:5]:
print(f" {e}")
if len(top_entries) > 5:
print(f" … and {len(top_entries) - 5} more")
Zip root: 1 entries
  Cas-ONC-Off_Signal
# Find MiniSEED files for the first event, vertical (HNZ) channel
all_mseed = zip_fs.find("", detail=False)
all_mseed = [f for f in all_mseed if f.endswith(".mseed")]
print(f"Total .mseed files in archive: {len(all_mseed)}")
# Pick one HNZ trace from the first event
event_0_hnz = [f for f in all_mseed if "cascadia-000000" in f and "-HNZ" in f]
print(f"Event 000 HNZ traces: {len(event_0_hnz)}")
for f in event_0_hnz:
print(f" {f}")
Total .mseed files in archive: 1680
Event 000 HNZ traces: 5
  Cas-ONC-Off_Signal/Cas-ONC-Off-Sig_cascadia-000000/Cas-ONC-Off-Sig_cascadia-000000_BACME-W1-HNZ.mseed
  Cas-ONC-Off_Signal/Cas-ONC-Off-Sig_cascadia-000000/Cas-ONC-Off-Sig_cascadia-000000_CBC27-W1-HNZ.mseed
  Cas-ONC-Off_Signal/Cas-ONC-Off-Sig_cascadia-000000/Cas-ONC-Off-Sig_cascadia-000000_CQS64-W1-HNZ.mseed
  Cas-ONC-Off_Signal/Cas-ONC-Off-Sig_cascadia-000000/Cas-ONC-Off-Sig_cascadia-000000_NC89-W1-HNZ.mseed
  Cas-ONC-Off_Signal/Cas-ONC-Off-Sig_cascadia-000000/Cas-ONC-Off-Sig_cascadia-000000_NCBC-W1-HNZ.mseed
# Read one MiniSEED file directly from the zip — only its compressed bytes
# are fetched from Borealis, not the full 301 MB archive.
with zip_fs.open(event_0_hnz[0], "rb") as f:
st = obspy.read(f)
print(st)
print()
tr = st[0]
print(f"Station : {tr.stats.network}.{tr.stats.station}.{tr.stats.channel}")
print(f"Start time: {tr.stats.starttime}")
print(f"Sampling : {tr.stats.sampling_rate} Hz")
print(f"Duration : {tr.stats.npts / tr.stats.sampling_rate:.1f} s")
print("Units : acceleration (m/s²)")
1 Trace(s) in Stream:
.BACME..HNZ | 2023-03-22T23:58:00.000000Z - 2023-03-23T00:08:29.990000Z | 100.0 Hz, 63000 samples

Station : .BACME.HNZ
Start time: 2023-03-22T23:58:00.000000Z
Sampling : 100.0 Hz
Duration : 630.0 s
Units : acceleration (m/s²)
Plot the Waveform¶
tr = st[0]
times = tr.times() # seconds since starttime
fig, ax = plt.subplots(figsize=(12, 3))
ax.plot(times, tr.data, linewidth=0.6, color="steelblue")
ax.set_xlabel("Time (s)")
ax.set_ylabel("Acceleration (m/s²)")
ax.set_title(
f"{tr.stats.network}.{tr.stats.station}.{tr.stats.channel} — "
"cascadia-000000 signal"
)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
zip_fo.close() # close the underlying dataversefs file object
What Happened Under the Hood¶
1. **`fsspec.filesystem("dataverse", ...)`** — instantiated `DataverseFileSystem` and built a routing table from the dataset's file list (146 files). The numbered directory prefixes (`02_`, `03_`, …) are part of the `directoryLabel` metadata and are preserved as-is in the virtual path tree.
2. **Direct file access (`fs.cat`, `fs.open`)** — `fs.cat("03_StationInfo/ONC_Seismometers&GNSS.csv")` and the PNG fetches each issue a single `HEAD` request to resolve the Borealis → S3 pre-signed URL, followed by a `GET` for the full file bytes.
3. **In-memory zip** — `fs.cat(RUPT_ZIP)` downloads the 5 MB rupture archive in one request. Python's `zipfile.ZipFile(io.BytesIO(...))` then navigates the central directory and decompresses individual entries entirely in RAM.
4. **Layered `ZipFileSystem`** — `ZipFileSystem(fs.open(WAVE_ZIP, "rb"))` wraps the dataversefs file object in fsspec's zip filesystem. When `zipfile.ZipFile` opens in read mode, it calls `seek()` to find the central directory at the end of the file. Each `seek()` translates to an HTTP Range request via `AbstractBufferedFile._fetch_range()`. Only the central directory bytes and the specific compressed entries you `open()` are transferred — not the full 301 MB archive.
5. **ObsPy** — `obspy.read(f)` accepts any file-like object. The decompressed MiniSEED bytes flow from zipfile's decompressor directly into ObsPy's parser without touching the filesystem.
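The layered-zip access pattern can be imitated entirely offline with a toy in-memory "archive" and a file object that records every read — a stand-in for dataversefs's Range requests (the class and entry names are illustrative):

```python
import io
import zipfile

class RangeLoggingFile(io.BytesIO):
    """In-memory stand-in for a seekable remote file: records every
    (offset, length) actually read, mimicking HTTP Range requests."""
    def __init__(self, data: bytes):
        super().__init__(data)
        self.ranges = []

    def read(self, size=-1):
        chunk = super().read(size)
        self.ranges.append((self.tell() - len(chunk), len(chunk)))
        return chunk

# Build a three-entry archive (stored, so sizes are predictable).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    for i in range(3):
        zf.writestr(f"trace-{i:03d}.mseed", bytes(10_000))

fo = RangeLoggingFile(buf.getvalue())
with zipfile.ZipFile(fo) as zf:
    payload = zf.read("trace-001.mseed")  # EOCD + central dir + one entry

total_read = sum(n for _, n in fo.ranges)
print(f"archive {len(buf.getvalue()):,} B, bytes read {total_read:,} B")
```

Only about a third of the bytes are touched here; against a multi-gigabyte remote archive the savings are proportionally far larger.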