Search for Files with glob()
fs.glob(pattern) returns a list of paths matching a shell-style wildcard pattern.
The pattern is resolved against the virtual filesystem root — the dataset or
sub-dataverse you scoped when creating the DataverseFileSystem.
Pattern Reference

| Pattern | Searches | Matches | Does not match |
|---|---|---|---|
| `*.bin` | root level only | `file.bin` | `0_raw/gnssa/2024/001/data.bin` |
| `data/*.csv` | `data/` only | `data/values.csv` | `data/archive/old.csv` |
| `**/*.bin` | entire tree | any `.bin` at any depth | (none) |
| `0_raw/gnssa/**/*.bin` | all depths under `0_raw/gnssa/` | `0_raw/gnssa/2024/001/data.bin` | `archive/data.bin` |
| `0_raw/gnssa/*/*/*` | exactly 3 levels under `0_raw/gnssa/` | `0_raw/gnssa/2024/001/file` | `0_raw/gnssa/2024/file` |
| `**/.zattrs` | entire tree | `.zattrs` at any depth | `zattrs` (no dot) |

Key rule
*.bin matches only at the root level — the same as in a shell or any
standard filesystem. Prefix with **/ to search recursively:
**/*.bin finds every .bin file anywhere in the dataset.
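
Python's standard glob module follows the same convention when recursive=True, so the rule can be tried on a local temporary directory. The sketch below is only an analogy on a local filesystem: it does not call DataverseFileSystem, and local_glob, make_tree, and the file names are hypothetical and exist only for this demonstration.

```python
import glob
import os
import tempfile


def make_tree(root, rel_paths):
    """Create empty files at the given relative paths under root."""
    for rel in rel_paths:
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()


def local_glob(root, pattern):
    """Glob relative to root and return sorted POSIX-style relative paths."""
    matches = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(os.path.relpath(m, root).replace(os.sep, "/") for m in matches)


with tempfile.TemporaryDirectory() as root:
    make_tree(root, [
        "file.bin",
        "0_raw/gnssa/2024/001/data.bin",
        "archive/data.bin",
    ])
    top_level = local_glob(root, "*.bin")     # root level only
    recursive = local_glob(root, "**/*.bin")  # any depth, including the root

print(top_level)
print(recursive)
```

Only file.bin appears in the first result, while the second result contains all three files.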
Examples
Find all files with a given extension

```python
# All .bin files anywhere in the dataset
fs.glob("**/*.bin")

# All .bin files under a specific subdirectory (faster: scoped search)
fs.glob("0_raw/gnssa/**/*.bin")
```

Find files at a specific depth

```python
# Files exactly 3 levels under 0_raw/gnssa/ (e.g. year/day/file)
fs.glob("0_raw/gnssa/*/*/*")

# Files exactly 3 levels deep with a specific extension
fs.glob("0_raw/gnssa/*/*/*.bin")
```

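
Paths matched at a fixed depth can be split back into their directory components. The sketch below assumes the year/day/file layout mentioned in the comment above and uses a hard-coded list in place of a real fs.glob() result. group_by_year_day is a hypothetical helper, not part of the library.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_year_day(paths, base="0_raw/gnssa"):
    """Group paths of the form <base>/<year>/<day>/<file> by (year, day)."""
    groups = defaultdict(list)
    for p in paths:
        rel = PurePosixPath(p).relative_to(base)
        # Unpacking raises ValueError if the path is not exactly 3 levels deep
        year, day, name = rel.parts
        groups[(year, day)].append(name)
    return dict(groups)


# Stand-in for the result of fs.glob("0_raw/gnssa/*/*/*.bin")
paths = [
    "0_raw/gnssa/2024/001/data.bin",
    "0_raw/gnssa/2024/001/aux.bin",
    "0_raw/gnssa/2024/002/data.bin",
]
grouped = group_by_year_day(paths)
for (year, day), names in grouped.items():
    print(year, day, names)
```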
Find Zarr metadata files

```python
# All .zattrs files inside a Zarr store
fs.glob("my_store/**/.zattrs")

# All .zarray files (array metadata: shape, chunks, dtype) inside a specific variable
fs.glob("my_store/temperature/**/.zarray")
```

List all files in the dataset

```python
# Every file, regardless of depth or name
fs.glob("**/*")

# or equivalently:
fs.find("", withdirs=False)
```

Inspecting results before reading

```python
# See what glob returns before fetching any data
paths = fs.glob("0_raw/gnssa/**/*.bin")
print(f"{len(paths)} files found")
print(paths[:5])

# Get full metadata (size, type, id) for every file under 0_raw/gnssa
# (find() walks the whole subtree, so this is not limited to .bin matches)
detail = fs.find("0_raw/gnssa", withdirs=False, detail=True)
```

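
The detailed listing can be narrowed to the files of interest and summarized before any data is read. This sketch assumes that detail=True returns a dict mapping each path to an info dict with a size field in bytes. That is the usual fsspec convention, but it is not confirmed here for DataverseFileSystem. The summarize helper and the sample values are invented for illustration.

```python
def summarize(detail, suffix=".bin"):
    """Return (count, total_bytes) for entries whose path ends with suffix."""
    sizes = [info["size"] for path, info in detail.items() if path.endswith(suffix)]
    return len(sizes), sum(sizes)


# Hypothetical stand-in for fs.find("0_raw/gnssa", withdirs=False, detail=True)
detail = {
    "0_raw/gnssa/2024/001/data.bin": {"size": 2048, "type": "file"},
    "0_raw/gnssa/2024/001/notes.txt": {"size": 100, "type": "file"},
    "0_raw/gnssa/2024/002/data.bin": {"size": 4096, "type": "file"},
}
count, total_bytes = summarize(detail)
print(f"{count} files, {total_bytes} bytes")
```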
Performance note
glob() and find() both scan the in-memory routing table — no additional
network requests are made after the initial dataset JSON fetch. Patterns with
a fixed prefix (e.g. 0_raw/gnssa/**/*.bin) are slightly faster than
open-ended patterns (**/*.bin) because the search is scoped to a subtree,
but the difference is small for typical dataset sizes.
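
To make the scoping idea concrete, here is a minimal sketch of matching a pattern against an in-memory table of paths, with a cheap prefix check applied before the regular expression. It is not the library's actual implementation: the table layout (a dict of path to file id), glob_to_regex, fixed_prefix, and glob_table are all hypothetical, and only `**/`, `*`, and `?` are supported.

```python
import re


def glob_to_regex(pattern):
    """Translate a shell-style pattern into a compiled regex.

    Supports '**/' (zero or more directories), '*' (any run of characters
    except '/'), and '?' (one character except '/').
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def fixed_prefix(pattern):
    """Return the leading directory part of the pattern that has no wildcards."""
    parts = []
    for part in pattern.split("/")[:-1]:
        if "*" in part or "?" in part:
            break
        parts.append(part)
    return "/".join(parts) + "/" if parts else ""


def glob_table(table, pattern):
    """Match pattern against the keys of an in-memory path table."""
    prefix = fixed_prefix(pattern)
    regex = glob_to_regex(pattern)
    # Cheap string check first; the regex only runs on paths in the subtree
    candidates = (p for p in table if p.startswith(prefix))
    return sorted(p for p in candidates if regex.fullmatch(p))


# Hypothetical routing table: path -> file id
table = {
    "file.bin": 1,
    "0_raw/gnssa/2024/001/data.bin": 2,
    "archive/data.bin": 3,
    "data/values.csv": 4,
}
print(glob_table(table, "*.bin"))
print(glob_table(table, "0_raw/gnssa/**/*.bin"))
```

With a fixed prefix such as 0_raw/gnssa/, most paths are rejected by the string comparison alone, which is one plausible reason a scoped pattern would be slightly cheaper than an open-ended one.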