
Search for Files with glob()

fs.glob(pattern) returns a list of paths matching a shell-style wildcard pattern. The pattern is resolved against the virtual filesystem root — the dataset or sub-dataverse you scoped when creating the DataverseFileSystem.

Pattern Reference

Pattern               Searches                             Matches                        Does not match
*.bin                 root level only                      file.bin                       0_raw/gnssa/2024/001/data.bin
data/*.csv            data/ only                           data/values.csv                data/archive/old.csv
**/*.bin              entire tree                          any .bin at any depth
0_raw/gnssa/**/*.bin  all depths under 0_raw/gnssa/        0_raw/gnssa/2024/001/data.bin  archive/data.bin
0_raw/gnssa/*/*/*     exactly 3 levels under 0_raw/gnssa/  0_raw/gnssa/2024/001/file      0_raw/gnssa/2024/file
**/.zattrs            entire tree                          .zattrs at any depth           zattrs (no dot)

Key rule

*.bin matches only at the root level — the same as in a shell or any standard filesystem. Prefix with **/ to search recursively: **/*.bin finds every .bin file anywhere in the dataset.
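The rule can be checked in isolation on a local directory. The sketch below is only an illustration: it uses Python's standard glob module on a temporary directory as a stand-in for fs.glob (an assumption, since DataverseFileSystem may differ in edge cases), and the file layout is hypothetical, borrowed from the pattern table.

```python
import glob
import os
import tempfile


def demo_root_vs_recursive():
    """Compare '*.bin' with '**/*.bin' on a throwaway local directory tree."""
    # Hypothetical layout, borrowed from the pattern table
    layout = ["file.bin", "notes.txt", "0_raw/gnssa/2024/001/data.bin"]

    with tempfile.TemporaryDirectory() as root:
        for rel in layout:
            full = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(full), exist_ok=True)
            open(full, "w").close()  # create an empty file

        def run(pattern):
            # recursive=True is what makes '**' span directory levels in stdlib glob
            hits = glob.glob(os.path.join(root, pattern), recursive=True)
            # Report paths relative to the root, with forward slashes
            return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)

        return run("*.bin"), run("**/*.bin")


root_only, recursive = demo_root_vs_recursive()
print(root_only)   # only the root-level .bin file
print(recursive)   # .bin files at every depth, root included
```

The first list contains only file.bin; the second also picks up the file nested under 0_raw/gnssa/2024/001/.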

Examples

Find all files with a given extension

# All .bin files anywhere in the dataset
fs.glob("**/*.bin")

# All .bin files under a specific subdirectory (faster — scoped search)
fs.glob("0_raw/gnssa/**/*.bin")

Find files at a specific depth

# Files exactly 3 levels under 0_raw/gnssa/ (e.g. year/day/file)
fs.glob("0_raw/gnssa/*/*/*")

# Files exactly 3 levels deep with a specific extension
fs.glob("0_raw/gnssa/*/*/*.bin")
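Because a fixed-depth pattern guarantees the shape of every returned path, the results can be split into their components safely. A minimal sketch, assuming the year/day/file layout mentioned above; the helper name group_by_year_day and the sample paths are hypothetical, and the function works on any list of path strings such as the one fs.glob returns.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_year_day(paths, base="0_raw/gnssa"):
    """Group paths of the form <base>/<year>/<day>/<file> by (year, day)."""
    groups = defaultdict(list)
    for p in paths:
        rel = PurePosixPath(p).relative_to(base)
        # Raises ValueError if a path is not exactly three levels below base
        year, day, name = rel.parts
        groups[(year, day)].append(name)
    return dict(groups)


# Hypothetical result of fs.glob("0_raw/gnssa/*/*/*.bin")
sample = [
    "0_raw/gnssa/2024/001/a.bin",
    "0_raw/gnssa/2024/001/b.bin",
    "0_raw/gnssa/2024/002/a.bin",
]
grouped = group_by_year_day(sample)
print(grouped)
```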

Find Zarr metadata files

# All .zattrs files inside a Zarr store
fs.glob("my_store/**/.zattrs")

# All .zarray files (array metadata: shape, chunks, dtype) inside a specific variable
fs.glob("my_store/temperature/**/.zarray")
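In a Zarr v2 store, each directory that holds a .zarray file is one array, so the glob result doubles as an inventory of the arrays in the store. A small sketch under that assumption; array_names and the sample paths are hypothetical, and the input is any list of path strings.

```python
from pathlib import PurePosixPath


def array_names(zarray_paths, store="my_store"):
    """Turn '<store>/<array>/.zarray' paths into array names relative to the store."""
    names = []
    for p in zarray_paths:
        array_dir = PurePosixPath(p).parent        # directory containing .zarray
        names.append(str(array_dir.relative_to(store)))
    return sorted(names)


# Hypothetical result of fs.glob("my_store/**/.zarray")
sample = [
    "my_store/temperature/.zarray",
    "my_store/ocean/salinity/.zarray",
]
names = array_names(sample)
print(names)
```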

List all files in the dataset

# Every path in the dataset, regardless of depth or name
fs.glob("**/*")
# or, to list files only (in stock fsspec, glob() can also return directory entries):
fs.find("", withdirs=False)
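A full listing is often easier to read once summarized. The sketch below counts paths by file extension; count_by_suffix and the sample list are hypothetical, and the function accepts any list of path strings such as the one returned by the calls above.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_by_suffix(paths):
    """Count paths by extension; names without one are counted under '(none)'."""
    return Counter(PurePosixPath(p).suffix or "(none)" for p in paths)


# Hypothetical listing of a small dataset
sample = ["a/b.bin", "c.bin", "d/e.csv", "README"]
counts = count_by_suffix(sample)
print(counts)
```

Dot-files such as .zattrs have no suffix in pathlib's sense, so they land in the "(none)" bucket.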

Inspecting results before reading

# See what glob returns before fetching any data
paths = fs.glob("0_raw/gnssa/**/*.bin")
print(f"{len(paths)} files found")
print(paths[:5])

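# Get full metadata (size, type, id) for every file under the same prefix
# (find() takes a path, not a pattern, so the result is not limited to .bin files)
detail = fs.find("0_raw/gnssa", withdirs=False, detail=True)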
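The detail result can be used to estimate download size before reading anything. This is a sketch only: it assumes detail is a dict mapping each path to an info dict with a "size" key in bytes, which is the usual fsspec convention for find(..., detail=True); the exact keys exposed by DataverseFileSystem, the helper name summarize, and the sample values are assumptions.

```python
def summarize(detail, suffix=".bin"):
    """Return (file count, total bytes) for entries whose path ends with suffix."""
    selected = {p: info for p, info in detail.items() if p.endswith(suffix)}
    # Treat a missing or None size as 0 rather than failing
    total = sum(info.get("size") or 0 for info in selected.values())
    return len(selected), total


# Hypothetical detail mapping, shaped like an fsspec find(..., detail=True) result
sample_detail = {
    "0_raw/gnssa/2024/001/data.bin": {"size": 1024, "type": "file"},
    "0_raw/gnssa/2024/001/readme.txt": {"size": 200, "type": "file"},
    "0_raw/gnssa/2024/002/data.bin": {"size": 2048, "type": "file"},
}
count, total_bytes = summarize(sample_detail)
print(f"{count} files, {total_bytes} bytes")
```

Filtering by suffix here restores the .bin restriction that the glob pattern applied and the plain find() call does not.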

Performance note

glob() and find() both scan the in-memory routing table — no additional network requests are made after the initial dataset JSON fetch. Patterns with a fixed prefix (e.g. 0_raw/gnssa/**/*.bin) are slightly faster than open-ended patterns (**/*.bin) because the search is scoped to a subtree, but the difference is small for typical dataset sizes.