Roadmap

Planned and considered improvements to dataversefs.


Mirror Cache

Status: Under consideration

The fsspec caching layers (filecache, blockcache) store cached files flat in the cache directory. Even with BasenameCacheMapper(directory_levels=N), every cached file lands directly in cache_storage: the retained path components are encoded into the filename rather than turned into subdirectories.

A mirror cache would replicate the full directory structure of the remote dataset locally. Accessing run1/sensors/data.csv would create cache_storage/run1/sensors/data.csv on disk, making the local cache browsable and directly usable by tools that expect a normal directory tree.

What this requires:

  1. A custom AbstractCacheMapper subclass that returns the full relative path (e.g. run1/sensors/data.csv) unchanged as the cache key.
  2. A small WholeFileCacheFileSystem subclass that calls os.makedirs(os.path.dirname(fn), exist_ok=True) before writing each cached file, because fsspec's default caching layer does not create parent directories.
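
The two pieces above could look roughly like the following. This is a minimal sketch under stated assumptions, not the planned implementation: MirrorCacheMapper and write_mirrored are hypothetical names, and write_mirrored is a standalone stand-in for the cache layer's write step rather than an actual WholeFileCacheFileSystem override. The sketch also normalises the path and rejects keys that would escape the cache directory, which goes slightly beyond returning the path "unchanged".

```python
import os
import posixpath
import tempfile

try:
    # Base class shipped with recent fsspec releases.
    from fsspec.implementations.cache_mapper import AbstractCacheMapper
except ImportError:
    # Fallback so the sketch still runs without fsspec installed.
    AbstractCacheMapper = object


class MirrorCacheMapper(AbstractCacheMapper):
    """Hypothetical mapper: use the remote relative path as the cache key."""

    def __call__(self, path: str) -> str:
        # Normalise and drop leading slashes so the key stays relative.
        key = posixpath.normpath(path).lstrip("/")
        # Assumption: refuse keys that are empty or point outside the cache.
        if key in ("", ".", "..") or key.startswith("../"):
            raise ValueError(f"cannot mirror path: {path!r}")
        return key


def write_mirrored(cache_storage: str, key: str, data: bytes) -> str:
    """Hypothetical stand-in for the step that writes a cached file."""
    fn = os.path.join(cache_storage, *key.split("/"))
    # Create the parent directories before writing the cached file.
    os.makedirs(os.path.dirname(fn), exist_ok=True)
    with open(fn, "wb") as f:
        f.write(data)
    return fn


if __name__ == "__main__":
    mapper = MirrorCacheMapper()
    with tempfile.TemporaryDirectory() as cache_storage:
        fn = write_mirrored(cache_storage, mapper("run1/sensors/data.csv"), b"a,b\n")
        # Show where the file landed relative to the cache directory.
        print(os.path.relpath(fn, cache_storage))
```

In a real wrapper the mapper would presumably be handed to the caching filesystem through its cache_mapper argument, and the os.makedirs call would move into the subclass method that writes cached files.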

This is a contained, well-scoped change (~20–30 lines) and would be distributed as a dataversefs.MirrorCacheFileSystem wrapper so users do not need to assemble the pieces themselves.

If this feature would be useful to you, please open an issue or leave a comment on the GitHub repository.