Roadmap
Planned and considered improvements to dataversefs.
Mirror Cache
Status: Under consideration
The fsspec caching layers (filecache, blockcache) always
store files flat in the cache directory. Even with
BasenameCacheMapper(directory_levels=N), all cached files land directly in
cache_storage: the retained path components are encoded into the filename, not
turned into subdirectories.
A mirror cache would replicate the full directory structure of the remote dataset
locally. Accessing run1/sensors/data.csv would create cache_storage/run1/sensors/data.csv
on disk, making the local cache browsable and directly usable by tools that expect a
normal directory tree.
What this requires:
- A custom AbstractCacheMapper subclass that returns the full relative path (e.g. run1/sensors/data.csv) unchanged as the cache key.
- A small WholeFileCacheFileSystem subclass that calls os.makedirs(os.path.dirname(fn), exist_ok=True) before writing each cached file, because fsspec's default caching layer does not create parent directories.
This is a contained, well-scoped change (~20–30 lines) and would be distributed as a
dataversefs.MirrorCacheFileSystem wrapper so users do not need to assemble the pieces
themselves.
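The two pieces listed above can be illustrated with a minimal, standard-library-only sketch. It is not the planned implementation and does not use fsspec: mirror_cache_key and write_mirrored are hypothetical names chosen for this example. In a real version the key logic would presumably sit in the AbstractCacheMapper subclass and the directory creation in the WholeFileCacheFileSystem subclass. The rejection of paths that escape the cache directory is an extra safeguard assumed here, not something the roadmap specifies.

```python
import os
import posixpath
import tempfile


def mirror_cache_key(remote_path: str) -> str:
    """Return the remote path as a relative cache key, keeping its directories."""
    # Normalise the path and drop leading slashes so the key is relative.
    key = posixpath.normpath(remote_path).lstrip("/")
    # Assumed safeguard: refuse empty keys and keys that climb out of the cache.
    if key in ("", ".", "..") or key.startswith("../"):
        raise ValueError(f"cannot mirror path: {remote_path!r}")
    return key


def write_mirrored(cache_storage: str, remote_path: str, data: bytes) -> str:
    """Write data at the mirrored location under cache_storage; return the local path."""
    fn = os.path.join(cache_storage, *mirror_cache_key(remote_path).split("/"))
    # Parent directories must exist before the cached file can be written.
    os.makedirs(os.path.dirname(fn), exist_ok=True)
    with open(fn, "wb") as f:
        f.write(data)
    return fn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_storage:
        local = write_mirrored(cache_storage, "run1/sensors/data.csv", b"t,value\n0,1.5\n")
        # Show where the file landed relative to the cache root.
        print(os.path.relpath(local, cache_storage))
```

Using the unchanged relative path as the key is what makes the cache browsable. The trade-off is that the key is no longer a single flat filename, which is why the explicit os.makedirs call is needed before each write.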
If this feature would be useful to you, please open an issue or leave a comment on the GitHub repository.