Rethink the whole way we interact with data: Session, CheckedSession, FileHandler, LazySession (#727), open_excel, ... See also the refactoring in #761 and #614.
Dataset API:
__init__(connect_string, max_memory=None, **kwargs) -- filepath or connection string, kwargs passed to underlying Dataset implementation (compression option, Excel option, ...). If max_memory is not None, the Dataset will transparently flush some of its content (probably based on LRU) to "disk" when more memory is needed.
open(**kwargs) -- open/connect to the underlying storage. Kwargs here override those passed in __init__. Normally called via __enter__.
__enter__ and __exit__ (to be usable as a context manager)
read(key=None) -- read a single key, multiple keys (when key is a list), or everything (if key is None) and return the values. Unsure whether this explicit method makes sense. Maybe __getitem__ with an optional load() is enough.
load(key=None) -- load a single key, multiple keys (when key is a list), or everything (if key is None) and return nothing.
open_key(key=None) -- in the future, for returning a lazy object which will load data when actually accessed. Can potentially load only part of that key (array/...). This needs further thought.
__getattr__ -> forwards to __getitem__
__getitem__(key) -> loads the key if it is not loaded yet (equivalent to load(key)), then returns the array (or use open_key(key) instead???)
__setitem__(key, value) -> add a new value or change an existing one.
close() -- close file/connection to underlying storage. Normally called via __exit__
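A minimal sketch of how the methods above could fit together, to make the proposal easier to discuss. Everything beyond the method names listed above is an assumption made for illustration: the JSON file backend, the `_stored` and `_loaded` attributes, the `options` attribute, the `_keys` helper, and writing changes back in close(). max_memory is accepted but not implemented, and read() and open_key() are left out because their design is still undecided.

```python
import json
import os
import tempfile


class Dataset:
    """Illustrative sketch only; backed by a JSON file (an assumption)."""

    def __init__(self, connect_string, max_memory=None, **kwargs):
        self.connect_string = connect_string
        self.max_memory = max_memory      # accepted, but flushing is not implemented here
        self._init_kwargs = kwargs
        self.options = dict(kwargs)       # hypothetical: effective backend options
        self._stored = None               # content of the underlying storage
        self._loaded = {}                 # values currently held in memory

    def open(self, **kwargs):
        # kwargs given here override those passed to __init__
        self.options = {**self._init_kwargs, **kwargs}
        if os.path.exists(self.connect_string):
            with open(self.connect_string) as f:
                self._stored = json.load(f)
        else:
            self._stored = {}
        return self

    def __enter__(self):
        return self.open()

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def _keys(self, key):
        # hypothetical helper: normalize None / single key / list of keys
        if key is None:
            return list(self._stored)
        return key if isinstance(key, list) else [key]

    def load(self, key=None):
        # bring values into memory; returns nothing
        for k in self._keys(key):
            if k not in self._loaded:
                self._loaded[k] = self._stored[k]

    def __getitem__(self, key):
        self.load(key)
        return self._loaded[key]

    def __getattr__(self, name):
        # only reached when normal attribute lookup fails
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setitem__(self, key, value):
        self._loaded[key] = value

    def close(self):
        if self._stored is None:          # never opened or already closed
            return
        # assumption: pending changes are written back on close
        self._stored.update(self._loaded)
        with open(self.connect_string, 'w') as f:
            json.dump(self._stored, f)
        self._loaded.clear()
        self._stored = None


path = os.path.join(tempfile.mkdtemp(), 'demo.json')
with Dataset(path) as ds:
    ds['arr'] = [1, 2, 3]
with Dataset(path) as ds:
    values = ds.arr                       # forwarded to __getitem__
```

In this sketch __getitem__ only accepts a single key; whether it should also accept a list of keys, as load() does, is left open.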
Misc thoughts:
- I think excel.Workbook should be a subclass of Dataset
- We could/should also implement a generic "read" top-level function which would open a dataset, read the array and close it, to replace/complement the read_* functions.
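The generic top-level "read" idea could look roughly like the sketch below. It is only an illustration: `open_dataset` is a hypothetical stand-in for Dataset that exposes a JSON file as a dict, and returning a dict for multiple keys is an assumption, not a decided behavior.

```python
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def open_dataset(connect_string, **kwargs):
    """Hypothetical stand-in for Dataset: yields the content of a JSON file."""
    with open(connect_string) as f:
        yield json.load(f)


def read(connect_string, key=None, **kwargs):
    """Open the dataset, read one key, a list of keys, or everything, then close it."""
    with open_dataset(connect_string, **kwargs) as ds:
        if key is None:
            return dict(ds)
        if isinstance(key, list):
            return {k: ds[k] for k in key}
        return ds[key]


path = os.path.join(tempfile.mkdtemp(), 'demo.json')
with open(path, 'w') as f:
    json.dump({'pop': [1, 2], 'births': [3]}, f)

pop = read(path, 'pop')        # single key
everything = read(path)        # key=None reads all keys
```

The storage is closed as soon as read() returns, so this form suits one-off reads; repeated access to the same file would still go through the context manager.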