************* I/O with HDF5 ************* Background ========== `HDF5`_ is a format often used in computational physics and other data-science applications, because of its ability to store huge amounts of structured numerical data. Many datasets can be stored in a single file, categorized, linked together, and so on. A variety of python modules leverage HDF5 for input and output; often they rely on `h5py`_ or `PyTables`, pythonic interfaces interoperabale with `numpy`_, and no native support of python objects. However, more complex data structures have no native support for H5ing; a variety of choices are possible. Any python object can be `pickled`_ and stored as a binary blob in HDF5, but the resulting blobs are not usable outside of python. The `pandas`_ data analysis module can `read_hdf`_ and export `to_hdf`_, but even though the data is written in a usable way, the data layouts are nontrivial to read without pandas. We aim for a happy medium, by providing a class, :class:`~.ReadWriteable`, from which other python classes, which contain a variety of data fields, can inherit to allow them to be easily serialized to and from HDF5. An ReadWriteable object will be saved as a `group`_ that contains properties written into groups and `datasets`_, with the same name as the property itself. If a property is one of a slew of known types then it will be written natively as an H5 field, otherwise it will be pickled. The data types that are not pickled are - ``ReadWriteable`` or anything that inherits from ReadWriteable. - ``bool``, ``int``, ``float``, and ``complex``. - ``tuple``, ``list``, ``dict`` (`with some limitations on valid keys `_) - ``numpy.ndarray`` Serializing Objects =================== ReadWriteable objects inherit methods .. autoclass:: supervillain.h5.ReadWriteable :members: :undoc-members: :show-inheritance: .. literalinclude :: h5/readwriteable.py :pyobject: _example_readwrite :caption: One can also provide custom strategies and ``to_h5`` and ``from_h5`` methods. It is nevertheless advisable to inherit from :class:`~.ReadWriteable` for typechecking purposes. Serializing Raw Data ==================== To provide custom methods for H5ing otherwise-unknown types that cannot be made ReadWriteable, a user can write a small strategy. A strategy is an instance-free class with just static methods ``applies``, ``write``, and ``read``. For example, the strategy for writing a single integer is .. literalinclude:: h5/strategy/int.py :caption: However, it is probably simplest in most circumstances to just inherit from ``ReadWriteable``. See **supervillain/h5/strategy/** for the default strategies. If the ReadWriteable strategy is desired but the class cannot be made to inherit from ``ReadWriteable``, just create a new strategy that inherits from ``h5.strategy.ReadWriteable`` and overwrites the ``applies`` method. Extendable Data and Objects =========================== Certain data can meaningfully be extended. For example, you might do Monte Carlo generation, make measurements, and realize you don't have enough for your desired precision. In that case you might want to extend the ensemble, adding new configurations and measurements to disk. There is a datatype for wrapping numpy arrays, .. autoclass:: supervillain.h5.extendable.array :members: :show-inheritance: which indicates that the array should be written with its zeroeth (batch) dimension `resizable `_. When an :class:`~.Observable` is attached to an ensemble it is automatically an extendable array. An object which contains extendable data won't know how to extend itself unless it inherits from .. autoclass:: supervillain.h5.Extendable :members: :show-inheritance: .. literalinclude :: h5/extendable.py :pyobject: _example_extend :caption: The ``extend_h5`` method can be overwritten if custom handling is needed; the object should still inherit from :class:`~.Extendable` for typechecking purposes. .. _HDF5: https://www.hdfgroup.org/solutions/hdf5/ .. _h5py: https://docs.h5py.org/en/stable/index.html .. _PyTables: https://www.pytables.org/ .. _numpy: https://numpy.org/ .. _pandas: https://pandas.pydata.org/pandas-docs/stable/index.html .. _read_hdf: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_hdf.html .. _to_hdf: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_hdf.html .. _pickled: https://docs.python.org/3/library/pickle.html .. _group: https://docs.hdfgroup.org/hdf5/develop/_h5_d_m__u_g.html#subsubsec_data_model_abstract_group .. _datasets: https://docs.hdfgroup.org/hdf5/develop/_h5_d_m__u_g.html#subsubsec_data_model_abstract_dataset