********
Analysis
********

Sampling field configurations from the distribution given by the :ref:`action <action>` is only the first step; we also want to measure observables on the resulting ensembles and get reliable estimates with understood uncertainties.

Thermalization
--------------

To get an unbiased estimate from an :class:`~.Ensemble` generated by a Markov chain, the configurations that enter our expectation values must not remember the initial state the chain was started from.
That state is generated by some process that is *not* representative of the distribution of interest (in a :func:`'cold' <supervillain.ensemble.Ensemble.generate>` start, for example, it is the all-zero configuration).

Deciding how many configurations to cut is a judgement call.
You may be misled if the observables you inspect thermalize very quickly, because not every observable need thermalize at the same rate.
Moreover, you can never be completely confident that your samples are drawn from the basin of lowest action (in a global sense); the Markov chain may simply not have reached a preferable basin of configurations yet.

To facilitate measuring only on configurations after a certain step in the Markov chain, the :class:`~.Ensemble` provides the :func:`cut <supervillain.ensemble.Ensemble.cut>` method, which returns another :class:`~.Ensemble` without the leading configurations.
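
As a library-independent illustration of why the cut matters, the sketch below builds a toy time series that relaxes from an unrepresentative start and compares the mean with and without the leading samples.
The AR(1) process, the starting value, and ``n_cut`` are all assumptions made for the example, and NumPy slicing merely stands in for the ``cut`` method, whose exact signature is not assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "observable history": an AR(1) process whose stationary mean is 0,
# started far from equilibrium to mimic an unrepresentative initial state.
n_steps, phi = 2000, 0.9
series = np.empty(n_steps)
series[0] = 500.0
for t in range(1, n_steps):
    series[t] = phi * series[t - 1] + rng.normal()

# Drop the leading samples; choosing n_cut is the judgement call.
n_cut = 200
thermalized = series[n_cut:]

print(f"mean including the transient: {series.mean():.3f}")
print(f"mean after cutting {n_cut} steps: {thermalized.mean():.3f}")
```

The mean that includes the transient is pulled toward the starting value, while the mean of the cut series fluctuates around the true value of zero.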

Handling Autocorrelation
------------------------

There are also correlations from one configuration to the next.
This introduces `autocorrelation`_ into each observable's time series; a naive uncertainty estimate that does not account for the autocorrelation will be too small.
Good algorithms for estimating the *autocorrelation time* are known :cite:`Wolff:2003sm`.

We can measure the autocorrelation of a time series.

.. autofunction :: supervillain.analysis.autocorrelation
.. autofunction :: supervillain.analysis.autocorrelation_time
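
To make the quantities concrete, here is a minimal NumPy sketch of a normalized autocorrelation function and an integrated autocorrelation time.
It is not the implementation documented above and it is simpler than the automatic windowing of :cite:`Wolff:2003sm`: the helper names ``normalized_autocorrelation`` and ``integrated_time`` are hypothetical, the sum is truncated at the first non-positive value, and the convention with a leading 1/2 is an assumption that may differ from the library's.

```python
import numpy as np

def normalized_autocorrelation(series):
    """Autocorrelation function normalized so that the lag-0 entry is 1."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Zero-pad to 2n so the FFT's circular correlation does not wrap around.
    f = np.fft.rfft(x, n=2 * n)
    acf = np.fft.irfft(f * np.conj(f), n=2 * n)[:n]
    return acf / acf[0]

def integrated_time(series):
    """tau_int = 1/2 + sum of rho(t) for t >= 1, cut at the first rho(t) <= 0."""
    rho = normalized_autocorrelation(series)
    nonpositive = np.flatnonzero(rho <= 0)
    window = nonpositive[0] if nonpositive.size else len(rho)
    return 0.5 + rho[1:window].sum()

# Toy data: white noise, and an AR(1) chain built from it with phi = 0.9,
# for which the exact integrated time in this convention is 9.5.
rng = np.random.default_rng(1)
phi = 0.9
noise = rng.normal(size=20000)
chain = np.empty_like(noise)
chain[0] = noise[0]
for t in range(1, len(chain)):
    chain[t] = phi * chain[t - 1] + noise[t]

tau = integrated_time(chain)
tau_white = integrated_time(noise)

# The naive standard error ignores correlations; in this convention the
# corrected error is larger by a factor sqrt(2 * tau_int).
naive_error = chain.std(ddof=1) / np.sqrt(len(chain))
corrected_error = naive_error * np.sqrt(2 * tau)
print(f"tau_int (chain) = {tau:.2f}, tau_int (white noise) = {tau_white:.2f}")
print(f"naive error = {naive_error:.4f}, corrected error = {corrected_error:.4f}")
```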

One simple way of decreasing autocorrelation is to decimate your Markov chain, keeping only every nᵗʰ configuration.
The :class:`~.Ensemble` provides the :func:`every <supervillain.ensemble.Ensemble.every>` method, which returns another :class:`~.Ensemble` that keeps configurations evenly spaced by n.
A natural choice for n is the autocorrelation time.
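
The effect of decimation can be seen on a toy series.
In the sketch below NumPy slicing stands in for the ``every`` method (whose signature is not assumed here), and the AR(1) chain, the helper ``lag1_correlation``, and the spacing ``n = 10`` are assumptions made for illustration.

```python
import numpy as np

def lag1_correlation(x):
    """Normalized correlation between consecutive entries of x."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

# Toy chain: AR(1) with phi = 0.9, so neighbors are strongly correlated.
rng = np.random.default_rng(2)
phi = 0.9
chain = np.empty(50000)
chain[0] = rng.normal()
for t in range(1, len(chain)):
    chain[t] = phi * chain[t - 1] + rng.normal()

n = 10  # spacing, comparable to this chain's autocorrelation time
thinned = chain[::n]

print(f"lag-1 correlation, full chain:    {lag1_correlation(chain):.3f}")
print(f"lag-1 correlation, every {n}th:   {lag1_correlation(thinned):.3f}")
print(f"samples kept: {len(thinned)} of {len(chain)}")
```

Decimation trades sample count for independence: the thinned series is shorter, but its neighboring entries are much less correlated.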

Ensembles also have an :meth:`~.Ensemble.autocorrelation_time`, which leverages the above :py:func:`~.analysis.autocorrelation_time` and understands which observables to include.

Blocking
--------

.. autoclass:: supervillain.analysis.Blocking
   :no-special-members:
   :members:


The Bootstrap
-------------

Bootstrap resampling, `bootstrapping`_, or "the bootstrap" is a resampling method used for uncertainty estimation.
One draws, with replacement, from a sample drawn according to the distribution of interest.
The idea is that each resampled set *could* have been your ensemble, with the same likelihood as the ensemble you actually have, and that we can estimate uncertainties by looking at the distribution of the means of observables over these fictitious Markov chains.
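
The resampling idea fits in a few lines of NumPy.
This is a generic sketch, not the class documented below: the helper ``bootstrap_means`` is hypothetical, and the samples are independent Gaussian draws chosen for illustration (for autocorrelated data one would first decimate or block so that the resampled units are approximately independent).

```python
import numpy as np

def bootstrap_means(samples, n_resamples, rng):
    """Mean of each bootstrap resample of a 1D array of samples."""
    samples = np.asarray(samples)
    n = len(samples)
    # Each row of indices draws n samples with replacement:
    # one fictitious ensemble of the same size as the original.
    indices = rng.integers(0, n, size=(n_resamples, n))
    return samples[indices].mean(axis=1)

rng = np.random.default_rng(3)
samples = rng.normal(loc=1.0, scale=2.0, size=400)

means = bootstrap_means(samples, n_resamples=1000, rng=rng)
estimate, uncertainty = means.mean(), means.std()

# For independent samples the bootstrap spread should be close to the
# textbook standard error of the mean.
standard_error = samples.std() / np.sqrt(len(samples))
print(f"bootstrap: {estimate:.3f} +/- {uncertainty:.3f}")
print(f"standard error of the mean: {standard_error:.3f}")
```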

.. autoclass:: supervillain.analysis.Bootstrap
   :no-special-members:
   :members: plot_band, plot_correlator, estimate

Uncertainty
-----------

.. autoclass:: supervillain.analysis.uncertain.Uncertain
   :members:

.. _autocorrelation: https://en.wikipedia.org/wiki/Autocorrelation
.. _bootstrapping: https://en.wikipedia.org/wiki/Bootstrapping_(statistics)

Comparing Results
-----------------

We can get at-a-glance comparisons between ensembles.
In this example we generate two ensembles from the same action and algorithm and compare their results (which ought to match, with sufficient statistics!).

.. plot:: example/plot/comparison.py
 

.. automodule:: supervillain.analysis.comparison_plot
   :members: