demo_subsubclass                                  Further subclassing of existing policies and bandits.
demo_sine_bandit                                  Bandit reward function fluctuating over time.
demo_offline_cmab_alpha_linucb_direct_method      Offline bandit and parameter evaluation - direct method.
demo_offline_cmab_alpha_linucb_replay             Offline bandit and parameter evaluation - replay.
demo_mab_policy_comparison                        Comparison of some context-free bandits.
demo_epsilon_greedy_policy                        Basic simulation of a context-free policy.
demo_lif_bandit                                   Use of continuum bandit and LiF policy.
demo_cmab_policy_comparison_linear_bandit         Comparison of contextual policies with a linear bandit.
demo_cmab_policy_comparison_weight_bandit         Comparison of contextual policies with a weight bandit.
demo_simpsons_paradox_propensity                  Simpson's Paradox to demonstrate propensity weighting.
demo_sutton_barto                                 Contextual code reproducing Sutton & Barto (2018) plots.
demo_bandit_algorithms_for_website_optimization   Contextual code reproducing John Myles White (2012) plots.
demo_epsilon_greedy_to_epoch_greedy_policy        Contextual epsilon-greedy and epoch-greedy policies.
