Solve the true problem¶
Now that we have constructed the true problem and discretize it. The MSPPy package provides two kinds of solvers, namely SDDP and extensive, to solve the problem. After solving the problem, the obtained policy can be evaluated both on the discretization problem and the true problem.
Stage-wise independent discrete finite problem¶
Both the extensive solver and SDDP solver shows that, the optimum is $3.18 and an optimal amount of newspaper to buy today is 7 for the true problem
from msppy.utils.examples import construct_nvid
from msppy.solver import SDDP, Extensive
from msppy.evaluation import Evaluation, EvaluationTrue
nvid = construct_nvid()
nvid_ext = Extensive(nvid)
nvid_ext.solve(outputFlag=0)
nvid_ext.first_stage_solution
nvid_sddp = SDDP(nvid)
nvid_sddp.solve(max_iterations=10)
nvid_sddp.db[-1]
nvid_sddp.first_stage_solution
res = EvaluationTrue(nvid)
res.run(n_simulations=-1)
res.gap
Stage-wise independent continuous problem¶
In this case, for both solvers, the problem needs to be discretized. The following snippet discretizes the data process by 100 number of samples and a random seed 888. We refer the resulting disretization problem as SAA(1). Both solvers show that, for SAA(1), the optimum is $2.79 and an optimal amount of newspaper to buy today is 9.08
from msppy.utils.examples import construct_nvic
from msppy.solver import SDDP, Extensive
from msppy.evaluation import Evaluation, EvaluationTrue
nvic = construct_nvic()
nvic.discretize(random_state=1, n_samples=100)
nvic_ext = Extensive(nvic)
nvic_ext.solve(outputFlag=0)
nvic_ext.first_stage_solution
nvic_sddp = SDDP(nvic)
nvic_sddp.solve(max_iterations=10)
nvic_sddp.db[-1]
nvic_sddp.first_stage_solution
res = Evaluation(nvic)
res.run(n_simulations=-1)
res.gap
res_true = EvaluationTrue(nvid)
res_true.run(n_simulations=3000, percentile=95)
res_true.CI
Our intention is solve the true problem rather than the SAA(1). The advantage of SDDP solver is that by solving SAA(1), it also offers a policy to the true problem (though not optimal). By simulation as shown above, the 95% confidence interval of the value of the obtained policy on the true problem is [3.10,3.37]. Therefore, we conclude that by solving SAA(1), the SDDP solver provides you with a policy whose value is within [3.10,3.37] with 95% confidence for the true problem.
Markov chain problem¶
Both the extensive solver and SDDP solver shows that, the optimum is $9.05 and an optimal amount of newspaper to buy today is 6 for the true problem.
from msppy.utils.examples import construct_nvmc
from msppy.solver import SDDP, Extensive
from msppy.evaluation import Evaluation, EvaluationTrue
nvmc = construct_nvmc()
nvmc_ext = Extensive(nvmc)
nvmc_ext.solve()
nvmc_ext.first_stage_solution
nvmc_sddp = SDDP(nvmc)
nvmc_sddp.solve(max_iterations=10)
nvmc_sddp.db[-1]
nvmc_sddp.first_stage_solution
res = Evaluation(nvmc)
res.run(n_simulations=-1)
res.gap
Markovian continuous problem¶
In this case, for both solvers, the problem needs to be discretized. The following snippet ten dimensional Markov chain using stochastic approximation method (stochastic gradient descent) with 1000 iterations. We refer the resulting disretization problem as MC. Both solvers show that, for MC, the optimum is $2.67 and an optimal amount of newspaper to buy today is 17.76
from msppy.utils.examples import construct_nvm
from msppy.solver import SDDP, Extensive
from msppy.evaluation import Evaluation, EvaluationTrue
nvm = construct_nvm()
nvm.discretize(n_Markov_states=10, n_sample_paths=1000, method='SA');
nvm_ext = Extensive(nvm)
nvm_ext.solve(outputFlag=0)
nvm_ext.first_stage_solution
nvm_sddp = SDDP(nvm)
nvm_sddp.solve(max_iterations=100, logToConsole=0)
nvm_sddp.db[-1]
nvm_sddp.first_stage_solution
res = Evaluation(nvm)
res.run(n_simulations=-1)
res.gap
res_true = EvaluationTrue(nvm)
res_true.run(n_simulations=3000)
res_true.CI
But our intention is solve the true problem rather than the MC. The advantage of SDDP solver is that by solving MC, it also offers a policy to the true problem (though not guaranteed to be optimal). By simulation as above, the 95% confidence interval of the value of the obtained policy on the true problem is [26.54,27.99]. Therefore, we conclude that by solving MC, the SDDP solver provides you with a policy whose value is within [26.54,27.99] with 95% confidence for the true problem.
Risk averse problem¶
Direct method
from msppy.utils.examples import construct_nvid
from msppy.solver import SDDP
nvid = construct_nvid()
nvid.set_AVaR(l=0.5, a=0.1, method='direct')
nvid_sddp = SDDP(nvid)
nvid_sddp.solve(max_iterations=10)
nvid_sddp.first_stage_solution
Indirect method
from msppy.utils.examples import construct_nvid
from msppy.solver import SDDP
nvid = construct_nvid()
nvid.set_AVaR(l=0.5, a=0.1, method='indirect')
nvid_sddp = SDDP(nvid)
nvid_sddp.solve(max_iterations=20)
nvid_sddp.first_stage_solution
Biased sampling¶
For both finite-horizon and infinite horizon risk averse problems, we provide a biased sampling scheme to solve the problem. Details can be found in the paper. A biased sampling scheme is in essence a change of the refrence probability measure making `bad’ scenarios more frequent and is shown to improve the rate of convergence.
Finite horizon
from msppy.utils.examples import construct_nvid
from msppy.solver import SDDP
nvid = construct_nvid()
nvid.set_AVaR(l=0.5, a=0.1)
nvid_sddp = SDDP(nvid, biased_sampling = True)
nvid_sddp.solve(max_iterations=20)
Inifinite horizon
from msppy.utils.examples import construct_nvid
from msppy.solver import SDDP
nvid = construct_nvid()
nvid.set_AVaR(l=0.5, a=0.1)
nvid_sddp = PSDDP(nvid, biased_sampling = True)
nvid_sddp.solve(max_iterations=20)
Integer problem¶
SDDiP solver and Extensive solver can solve MSIP problems. The built-in cuts provided are Benders’ cut, strengthened Benders’ cut and Lagaragian cut.
from msppy.utils.examples import construct_nvidi
from msppy.solver import SDDiP, Extensive
nvidi = construct_nvidi()
nvidi_ext = Extensive(nvidi)
nvidi_ext.solve(outputFlag=0)
nvidi_ext.first_stage_solution
nvidi_sddip = SDDiP(nvidi)
nvidi_sddip.solve(max_iterations=10, cuts=['SB'])
nvidi_sddip.db[-1]
nvidi_sddip.first_stage_solution
Infinite horizon problem¶
PSDDP solver use periodical SDDP algorithm to solve MSLP problems. Recall that we have constructed a 3-stage MSLP with period of 2 stages. So the backward pass will add cuts to these 3 stages. In the following snippet, forward_T is set to be 10, meaning that the forward pass will solve the first 10 stages and select trial solutions.
from msppy.utils.examples import construct_nvidinf
from msppy.solver import PSDDP
nvidinf = construct_nvidinf()
nvidinf_sddp = PSDDP(nvidinf)
nvidinf_sddp.solve(max_iterations=10, forward_T=10)
nvidinf_sddp.db[-1]
nvidinf_sddp.first_stage_solution
from msppy.evaluation import Evaluation
res = Evaluation(nvidinf)
res.run(n_simulations=1000, T=100, n_processes=3)
res.gap
Stopping criterion of the SDDP solver¶
SDDP algorithm is an iterative algorithm and thus we need to specify when to stop the algorithm. Besides simple numeric stopping criterion (max_iterations, max_time, max_stable_iterations, see documentation below), more sophisticated criterion are available. Returns back to the stage-wise independent finite discrete problem introduced at the beginning of this section, in which I specify max_iterations to be 10. Because this problem is tiny, one can be certain that after 10 iterations, it converges. When problem is large-scale and we cannot estimate a good threshold for number of iterations or computational time. We need something else.
Optimality gap¶
The following snippet specifies an optimality-gap based stopping criteria. Specifically, the SDDP solver evaluates the obtained policy every 1 iteration by compute the exact expected policy value (turning off simulation). If the gap becomes not larger than 1e-4, the algorithm will be stopped. We can see that the algorithm stops after 6 iterations.
from msppy.utils.examples import construct_nvid
from msppy.solver import SDDP
nvid = construct_nvid()
nvid_sddp = SDDP(nvid)
nvid_sddp.solve(freq_evaluations=1, n_simulations=-1, tol=1e-4)
nvid_sddp.iteration
nvid_sddp.db[-1]
nvid_sddp.first_stage_solution
Stabilization¶
The following snippet specifies a stabilization based stopping criteria. Specifically, the SDDP solver compares the policy every 1 iteration by computing the difference of the expected policy values. If the difference becomes not larger than 1e-4, the algorithm will be stopped. We can see that again, the algorithm stops after 6 iterations.
from msppy.utils.examples import construct_nvid
from msppy.solver import SDDP
nvid = construct_nvid()
nvid_sddp = SDDP(nvid)
nvid_sddp.solve(freq_comparisons=1, n_simulations=-1, tol=1e-4)
nvid_sddp.iteration
nvid_sddp.db[-1]
nvid_sddp.first_stage_solution
Module Reference¶
The Solver module¶
-
class
msppy.solver.Extensive(MSP)[source]¶ Extensive solver class. Can solve 1. small-scale stgage-wise independent finite discrete risk netural problem 2. small-scale Markov chain risk neutral problem.
Parameters: MSP (list) – A multi-stage stochastic program object. -
extensive_model¶ The constructed extensive model
-
solving_time¶ The time cost in solving extensive model
-
construction_time¶ The time cost in constructing extensive model
-
first_stage_solution¶ the obtained solution in the first stage
-
-
class
msppy.solver.SDDP(MSP, biased_sampling=False)[source]¶ SDDP solver base class.
Parameters: MSP (list) – A multi-stage stochastic program object. -
bounds¶ dataframe of the obtained bound
-
first_stage_solution¶ the obtained solution in the first stage
-
plot_bounds(start=0, window=1, smooth=0, ax=None)[source]¶ plot the evolution of bounds
Parameters: - ax (Matplotlib AxesSubplot instance, optional) – The specified subplot is used to plot; otherwise a new figure is created.
- window (int, optional (default=1)) – The length of the moving windows to aggregate the policy values. If length is bigger than 1, approximate confidence interval of the policy values and statistical bounds will be plotted.
- smooth (bool, optional (default=0)) – If 1, fit a smooth line to the policy values to better visualize the trend of statistical values/bounds.
- start (int, optional (default=0)) – The start iteration to plot the bounds. Set start to other values can zoom in the evolution of bounds in most recent iterations.
Returns: Return type: matplotlib.pyplot.figure instance
-
solve(n_processes=1, n_steps=1, max_iterations=10000, max_stable_iterations=10000, max_time=1000000.0, tol=0.001, freq_evaluations=None, percentile=95, tol_diff=-inf, random_state=None, evaluation_true=False, freq_comparisons=None, n_simulations=3000, n_simulations_true=3000, query=None, query_T=None, query_dual=None, query_stage_cost=False, query_policy_value=False, freq_clean=None, logFile=1, logToConsole=1, directory='', rgl_norm='L2', rgl_a=0, rgl_b=0.95)[source]¶ Solve the discretized problem.
Parameters: - n_processes (int, optional (default=1)) – The number of processes to run in parallel. Run serial SDDP if 1. If n_steps is 1, n_processes is coerced to be 1.
- n_steps (int, optional (default=1)) – The number of forward/backward steps to run in each cut iteration. It is coerced to be 1 if n_processes is 1.
- max_iterations (int, optional (default=10000)) – The maximum number of iterations to run SDDP.
- max_stable_iterations (int, optional (default=10000)) – The maximum number of iterations to have same deterministic bound
- tol (float, optional (default=1e-3)) – tolerance for convergence of bounds
- freq_evaluations (int, optional (default=None)) – The frequency of evaluating gap on the discretized problem. It will be ignored if risk averse
- percentile (float, optional (default=95)) – The percentile used to compute confidence interval
- diff (float, optional (default=-inf)) – The stabilization threshold
- freq_comparisons (int, optional (default=None)) – The frequency of comparisons of policies
- n_simulations (int, optional (default=10000)) – The number of simluations to run when evaluating a policy on the discretized problem
- freq_clean (int/list, optional (default=None)) – The frequency of removing redundant cuts. If int, perform cleaning at the same frequency for all stages. If list, perform cleaning at different frequency for each stage; must be of length T-1 (the last stage does not have any cuts).
- random_state (int, RandomState instance or None, optional (default=None)) – Used in evaluations and comparisons. (In the forward step, there is an internal random_state which is not supposed to be changed.) If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by numpy.random.
- logFile (binary, optional (default=1)) – Switch of logging to log file
- logToConsole (binary, optional (default=1)) – Switch of logging to console
Examples
>>> SDDP().solve(max_iterations=10, max_time=10, max_stable_iterations=10) Optimality gap based stopping criteria: evaluate the obtained policy every freq_evaluations iterations by running n_simulations Monte Carlo simulations. If the gap becomes not larger than tol, the algorithm will be stopped. >>> SDDP().solve(freq_evaluations=10, n_simulations=1000, tol=1e-2) Simulation can be turned off; the solver will evaluate the exact expected policy value. >>> SDDP().solve(freq_evaluation=10, n_simulations=-1, tol=1e-2) Stabilization based stopping criteria: compare the policy every freq_comparisons iterations by computing the CI of difference of the expected policy values. If the upper end of CI becomes not larger than tol diff, the algorithm will be stopped. >>> SDDP().solve(freq_comparisons=10, n_simulations=1000, tol=1e-2) Turn off simulation and
-
-
class
msppy.solver.SDDiP(MSP, biased_sampling=False)[source]¶ SDDP solver base class.
Parameters: MSP (list) – A multi-stage stochastic program object. -
solve(cuts, pattern=None, relax_stage=None, level_step_size=0.2929, level_max_stable_iterations=1000, level_max_iterations=1000, level_max_time=1000, level_mip_gap=0.0001, level_tol=0.001, *args, **kwargs)[source]¶ Call SDDiP solver to solve the discretized problem.
Parameters: - cuts (list) – Entries of the list could be ‘B’,’SB’,’LG’
- pattern (dict, optional (default=None)) – The pattern of adding cuts can be cyclical or barrier-in. See the example below.
- relax_stage (int, optional (default=None)) – All stage models after relax_stage (exclusive) will be relaxed.
- level_step_size (float, optional (default=0.2929)) – Step size for level method.
- level_max_stable_iterations (int, optional (default=1000)) – The maximum number of iterations to have the same deterministic g_* for the level method.
- level_mip_gap (float, optional (default=1e-4)) – The MIPGap used when solving the inner problem for the level method.
- level_max_iterations (int, optional (default=1000)) – The maximum number of iterations to run for the level method.
- level_max_time (int, optional (default=1000)) – The maximum number of time to run for the level method.
- level_tol (float, optional (default=1e-3)) – Tolerance for convergence of bounds for the level method.
Examples
>>> SDDiP().solve(max_iterations=10, cut=['SB'])
The following cyclical add difference cuts. Specifically, for every six iterations add Benders’ cuts for the first four, strengthened Benders’ cuts for the fifth, and Lagrangian cuts for the last.
>>> SDDiP().solve(max_iterations=10, cut=['B','SB','LG'], ... pattern={"cycle": (4, 1, 1)})
The following add difference cuts from certain iterations. Specifically, add Benders’ cuts from the beginning, Strengthened Benders’ cuts from the fourth iteration, and Lagragian cuts from the fifth iteration.
>>> SDDiP().solve(max_iterations=10, cut=['B','SB','LG'], ... pattern={'in': (0, 4, 5)})
-
The Evaluation module¶
-
class
msppy.evaluation.Evaluation(MSP)[source]¶ Evaluation base class.
Parameters: MSP (list) – A multi-stage stochastic program object. -
db¶ The deterministic bounds.
Type: list
-
pv¶ The simulated policy values.
Type: list
-
epv¶ The exact value of expected policy value (only available for approximation model).
Type: float
-
CI¶ The CI of simulated policy values.
Type: tuple
-
gap¶ The gap between upper end of the CI and deterministic bound.
Type: float
-
stage_cost¶ The cost of individual stage models.
Type: dataframe
-
solution¶ The solution of queried variables.
Type: dataframe
-
n_sample_paths¶ The number of sample paths to evaluate policy.
Type: int
-
sample_paths_idx¶ The index list of exhaustive sample paths if simulation is turned off.
Type: list
-
markovian_samples¶ The simulated Markovian type samples.
-
markovian_idx¶ The Markov state that is the closest to the markovian_samples.
Type: list
-
run(*args, **kwargs)[source]¶ Run a Monte Carlo simulation to evaluate the policy.
Parameters: - n_simulations (int/-1) – If int: the number of simulations; If -1: exhuastive evaluation.
- percentile (float, optional (default=95)) – The percentile used to compute the confidence interval.
- query (list, optional (default=None)) – The names of variables that are intended to query.
- query_dual (list, optional (default=None)) – The names of constraints whose dual variables are intended to query.
- query_stage_cost (bool, optional (default=False)) – Whether to query values of individual stage costs.
- n_processes (int, optional (default=1)) – The number of processes to run the simulation.
- T (int, optional (default=None)) – For infinite horizon problem, the number stages to evaluate the policy.
-
-
class
msppy.evaluation.EvaluationTrue(MSP)[source]¶ Evaluation base class.
Parameters: MSP (list) – A multi-stage stochastic program object. -
db¶ The deterministic bounds.
Type: list
-
pv¶ The simulated policy values.
Type: list
-
epv¶ The exact value of expected policy value (only available for approximation model).
Type: float
-
CI¶ The CI of simulated policy values.
Type: tuple
-
gap¶ The gap between upper end of the CI and deterministic bound.
Type: float
-
stage_cost¶ The cost of individual stage models.
Type: dataframe
-
solution¶ The solution of queried variables.
Type: dataframe
-
n_sample_paths¶ The number of sample paths to evaluate policy.
Type: int
-
sample_paths_idx¶ The index list of exhaustive sample paths if simulation is turned off.
Type: list
-
markovian_samples¶ The simulated Markovian type samples.
-
markovian_idx¶ The Markov state that is the closest to the markovian_samples.
Type: list
-
run(*args, **kwargs)[source]¶ Run a Monte Carlo simulation to evaluate the policy.
Parameters: - n_simulations (int/-1) – If int: the number of simulations; If -1: exhuastive evaluation.
- percentile (float, optional (default=95)) – The percentile used to compute the confidence interval.
- query (list, optional (default=None)) – The names of variables that are intended to query.
- query_dual (list, optional (default=None)) – The names of constraints whose dual variables are intended to query.
- query_stage_cost (bool, optional (default=False)) – Whether to query values of individual stage costs.
- n_processes (int, optional (default=1)) – The number of processes to run the simulation.
- T (int, optional (default=None)) – For infinite horizon problem, the number stages to evaluate the policy.
-