Evaluate the obtained policy

The previous section showed how to evaluate the obtained policy. This section looks at evaluation in more detail.

Take the stage-wise independent continuous problem introduced earlier. We set the number of stages to T=3, as in the code below, to intentionally make the problem a bit more complex. The stopping criterion is an optimality gap below 1e-3, and simulation is turned off (n_simulations=-1) so that each evaluation reports the exact gap. As shown below, the policy is evaluated every ten iterations: after ten iterations the optimality gap is 0.22%; after twenty iterations it is 0.00%, which is below the tolerance, so the algorithm stops. In the end, the optimal value is 6.68 and the first-stage solution is bought = 9.08.

[1]:

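from msppy.msp import MSLP
import numpy as np
from msppy.solver import SDDP
from msppy.evaluation import Evaluation, EvaluationTrue
# Three-stage maximization problem (sense=-1).
nvic = MSLP(T=3, sense=-1, bound=100)
# Demand sampler: lognormal distribution with median 4.
def f(random_state):
    return random_state.lognormal(mean=np.log(4),sigma=2)
for t in range(3):
    m = nvic[t]
    # State variable: amount bought now and its copy from the previous stage.
    buy_now, buy_past = m.addStateVar(name='bought', obj=-1.0)
    if t != 0:
        sold = m.addVar(name='sold', obj=2)
        unsatisfied = m.addVar(name='unsatisfied')
        recycled = m.addVar(name='recycled', obj=0.5)
        # Demand balance: the right-hand side is replaced by the random demand.
        m.addConstr(sold + unsatisfied == 0, uncertainty={'rhs':f})
        # Everything bought in the previous stage is either sold or recycled.
        m.addConstr(sold + recycled == buy_past)
# Discretize the continuous uncertainty with 100 samples.
nvic.discretize(random_state=1, n_samples=100)
nvic_sddp = SDDP(nvic)
# Evaluate every 10 iterations; n_simulations=-1 turns simulation off so the
# gap is exact; stop once the gap falls below tol.
nvic_sddp.solve(max_iterations=30, freq_evaluations=10, n_simulations=-1, tol=1e-3)
nvic_sddp.db[-1]
nvic_sddp.first_stage_solution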

Academic license - for non-commercial use only

----------------------------------------------------------------
                   SDDP Solver, Lingquan Ding
----------------------------------------------------------------
   Iteration               Bound               Value        Time
----------------------------------------------------------------
----------------------------------------------------------------------------
             Evaluation for approximation model, Lingquan Ding
----------------------------------------------------------------------------
   Iteration               Bound               Value        Time
----------------------------------------------------------------------------
           1           75.000000            0.000000    0.030841
           2           16.762950            4.681206    0.023101
           3           14.183936           -4.237863    0.014929
           4            7.146810           18.153307    0.033554
           5            7.077500           -0.167953    0.022254
           6            7.013682           17.689941    0.023259
           7            6.792215            6.187859    0.019248
           8            6.720049           32.249301    0.017918
           9            6.699237           11.650188    0.017348
          10            6.687975            5.936189    0.017623
          10            6.687975            6.673283    8.646298       0.22%
          11            6.683078           -6.759604    0.021045
          12            6.681344           -5.749741    0.028955
          13            6.680848            1.394741    0.019374
          14            6.680848            7.709294    0.021415
          15            6.680848           15.143449    0.022055
          16            6.680848           15.751499    0.019867
          17            6.680848           -9.404420    0.015488
          18            6.680848            8.153084    0.021040
          19            6.680848            4.383224    0.021477
          20            6.680848            9.160585    0.016293
          20            6.680848            6.680848    9.266528       0.00%
----------------------------------------------------------------
Time: 0.42708349227905273 seconds
Algorithm stops since convergence tolerance:0.001 has reached
----------------------------------------------------------------------------
Time: 17.91282606124878 seconds

[1]:

{'bought': 9.082937518406089}
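
The gap percentages in the evaluation rows of the log can be reproduced by hand. The sketch below assumes the gap is the relative difference between the bound and the evaluated policy value; this formula is an assumption that happens to match the logged numbers, not something taken from the solver's source. The helper `relative_gap` and the constant `TOL` are names introduced here for illustration.

```python
def relative_gap(bound, value):
    """Relative distance between the bound and the evaluated policy value."""
    return abs(bound - value) / abs(bound)

TOL = 1e-3  # same tolerance as tol=1e-3 in the solve call

# (bound, value) pairs copied from the two evaluation rows of the log.
gap_10 = relative_gap(6.687975, 6.673283)
gap_20 = relative_gap(6.680848, 6.680848)

print(f"iteration 10: {gap_10:.2%}, below tolerance: {gap_10 < TOL}")
print(f"iteration 20: {gap_20:.2%}, below tolerance: {gap_20 < TOL}")
```

Under this assumption the gap at iteration 10 is still above the tolerance, while at iteration 20 the bound and the policy value coincide, which is consistent with the solver stopping there.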

Evaluate the policy on the true problem

[2]:

res_true = EvaluationTrue(nvic)
res_true.run(n_simulations=3000, percentile=95,
    query=['sold','bought','unsatisfied','recycled'], query_stage_cost=True)
res_true.CI

[2]:

(4.673637020980986, 5.326828207132874)
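
As a rough illustration of how an interval like this can be formed from simulated policy values, the sketch below computes a normal-approximation confidence interval for the mean. The function `normal_ci` and the sample values are hypothetical, and the construction is an assumption: the interval reported by `res_true.CI` may be built differently (for example with a t-quantile).

```python
import math
import statistics

def normal_ci(samples, percentile=95):
    """Two-sided normal-approximation confidence interval for the mean."""
    n = len(samples)
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(n)
    # Two-sided quantile, e.g. 0.975 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + percentile / 200)
    return mean - z * std_err, mean + z * std_err

# Hypothetical simulated policy values (not taken from the run above).
values = [4.0, 6.5, 5.2, 3.8, 7.1, 5.0, 4.4, 6.0]
lower, upper = normal_ci(values, percentile=95)
print(lower, upper)
```

With more simulations the standard error shrinks, so the interval narrows around the sample mean.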

[3]:

res_true.stage_cost

[3]:

	0	1	2
0	-9.082938	6.136286	9.204739
1	-9.082938	6.136286	6.121598
2	-9.082938	-4.323174	24.059178
3	-9.082938	-5.103329	24.059178
4	-9.082938	-5.171771	19.555382
...	...	...	...
2995	-9.082938	-0.296567	24.059178
2996	-9.082938	-7.281342	8.099300
2997	-9.082938	-7.119978	24.059178
2998	-9.082938	0.424552	6.393926
2999	-9.082938	6.136286	6.346240

3000 rows × 3 columns
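
Each row of the stage-cost table is one simulated sample path, and summing a row gives the total value of the policy along that path. The sketch below does this for the first five displayed rows using plain Python lists; the variable names are introduced here for illustration only.

```python
# First five rows of res_true.stage_cost as displayed above (stages 0, 1, 2).
stage_cost_rows = [
    [-9.082938, 6.136286, 9.204739],
    [-9.082938, 6.136286, 6.121598],
    [-9.082938, -4.323174, 24.059178],
    [-9.082938, -5.103329, 24.059178],
    [-9.082938, -5.171771, 19.555382],
]

# Total policy value along each sample path.
path_values = [sum(row) for row in stage_cost_rows]
for i, value in enumerate(path_values):
    print(i, round(value, 6))
```

Assuming `res_true.stage_cost` is a pandas DataFrame, as its display suggests, `res_true.stage_cost.sum(axis=1)` would give the same totals for all 3000 paths.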

[4]:

res_true.solution['sold']

[4]:

	0	1	2
0	NaN	9.082938	2.126630
1	NaN	9.082938	0.071202
2	NaN	2.109964	12.029589
3	NaN	1.589861	12.029589
4	NaN	1.544233	9.027058
...	...	...	...
2995	NaN	4.794369	12.029589
2996	NaN	0.137852	1.389670
2997	NaN	0.245428	12.029589
2998	NaN	5.275114	0.252754
2999	NaN	9.082938	0.220964

3000 rows × 3 columns

[5]:

res_true.solution['bought']

[5]:

	0	1	2
0	9.082938	12.029589	0.0
1	9.082938	12.029589	0.0
2	9.082938	12.029589	0.0
3	9.082938	12.029589	0.0
4	9.082938	12.029589	0.0
...	...	...	...
2995	9.082938	12.029589	0.0
2996	9.082938	12.029589	0.0
2997	9.082938	12.029589	0.0
2998	9.082938	12.029589	0.0
2999	9.082938	12.029589	0.0

3000 rows × 3 columns

[6]:

res_true.solution['unsatisfied']

[6]:

	0	1	2
0	NaN	5.314715	0.000000
1	NaN	2.338170	0.000000
2	NaN	0.000000	10.243121
3	NaN	0.000000	2.060445
4	NaN	0.000000	0.000000
...	...	...	...
2995	NaN	0.000000	2.027065
2996	NaN	0.000000	0.000000
2997	NaN	0.000000	10.667744
2998	NaN	0.000000	0.000000
2999	NaN	21.167075	0.000000

3000 rows × 3 columns

[7]:

res_true.solution['recycled']

[7]:

	0	1	2
0	NaN	0.000000	9.902959
1	NaN	0.000000	11.958387
2	NaN	6.972973	0.000000
3	NaN	7.493077	0.000000
4	NaN	7.538705	3.002531
...	...	...	...
2995	NaN	4.288569	0.000000
2996	NaN	8.945085	10.639919
2997	NaN	8.837509	0.000000
2998	NaN	3.807823	11.776834
2999	NaN	0.000000	11.808625

3000 rows × 3 columns
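
The queried solutions can be cross-checked against the model definition. The sketch below takes sample path 0 from the tables above and verifies two things: that `sold + recycled` equals the amount bought in the previous stage, and that the reported stage cost equals `2*sold + 0.5*recycled - bought`, using the objective coefficients from the model. The helper `stage_profit` and the tolerance are assumptions introduced for this check.

```python
# Objective coefficients from the model definition.
COEF_SOLD, COEF_RECYCLED, COEF_BOUGHT = 2.0, 0.5, -1.0

def stage_profit(sold, recycled, bought):
    """Stage objective value implied by the model's coefficients."""
    return COEF_SOLD * sold + COEF_RECYCLED * recycled + COEF_BOUGHT * bought

# Sample path 0, copied from the tables above (stages 0, 1, 2).
# Stage 0 has no sold/recycled variables (NaN in the tables), so use 0.0.
bought = [9.082938, 12.029589, 0.0]
sold = [0.0, 9.082938, 2.126630]
recycled = [0.0, 0.0, 9.902959]
reported_cost = [-9.082938, 6.136286, 9.204739]

TOL = 1e-5  # displayed values are rounded to six decimals

checks = []
for t in range(3):
    profit_ok = abs(stage_profit(sold[t], recycled[t], bought[t]) - reported_cost[t]) < TOL
    # Inventory balance only applies from stage 1 onward.
    balance_ok = t == 0 or abs(sold[t] + recycled[t] - bought[t - 1]) < TOL
    checks.append((profit_ok, balance_ok))
    print(t, profit_ok, balance_ok)
```

The same check could be repeated for other rows to confirm that the simulated decisions satisfy the constraints along every path.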