Evaluate the obtained policy

We have already seen how to evaluate the obtained policy in the previous section. In this section, we will dig into more details about evaluation.

Take the stage-wise independent continuous problem we have introduced. We set the number of stages to T=3 to intentionally make the problem a bit more complex. We choose an optimality gap below 1e-3 as the stopping criterion and turn off simulation (n_simulations=-1) to obtain the exact gap. The policy is evaluated every ten iterations (freq_evaluations=10). As shown below, after ten iterations the evaluation of the obtained policy shows an optimality gap of 0.22%; after twenty iterations the gap drops to 0.00%, which is below the tolerance we set, so the algorithm stops. In the end, the optimal value is 6.68 and the first-stage solution is bought = 9.08.

[1]:
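import numpy as np
from msppy.msp import MSLP
from msppy.solver import SDDP
from msppy.evaluation import Evaluation, EvaluationTrue

# Three-stage maximization problem (sense=-1) with a known bound of 100.
nvic = MSLP(T=3, sense=-1, bound=100)

def f(random_state):
    # Random demand, lognormally distributed.
    return random_state.lognormal(mean=np.log(4), sigma=2)

for t in range(3):
    m = nvic[t]
    # State variable: amount bought now, and its copy from the previous stage.
    buy_now, buy_past = m.addStateVar(name='bought', obj=-1.0)
    if t != 0:
        sold = m.addVar(name='sold', obj=2)
        unsatisfied = m.addVar(name='unsatisfied')
        recycled = m.addVar(name='recycled', obj=0.5)
        # sold + unsatisfied equals the random demand (uncertain right-hand side).
        m.addConstr(sold + unsatisfied == 0, uncertainty={'rhs': f})
        # Everything bought in the previous stage is either sold or recycled.
        m.addConstr(sold + recycled == buy_past)

# Discretize the continuous demand with 100 samples.
nvic.discretize(random_state=1, n_samples=100)
nvic_sddp = SDDP(nvic)
# Evaluate every 10 iterations, without simulation (n_simulations=-1),
# and stop once the gap falls below tol.
nvic_sddp.solve(max_iterations=30, freq_evaluations=10, n_simulations=-1, tol=1e-3)
nvic_sddp.db[-1]
nvic_sddp.first_stage_solution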
Academic license - for non-commercial use only
----------------------------------------------------------------
                   SDDP Solver, Lingquan Ding
----------------------------------------------------------------
   Iteration               Bound               Value        Time
----------------------------------------------------------------
----------------------------------------------------------------------------
             Evaluation for approximation model, Lingquan Ding
----------------------------------------------------------------------------
   Iteration               Bound               Value        Time
----------------------------------------------------------------------------
           1           75.000000            0.000000    0.030841
           2           16.762950            4.681206    0.023101
           3           14.183936           -4.237863    0.014929
           4            7.146810           18.153307    0.033554
           5            7.077500           -0.167953    0.022254
           6            7.013682           17.689941    0.023259
           7            6.792215            6.187859    0.019248
           8            6.720049           32.249301    0.017918
           9            6.699237           11.650188    0.017348
          10            6.687975            5.936189    0.017623
          10            6.687975            6.673283    8.646298       0.22%
          11            6.683078           -6.759604    0.021045
          12            6.681344           -5.749741    0.028955
          13            6.680848            1.394741    0.019374
          14            6.680848            7.709294    0.021415
          15            6.680848           15.143449    0.022055
          16            6.680848           15.751499    0.019867
          17            6.680848           -9.404420    0.015488
          18            6.680848            8.153084    0.021040
          19            6.680848            4.383224    0.021477
          20            6.680848            9.160585    0.016293
          20            6.680848            6.680848    9.266528       0.00%
----------------------------------------------------------------
Time: 0.42708349227905273 seconds
Algorithm stops since convergence tolerance:0.001 has reached
----------------------------------------------------------------------------
Time: 17.91282606124878 seconds
[1]:
{'bought': 9.082937518406089}
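
The percentage in the last column of the two evaluation rows can be reproduced from the bound and the value printed in the same row. The sketch below assumes the gap is the difference between bound and evaluated value, taken relative to the bound; the helper name `relative_gap` is ours and is not part of msppy.

```python
def relative_gap(bound, value):
    """Gap between the bound and the evaluated policy value, relative to the bound."""
    return abs(bound - value) / abs(bound)

# Bound and value copied from the evaluation rows at iterations 10 and 20 of the log.
gap_10 = relative_gap(6.687975, 6.673283)
gap_20 = relative_gap(6.680848, 6.680848)

tol = 1e-3  # same tolerance as passed to solve()
print(f"iteration 10: gap {gap_10:.2%}, below tol: {gap_10 < tol}")
print(f"iteration 20: gap {gap_20:.2%}, below tol: {gap_20 < tol}")
```

Under this assumption the gap at iteration 10 is about 0.22%, still above the tolerance, while at iteration 20 it is zero, which matches the point where the solver stops.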

Evaluate the policy on the true problem

[2]:
res_true = EvaluationTrue(nvic)
res_true.run(n_simulations=3000, percentile=95,
    query=['sold','bought','unsatisfied','recycled'], query_stage_cost=True)
res_true.CI
[2]:
(4.673637020980986, 5.326828207132874)
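
The tuple above is the 95% confidence interval requested with percentile=95, estimated from the 3000 simulated sample paths. As a rough illustration of how such an interval can be formed, the sketch below applies a normal approximation to synthetic policy values. Both the formula and the data are assumptions made for illustration; EvaluationTrue may compute its interval differently, and the helper `confidence_interval` is hypothetical.

```python
import random
import statistics

def confidence_interval(values, percentile=95):
    """Two-sided normal-approximation confidence interval for the mean."""
    n = len(values)
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / n ** 0.5
    # Two-sided critical value, e.g. about 1.96 for percentile=95.
    z = statistics.NormalDist().inv_cdf(0.5 + percentile / 200)
    return mean - z * std_err, mean + z * std_err

# Synthetic stand-in for simulated policy values (NOT the values from the run above).
rng = random.Random(0)
policy_values = [rng.gauss(5.0, 9.0) for _ in range(3000)]

lower, upper = confidence_interval(policy_values, percentile=95)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The width of the interval shrinks with the square root of the number of simulations, so quadrupling n_simulations roughly halves it.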
[3]:
res_true.stage_cost
[3]:
              0          1          2
0     -9.082938   6.136286   9.204739
1     -9.082938   6.136286   6.121598
2     -9.082938  -4.323174  24.059178
3     -9.082938  -5.103329  24.059178
4     -9.082938  -5.171771  19.555382
...         ...        ...        ...
2995  -9.082938  -0.296567  24.059178
2996  -9.082938  -7.281342   8.099300
2997  -9.082938  -7.119978  24.059178
2998  -9.082938   0.424552   6.393926
2999  -9.082938   6.136286   6.346240

3000 rows × 3 columns
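
Reading each row of stage_cost as one simulated sample path and each column as a stage (our reading of the table), the sum across a row gives the total value of the policy along that path. The snippet below does this for the first five rows shown above, copied into plain lists. On the full DataFrame the same totals could be obtained with `res_true.stage_cost.sum(axis=1)`.

```python
# First five rows of res_true.stage_cost, copied from the output above.
stage_cost_head = [
    [-9.082938,  6.136286,  9.204739],
    [-9.082938,  6.136286,  6.121598],
    [-9.082938, -4.323174, 24.059178],
    [-9.082938, -5.103329, 24.059178],
    [-9.082938, -5.171771, 19.555382],
]

# Total policy value along each sample path: sum of the stage costs.
path_totals = [sum(row) for row in stage_cost_head]

for i, total in enumerate(path_totals):
    print(f"path {i}: total value {total:.6f}")
```

Column 0 is the same on every path because the first stage is deterministic: it only contains the purchase cost of the first-stage solution.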

[4]:
res_true.solution['sold']
[4]:
        0         1          2
0     NaN  9.082938   2.126630
1     NaN  9.082938   0.071202
2     NaN  2.109964  12.029589
3     NaN  1.589861  12.029589
4     NaN  1.544233   9.027058
...   ...       ...        ...
2995  NaN  4.794369  12.029589
2996  NaN  0.137852   1.389670
2997  NaN  0.245428  12.029589
2998  NaN  5.275114   0.252754
2999  NaN  9.082938   0.220964

3000 rows × 3 columns

[5]:
res_true.solution['bought']
[5]:
             0          1    2
0     9.082938  12.029589  0.0
1     9.082938  12.029589  0.0
2     9.082938  12.029589  0.0
3     9.082938  12.029589  0.0
4     9.082938  12.029589  0.0
...        ...        ...  ...
2995  9.082938  12.029589  0.0
2996  9.082938  12.029589  0.0
2997  9.082938  12.029589  0.0
2998  9.082938  12.029589  0.0
2999  9.082938  12.029589  0.0

3000 rows × 3 columns

[6]:
res_true.solution['unsatisfied']
[6]:
        0          1          2
0     NaN   5.314715   0.000000
1     NaN   2.338170   0.000000
2     NaN   0.000000  10.243121
3     NaN   0.000000   2.060445
4     NaN   0.000000   0.000000
...   ...        ...        ...
2995  NaN   0.000000   2.027065
2996  NaN   0.000000   0.000000
2997  NaN   0.000000  10.667744
2998  NaN   0.000000   0.000000
2999  NaN  21.167075   0.000000

3000 rows × 3 columns

[7]:
res_true.solution['recycled']
[7]:
        0         1          2
0     NaN  0.000000   9.902959
1     NaN  0.000000  11.958387
2     NaN  6.972973   0.000000
3     NaN  7.493077   0.000000
4     NaN  7.538705   3.002531
...   ...       ...        ...
2995  NaN  4.288569   0.000000
2996  NaN  8.945085  10.639919
2997  NaN  8.837509   0.000000
2998  NaN  3.807823  11.776834
2999  NaN  0.000000  11.808625

3000 rows × 3 columns
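
The queried solutions can be cross-checked against the model. The sketch below copies the first sample path (row 0) from the tables above, checks the constraint sold + recycled == buy_past at stages 1 and 2, and rebuilds each stage cost from the objective coefficients in the model (2 for sold, 0.5 for recycled, -1 for bought). The tolerance `TOL` is our choice and only absorbs the six-decimal rounding of the displayed values.

```python
# Row 0 of the queried solutions and stage costs, copied from the tables above.
bought = [9.082938, 12.029589, 0.0]
sold = [None, 9.082938, 2.126630]       # no sales in stage 0 (NaN in the table)
recycled = [None, 0.0, 9.902959]        # no recycling in stage 0 (NaN in the table)
stage_cost = [-9.082938, 6.136286, 9.204739]

TOL = 1e-5  # displayed values are rounded to six decimals

# Stage 0 only pays for the purchase.
assert abs(-1.0 * bought[0] - stage_cost[0]) < TOL

for t in (1, 2):
    # Inventory balance: what was bought in stage t-1 is sold or recycled in stage t.
    balance = sold[t] + recycled[t] - bought[t - 1]
    # Stage objective: revenue from sales and recycling minus the new purchase.
    rebuilt = 2 * sold[t] + 0.5 * recycled[t] - 1.0 * bought[t]
    print(f"stage {t}: balance residual {balance:.1e}, "
          f"rebuilt cost {rebuilt:.6f}, reported {stage_cost[t]:.6f}")
```

Nothing is bought in the last stage (column 2 of bought is zero) because there is no later stage in which the stock could be sold.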