Goal
The goal of this issue is to improve the performance of the TD-LVFA agent by tuning the reward function.
This involves adjusting the weight coefficients $w_i$ so that the reward better reflects the strategic depth of checkers, and using the reward function as a lens to discover which aspects of play matter strategically.
Current Reward Function
As defined in the documentation, the reward is computed as follows:
$$R = W + w_0 p + w_1 t + w_2 c_m + w_3 d + w_4 b + w_5 c_c + w_6 c_{kc}$$
where:
| Symbol | Meaning | Description |
| --- | --- | --- |
| $W$ | Win/Loss/Draw reward | +250 for win, -250 for loss, 0 for draw |
| $p$ | Pawn advantage | Difference in pawn count |
| $t$ | Threatened pawns | Pawns threatened by the opponent |
| $c_m$ | Captures available | Number of captures that can be made |
| $d$ | Diagonal pairs | Number of diagonally aligned pawns |
| $b$ | Backrow bridge control | Whether the backrow is controlled |
| $c_c$ | Central control (pawns) | Pawns controlling central tiles |
| $c_{kc}$ | Central control (kings) | Kings controlling central tiles |
The weights $w_i$ can be adjusted to reflect the strategic importance of each component. Note that some components will have a greater overall impact simply because they occur more frequently.
The reward is based on the feature representation defined in the paper by Neto, H.C., Julia, R.M.S., Caixeta, G.S. et al. [1].
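To make the formula concrete, here is a minimal Python sketch of the weighted sum. It is not the project's implementation: the `Features` container, the `reward` function, and the example weight values are hypothetical names and placeholders chosen for illustration, and treating $W$ as 0 while the game is still in progress is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Features:
    """Hypothetical container for the feature values listed in the table."""
    p: float     # pawn advantage
    t: float     # threatened pawns
    c_m: float   # captures available
    d: float     # diagonal pairs
    b: float     # backrow bridge control
    c_c: float   # central control (pawns)
    c_kc: float  # central control (kings)


# Terminal reward W, as given in the table.
OUTCOME_REWARD = {"win": 250.0, "loss": -250.0, "draw": 0.0}


def reward(features: Features, weights: Sequence[float],
           outcome: Optional[str] = None) -> float:
    """Compute R = W + w_0 p + w_1 t + w_2 c_m + w_3 d + w_4 b + w_5 c_c + w_6 c_kc.

    Assumption: W is 0 while the game is ongoing (outcome is None).
    """
    values = (features.p, features.t, features.c_m, features.d,
              features.b, features.c_c, features.c_kc)
    if len(weights) != len(values):
        raise ValueError("expected exactly 7 weights, one per feature")
    terminal = OUTCOME_REWARD[outcome] if outcome is not None else 0.0
    return terminal + sum(w * f for w, f in zip(weights, values))


# Placeholder weights, not tuned values from the project.
example_weights = [1.0, -0.5, 0.5, 0.2, 1.0, 0.3, 0.6]
example_features = Features(p=2, t=1, c_m=1, d=3, b=1, c_c=2, c_kc=0)
mid_game_reward = reward(example_features, example_weights)
final_reward = reward(example_features, example_weights, outcome="win")
```

Keeping the weights in a single sequence makes it easy to sweep over candidate weight vectors when tuning.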
Where to Modify in Code
The relevant code is in the `compute_intermediaite_rewards` function and the `step` function. You can experiment by modifying the weights or changing the form of the reward to optimize the agent's performance.
Reward shaping evaluation
You can compare performance using the `benchmark.ipynb` notebook.
This allows you to benchmark the tuned TD(λ) agent against:
- a random agent,
- a TD(λ) agent,
- or an MCTS agent.
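When comparing weight settings against any of these opponents, it helps to reduce a batch of games to win, draw, and loss rates. The sketch below is one hedged way to do that and does not reflect the contents of the notebook: `summarise_matches` and `dummy_game` are hypothetical names, and `play_game` stands in for whatever routine plays one game and reports the outcome from the tuned agent's point of view.

```python
import random
from collections import Counter
from typing import Callable, Dict

OUTCOMES = ("win", "draw", "loss")


def summarise_matches(play_game: Callable[[], str], n_games: int) -> Dict[str, float]:
    """Play n_games and return the fraction of wins, draws and losses.

    play_game is a hypothetical callable that plays one full game and
    returns "win", "draw" or "loss" for the tuned agent.
    """
    if n_games <= 0:
        raise ValueError("n_games must be positive")
    counts = Counter()
    for _ in range(n_games):
        outcome = play_game()
        if outcome not in OUTCOMES:
            raise ValueError(f"unexpected outcome: {outcome!r}")
        counts[outcome] += 1
    return {name: counts[name] / n_games for name in OUTCOMES}


def dummy_game(rng: random.Random) -> str:
    """Stand-in for a real match: picks an outcome uniformly at random."""
    return rng.choice(OUTCOMES)


# Seeded generator so repeated runs produce the same summary.
rng = random.Random(0)
rates = summarise_matches(lambda: dummy_game(rng), n_games=200)
```

Running the same number of games with a fixed seed for each candidate weight vector keeps the comparisons between settings on an equal footing.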