Hi! Thanks for your work on reimplementing `dreamerv1` in a simple way.
I am trying to understand the computation in `dreamerv1`, but I am confused by the logic of the `compute_lambda_values` function:
```python
last_values = torch.clone(last_values)
last_lambda_values = 0
lambda_targets = []
for step in reversed(range(horizon - 1)):
    if step == horizon - 2:
        next_values = last_values
    else:
        next_values = values[step + 1] * (1 - lmbda)
    delta = rewards[step] + next_values * done_mask[step]
    last_lambda_values = delta + lmbda * done_mask[step] * last_lambda_values
    lambda_targets.append(last_lambda_values)
return torch.stack(list(reversed(lambda_targets)), dim=0)
```
- Does the above snippet implement Eq. 6 in the original paper? That is,
$$V_\lambda(s_\tau) = (1 - \lambda) \sum_{n=1}^{H-1} \lambda^{n-1} V_N^n(s_\tau) + \lambda^{H-1} V_N^H(s_\tau)$$
I could not find anything in common between them.
- If so, what does `delta` mean? Is `delta` the TD target?
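For concreteness, here is a minimal sketch of how I read Eq. 6, next to the backward recursion that is commonly used for λ-returns. Everything in it is my own assumption rather than sheeprl code: the helper names are hypothetical, the inputs are made up, `gamma` is the discount factor, `values` carries one extra bootstrap entry at the end, and the n-step return is clipped at the end of the horizon (h = min(τ + k, t + H)) as in the paper.

```python
def n_step_return(rewards, values, tau, k, gamma):
    """V_N^k(s_tau): k-step return, clipped at the end of the horizon."""
    H = len(rewards)
    h = min(tau + k, H)
    ret = sum(gamma ** (n - tau) * rewards[n] for n in range(tau, h))
    return ret + gamma ** (h - tau) * values[h]


def lambda_return_direct(rewards, values, tau, gamma, lam):
    """Eq. 6 evaluated literally as a weighted sum of n-step returns."""
    H = len(rewards)
    total = sum(
        (1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, tau, n, gamma)
        for n in range(1, H)
    )
    return total + lam ** (H - 1) * n_step_return(rewards, values, tau, H, gamma)


def lambda_returns_recursive(rewards, values, gamma, lam):
    """Backward recursion: G_t = r_t + gamma * ((1 - lam) * v_{t+1} + lam * G_{t+1})."""
    H = len(rewards)
    out = [0.0] * H
    nxt = values[H]  # bootstrap from the value of the final state
    for t in reversed(range(H)):
        nxt = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * nxt)
        out[t] = nxt
    return out


# Made-up inputs: H = 4 rewards and H + 1 = 5 value estimates.
rewards = [1.0, 0.5, -0.2, 2.0]
values = [0.3, 0.8, 0.1, 1.5, 0.7]
gamma, lam = 0.99, 0.95

direct = [
    lambda_return_direct(rewards, values, tau, gamma, lam)
    for tau in range(len(rewards))
]
recursive = lambda_returns_recursive(rewards, values, gamma, lam)
print(max(abs(a - b) for a, b in zip(direct, recursive)))
```

In this sketch the two versions agree up to floating-point error, so my guess is that the snippet above is some form of the recursive version, but I cannot map `delta` and `done_mask` onto its terms with confidence.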
I'm new to the Dreamer series. Please forgive me if my question looks dumb to you. Thanks.
The snippet is from sheeprl/sheeprl/algos/dreamer_v1/utils.py, lines 66 to 77 in dee8c80.