Commit 025ae14

Update blog
1 parent 0a903cc commit 025ae14

1 file changed: +3 -1 lines changed

source/_posts/2025/06/策略梯度算法中梯度公式的推导.md

Lines changed: 3 additions & 1 deletion
@@ -32,7 +32,9 @@ $$\max_{\theta\in\Theta}J(\theta)$$

 $$
 \begin{aligned}
-\nabla_\theta J(\theta)=\nabla_\theta\mathbb{E}_{s\in S}[V_{\pi_\theta}(s)]&=\nabla_\theta\mathbb{E}_{s\in S}\mathbb{E}_{a_t\sim\pi_\theta(*|s)}[Q(s,a_t)]\\
+\nabla_\theta J(\theta)&=\nabla_\theta\mathbb{E}_{s\in S}[V_{\pi_\theta}(s)]\\
+
+&=\nabla_\theta\mathbb{E}_{s\in S}\mathbb{E}_{a_t\sim\pi_\theta(*|s)}[Q(s,a_t)]\\

 &=\mathbb{E}_{s\in S}\nabla_\theta\mathbb{E}_{a_t\sim\pi_\theta(*|s)}[Q(s,a_t)]\\
