1 parent 0a903cc, commit 025ae14
source/_posts/2025/06/策略梯度算法中梯度公式的推导.md
@@ -32,7 +32,9 @@ $$\max_{\theta\in\Theta}J(\theta)$$
 
 $$
 \begin{aligned}
-\nabla_\theta J(\theta)=\nabla_\theta\mathbb{E}_{s\in S}[V_{\pi_\theta}(s)]&=\nabla_\theta\mathbb{E}_{s\in S}\mathbb{E}_{a_t\sim\pi_\theta(*|s)}[Q(s,a_t)]\\
+\nabla_\theta J(\theta)&=\nabla_\theta\mathbb{E}_{s\in S}[V_{\pi_\theta}(s)]\\
+
+&=\nabla_\theta\mathbb{E}_{s\in S}\mathbb{E}_{a_t\sim\pi_\theta(*|s)}[Q(s,a_t)]\\
 
 &=\mathbb{E}_{s\in S}\nabla_\theta\mathbb{E}_{a_t\sim\pi_\theta(*|s)}[Q(s,a_t)]\\