From ed1723b3a0c07a08235014d62622787bf42ca330 Mon Sep 17 00:00:00 2001
From: prdeepakbabu
Date: Sun, 9 Feb 2025 21:51:14 -0800
Subject: [PATCH] updated doc to clarify off-policy RL vs. offline RL

---
 units/en/unitbonus3/offline-online.mdx | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/units/en/unitbonus3/offline-online.mdx b/units/en/unitbonus3/offline-online.mdx
index be6fa374..3a36aabc 100644
--- a/units/en/unitbonus3/offline-online.mdx
+++ b/units/en/unitbonus3/offline-online.mdx
@@ -25,6 +25,10 @@ This method has one drawback: the *counterfactual queries problem*. What do we d
 
 There exist some solutions on this topic, but if you want to know more about offline reinforcement learning, you can [watch this video](https://www.youtube.com/watch?v=k08N5a0gG0A)
 
+## Offline RL is not the same as off-policy RL
+
+Offline reinforcement learning (offline RL) and off-policy reinforcement learning (off-policy RL) are often confused, but they are distinct settings with different objectives and constraints. Both can learn from data generated by other policies, yet they differ significantly in their training scenarios, their ability to interact with the environment, and the challenges they face. Off-policy RL lets an agent learn a policy from experience collected by another policy (known as the behavior policy), while the agent can still gather new experience by interacting with the environment during training. Offline RL, also known as batch RL, instead trains a policy from a fixed dataset of previously collected interactions: the agent cannot interact with the environment at all during training.
+
 ## Further reading
 
 For more information, we recommend you check out the following resources: