diff --git a/units/en/unitbonus3/offline-online.mdx b/units/en/unitbonus3/offline-online.mdx
index be6fa374..3a36aabc 100644
--- a/units/en/unitbonus3/offline-online.mdx
+++ b/units/en/unitbonus3/offline-online.mdx
@@ -25,6 +25,60 @@ This method has one drawback: the *counterfactual queries problem*. What do we d
 
 There exist some solutions on this topic, but if you want to know more about offline reinforcement learning, you can [watch this video](https://www.youtube.com/watch?v=k08N5a0gG0A)
 
+## Offline RL is Not the Same as Off-Policy RL
+
+Offline reinforcement learning (offline RL) and off-policy reinforcement learning (off-policy RL) are often confused, but they are distinct settings with different objectives and constraints. Both can learn from data generated by other policies, but they differ in whether the agent may interact with the environment during training, and therefore in the challenges they face.
+
+Off-policy RL allows an agent to learn a policy from experience collected by a different policy, known as the *behavior policy*. Crucially, the agent can keep gathering new experience by interacting with the environment while reusing past experience for training.
+
+Offline RL, also known as batch RL, instead trains a policy on a fixed dataset of previously collected interactions. The agent cannot interact with the environment during training.
+
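+To make the difference concrete, here is a minimal, hypothetical sketch: a toy chain environment with tabular Q-learning, where names such as `step` and `q_update` are our own, not from any library. The off-policy loop keeps calling the environment while it learns, whereas the offline loop trains on a dataset collected beforehand and never interacts again:
+
+```python
+import random
+
+# Toy setting: a 5-state chain with 2 actions (0 = left, 1 = right).
+# Reaching the last state yields a reward of 1, everything else 0.
+N_STATES, N_ACTIONS = 5, 2
+
+def step(state, action):
+    next_state = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
+    reward = 1.0 if next_state == N_STATES - 1 else 0.0
+    return next_state, reward
+
+def q_update(q, transition, alpha=0.1, gamma=0.99):
+    state, action, reward, next_state = transition
+    target = reward + gamma * max(q[next_state])
+    q[state][action] += alpha * (target - q[state][action])
+
+# Off-policy RL: the agent keeps interacting with the environment
+# (random behavior policy) while learning a greedy target policy.
+q_off_policy = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
+state = 0
+for _ in range(1000):
+    action = random.randrange(N_ACTIONS)        # behavior policy acts
+    next_state, reward = step(state, action)    # fresh interaction
+    q_update(q_off_policy, (state, action, reward, next_state))
+    state = 0 if next_state == N_STATES - 1 else next_state
+
+# Offline RL: collect a dataset once, then train without ever
+# touching the environment again.
+dataset = []
+state = 0
+for _ in range(1000):
+    action = random.randrange(N_ACTIONS)
+    next_state, reward = step(state, action)
+    dataset.append((state, action, reward, next_state))
+    state = 0 if next_state == N_STATES - 1 else next_state
+
+q_offline = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
+for transition in dataset:                      # fixed dataset, no new interaction
+    q_update(q_offline, transition)
+```
+
+Both loops use the same off-policy update rule; only the second setting is offline. Naively reusing that update on a fixed dataset, as done here, is exactly where the counterfactual queries problem described above appears, and it is what dedicated offline RL methods try to address.
+
 ## Further reading
 
 For more information, we recommend you check out the following resources: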