RL Grammar Constrained

This repository provides an implementation of agents that use context-free grammar (CFG) rules in their learning process. We use the Stable-Baselines implementations of the DQN and PPO2 agents with the following small change:

The agent cannot choose actions that will violate grammar rules.

This approach can be used to give the agent prior knowledge of temporal structures that solve the environment it is trying to learn.
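The constraint described above can be pictured as a mask over the action set. The following is a minimal sketch, not the code in stable_baselines_master: the function names are hypothetical, and it assumes that a value-based agent such as DQN takes the best action among the allowed ones, while a policy-based agent such as PPO2 renormalizes its action probabilities over the allowed ones.

```python
import math


def masked_greedy(q_values, allowed):
    """Return the index of the highest-valued action among the allowed ones."""
    candidates = [i for i, ok in enumerate(allowed) if ok]
    if not candidates:
        raise ValueError("the mask rules out every action")
    return max(candidates, key=lambda i: q_values[i])


def masked_softmax(logits, allowed):
    """Softmax over the allowed actions only; forbidden actions get probability 0."""
    kept = [x for x, ok in zip(logits, allowed) if ok]
    if not kept:
        raise ValueError("the mask rules out every action")
    peak = max(kept)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) if ok else 0.0 for x, ok in zip(logits, allowed)]
    total = sum(weights)
    return [w / total for w in weights]
```

Here allowed is a list of booleans, one per action, that a grammar check would produce for the current action history.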

Files in the repository

stable_baselines_master - Cloned version of Stable-Baselines with changes to DQN and PPO2 algorithms
cyk_prefix_parser - Implementations of the CYK algorithm, a CYK variant with prefix checking, and a CFG-to-CNF converter (see the README file in the folder for the original sources and more details)
gym_random_rooms - Environment based on the OpenAI Gym API (see the README file in the folder for more details).
action_filters.py - Example filters of actions to be used as part of our algorithm (AllPassFilter, GrammarFilter)
merge_runs.py - Plotting mean and std of several runs from tensorboard logs (after interpolation)
main.py - Example main program to run
grammar.txt, grammar_cnf.txt - Example grammar file and the result of running cfg2cnf on it.
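For readers unfamiliar with CYK, the standard membership test on a grammar in Chomsky normal form (CNF) can be sketched as follows. This is a generic textbook version, not the code in cyk_prefix_parser: it does not include the prefix-check variation, it does not read the grammar.txt format, and the dictionary-based rule encoding and the toy grammar are assumptions made for illustration.

```python
def cyk_accepts(tokens, unary_rules, binary_rules, start="S"):
    """Return True if the CNF grammar derives exactly `tokens`.

    unary_rules:  dict terminal -> set of nonterminals A with A -> terminal
    binary_rules: dict (B, C)   -> set of nonterminals A with A -> B C
    """
    n = len(tokens)
    if n == 0:
        return False  # this sketch ignores the empty string
    # table[i][j] holds the nonterminals that derive tokens[i:i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][0] = set(unary_rules.get(tok, ()))
    for span in range(2, n + 1):            # length of the substring
        for i in range(n - span + 1):       # start of the substring
            for split in range(1, span):    # length of the left part
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for b in left:
                    for c in right:
                        table[i][span - 1] |= binary_rules.get((b, c), set())
    return start in table[0][n - 1]


# Toy grammar for the language a^n b^n (n >= 1), already in CNF:
#   S -> A B | A T,  T -> S B,  A -> 'a',  B -> 'b'
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}
```

With this toy grammar, cyk_accepts(list("aabb"), UNARY, BINARY) holds while cyk_accepts(list("aab"), UNARY, BINARY) does not. In the setting of this repository the tokens would be actions rather than letters.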

Requirements

TensorFlow versions from 1.8.0 to 1.14.0
Python3 (>=3.5)

Example

Create the grammar file (grammar.txt) according to the rules defined in the cyk_prefix_parser folder. This file defines the patterns the agent must follow.
Install the stable_baselines_master folder:

pip install -e stable_baselines_master

Run the following main code:

import gym
from stable_baselines import PPO2, DQN
from action_filters import AllPassFilter, GrammarFilter


if __name__ == '__main__':
    time_steps = 200000
    # Placeholder: replace with the ID of a registered Gym environment.
    env_id = "YourEnv-v0"
    env = gym.make(env_id)

    log_dir = "./log/"
    log_name = "env_id_GrammarHistory30_PPO2"
    env.reset()

    # The `filter` argument is the change made in stable_baselines_master.
    model = PPO2('CnnPolicy', env, tensorboard_log=log_dir, filter=GrammarFilter(history_size=30, negate_grammar=False, grammar_file="grammar.txt"))
    # Unconstrained baseline for comparison:
    # model = PPO2('CnnPolicy', env, tensorboard_log=log_dir, ent_coef=0.05, filter=AllPassFilter())
    model.learn(total_timesteps=time_steps, tb_log_name=log_name)
    env.close()

Merge multiple results of different runs with the same log_name:

python merge_runs.py log res

res.png contains the averaged plot for each run type.
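The idea of averaging runs after interpolation can be sketched as follows. This is not the code in merge_runs.py: it skips reading the tensorboard logs and the plotting, the function names are hypothetical, and the choice of linear interpolation and of the population standard deviation is an assumption. Each run is taken to be a pair of lists (steps, values) with strictly increasing steps.

```python
from statistics import mean, pstdev


def interpolate(xs, ys, x):
    """Linearly interpolate the curve (xs, ys) at x; xs must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for k in range(len(xs) - 1):
        x0, x1 = xs[k], xs[k + 1]
        if x0 <= x <= x1:
            y0, y1 = ys[k], ys[k + 1]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("xs must be sorted in increasing order")


def merge_runs(runs, grid):
    """Return one (mean, std) pair per grid point, computed across all runs."""
    merged = []
    for x in grid:
        samples = [interpolate(steps, values, x) for steps, values in runs]
        merged.append((mean(samples), pstdev(samples)))
    return merged
```

Interpolating onto a shared grid is needed because different runs usually log their values at different time steps, so their points cannot be averaged directly.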

Hyper-parameters

All the original Stable-Baselines DQN and PPO2 hyper-parameters.
history_size - The length of the action history that the grammar is checked against. The history buffer is emptied whenever the episode ends or its size exceeds history_size.
negate_grammar - If False, the grammar defines the patterns the agent should follow; if True, the grammar defines the patterns the agent should not follow.
grammar_on_exploration (DQN only) - If True, the grammar rules apply to exploration actions as well.
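How history_size and negate_grammar interact can be illustrated with a toy filter. This is a sketch under stated assumptions, not the GrammarFilter from action_filters.py: the class name, its methods, and the is_valid_prefix callable are hypothetical, the callable stands in for the real grammar check, and negation is assumed to simply invert the result of that check.

```python
class HistoryFilter:
    """Toy action filter with a bounded action history (hypothetical API)."""

    def __init__(self, is_valid_prefix, history_size, negate_grammar=False):
        self.is_valid_prefix = is_valid_prefix  # callable: list of actions -> bool
        self.history_size = history_size
        self.negate_grammar = negate_grammar
        self.history = []

    def allowed(self, actions):
        """Return one boolean per candidate action."""
        mask = []
        for action in actions:
            ok = self.is_valid_prefix(self.history + [action])
            # With negate_grammar=True the result of the check is inverted.
            mask.append(ok != self.negate_grammar)
        return mask

    def step(self, action, done):
        """Record the chosen action and empty the buffer when required."""
        self.history.append(action)
        if done or len(self.history) > self.history_size:
            self.history = []


def no_repeat(seq):
    """Stand-in rule: the same action never occurs twice in a row."""
    return all(a != b for a, b in zip(seq, seq[1:]))
```

For example, after HistoryFilter(no_repeat, history_size=3) records action 0, allowed([0, 1]) returns [False, True]; with negate_grammar=True the same call returns [True, False].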

Results

These are the results of PPO2 and DQN algorithms on the Random Rooms environment:

(Figures: DQN results, PPO2 results)

With both algorithms we can see a significant improvement in performance when grammar constraints are used.

(Figures: grammar-constrained PPO2 and regular PPO2, respectively, after 1900 training episodes)

Repository: nkami/rl_grammar_constrained
