

Please take a look at the tutorial on how to install, compile, and use this package.

Check out the code at: https://github.com/toddhester/rl-texplore-ros-pkg

This package provides a way of running reinforcement learning experiments with the agents from the rl_agent package and environments from the rl_env package without using the rl_msgs interface. Instead, the code instantiates agent and environment objects and calls their methods directly. It can be configured to run for a given number of episodes and trials, and it prints the sum of rewards for each episode to cerr (standard error).

Running an experiment

To run an RL experiment, you can call rl_experiment like so:

rosrun rl_experiment experiment --agent type --env type [options]

where the agent type is one of the following:

qlearner sarsa modelbased rmax texplore dyna savedpolicy

and the environment type is one of the following:

taxi tworooms fourrooms energy fuelworld mcar cartpole car2to7 car7to2 carrandom stocks lightworld

There are a number of options available to set parameters of both the agent and environment used. More details on the agent options are available in the rl_agent documentation, and more details on the env options are available in the rl_env documentation.

Env Options:

General Options:

In addition to these options, there are a few variables that can be changed in the code, in the rl.cc file. Near the top of the file are two variables: MAXSTEPS and NUMTRIALS. MAXSTEPS determines the maximum number of steps in an episode; a new episode is started after this many steps even if the agent has not reached a terminal state. NUMTRIALS determines the number of trials to run.


As an example, here is how you would run Q-Learning (Watkins 1989) on the stochastic Taxi task (Dietterich 1998):

rosrun rl_experiment experiment --agent qlearner --env taxi --stochastic

Or, to run real-time TEXPLORE (Hester and Stone 2010, Hester et al. 2012) at 10 Hz on the deterministic Fuel World task (Hester and Stone 2010) with 8 discrete trees:

rosrun rl_experiment experiment --agent texplore --nmodels 8 --planner parallel-uct --actrate 10 --env fuelworld --deterministic

The qlearner, sarsa, dyna, and rmax agents should work fine on the easier tasks (tworooms, taxi, etc.), but they will not converge within the default 1000 episodes on more complex tasks like Fuel World. For example, here is how to run Q-Learning (Watkins 1989) on the Fuel World task (Hester and Stone 2010) for 1,000,000 episodes using the --nepisodes flag, which should give it enough time to converge.

rosrun rl_experiment experiment --agent qlearner --env fuelworld --nepisodes 1000000

Another problem may arise when running these methods on the continuous domains (mcar, cartpole, car2to7, car7to2, and carrandom). For these domains, the tabular RL methods (Q-Learning, SARSA, Dyna, R-Max) need the state to be discretized. The following command runs Q-Learning (Watkins 1989) on the Mountain Car task (Sutton and Barto 1998), discretizing each state feature into 10 values with the --nstates option.

rosrun rl_experiment experiment --agent qlearner --env mcar --nstates 10


2024-07-20 13:26