Environments

  • class rl_coach.environments.environment.Environment(level: rl_coach.environments.environment.LevelSelection, seed: int, frame_skip: int, human_control: bool, custom_reward_threshold: Union[int, float], visualization_parameters: rl_coach.base_parameters.VisualizationParameters, target_success_rate: float = 1.0, **kwargs)[source]
    • Parameters
      • level – The environment level. Each environment can have multiple levels

      • seed – a seed for the random number generator of the environment

      • frame_skip – number of frames to skip (while repeating the same action) between each two agent directives

      • human_control – human should control the environment

      • visualization_parameters – a blob of parameters used for visualization of the environment

      • **kwargs – as the class is instantiated by EnvironmentParameters, this is used to support having additional arguments which will be ignored by this class, but might be used by others

  • property action_space
  • Get the action space of the environment

    • Returns
    • the action space
  • close() → None[source]

  • Clean up steps.

    • Returns
    • None
  • get_action_from_user() → Union[int, float, numpy.ndarray, List][source]

  • Get an action from the user keyboard

    • Returns
    • action index
  • get_available_keys() → List[Tuple[str, Union[int, float, numpy.ndarray, List]]][source]

  • Return a list of tuples mapping between action names and the keyboard key that triggers them

    • Returns
    • a list of tuples mapping between action names and the keyboard key that triggers them
  • get_goal() → Union[None, numpy.ndarray][source]

  • Get the current goal that the agent needs to achieve in the environment

    • Returns
    • The goal
  • get_random_action() → Union[int, float, numpy.ndarray, List][source]

  • Returns an action picked uniformly from the available actions

    • Returns
    • a numpy array with a random action
  • get_rendered_image() → numpy.ndarray[source]

  • Return a numpy array containing the image that will be rendered to the screen. This can be different from the observation. For example, MuJoCo's observation is a measurements vector.

    • Returns
    • numpy array containing the image that will be rendered to the screen
  • property goal_space

  • Get the goal space of the environment

    • Returns
    • the goal space
  • handle_episode_ended() → None[source]

  • End an episode

    • Returns
    • None
  • property last_env_response

  • Get the last environment response

    • Returns
    • a dictionary that contains the state, reward, etc.
  • property phase

  • Get the phase of the environment

    • Returns
    • the current phase

  • render() → None[source]

  • Call the environment function for rendering to the screen

    • Returns
    • None
  • reset_internal_state(force_environment_reset=False) → rl_coach.core_types.EnvResponse[source]

  • Reset the environment and all the variables of the wrapper

    • Parameters
    • force_environment_reset – forces environment reset even when the game did not end

    • Returns

    • A dictionary containing the observation, reward, done flag, action and measurements
  • set_goal(goal: Union[None, numpy.ndarray]) → None[source]

  • Set the current goal that the agent needs to achieve in the environment

    • Parameters
    • goal – the goal that needs to be achieved

    • Returns

    • None
  • property state_space

  • Get the state space of the environment

    • Returns
    • the observation space
  • step(action: Union[int, float, numpy.ndarray, List]) → rl_coach.core_types.EnvResponse[source]

  • Make a single step in the environment using the given action

    • Parameters
    • action – an action to use for stepping the environment. Should follow the definition of the action space.

    • Returns

    • the environment response as returned in get_last_env_response
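Taken together, these methods define the interaction loop between an agent and any Coach environment. The following is a minimal sketch of that loop, assuming env is an already-constructed Environment subclass (for example a GymEnvironment); the EnvResponse attribute names used below (reward, game_over) are assumptions based on rl_coach.core_types rather than part of this listing.

    # Minimal interaction-loop sketch for an rl_coach Environment subclass.
    # Assumes `env` is an already-constructed Environment (e.g. a GymEnvironment);
    # the EnvResponse attribute names below (reward, game_over) are assumptions.
    def run_random_episode(env, max_steps: int = 1000) -> float:
        total_reward = 0.0
        env.reset_internal_state(force_environment_reset=True)  # start a fresh episode
        for _ in range(max_steps):
            action = env.get_random_action()   # sample uniformly from the action space
            response = env.step(action)        # returns an EnvResponse
            total_reward += response.reward
            if response.game_over:
                break
        env.handle_episode_ended()             # let the wrapper finalize the episode
        return total_reward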

DeepMind Control Suite

A set of reinforcement learning environments powered by the MuJoCo physics engine.

Website: DeepMind Control Suite

  • class rl_coach.environments.control_suite_environment.ControlSuiteEnvironment(level: rl_coach.environments.environment.LevelSelection, frame_skip: int, visualization_parameters: rl_coach.base_parameters.VisualizationParameters, target_success_rate: float = 1.0, seed: Union[None, int] = None, human_control: bool = False, observation_type: rl_coach.environments.control_suite_environment.ObservationType = ObservationType.Measurements, custom_reward_threshold: Union[int, float] = None, **kwargs)[source]
    • Parameters
      • level – (str) A string representing the control suite level to run. This can also be a LevelSelection object. For example, cartpole:swingup.

      • frame_skip – (int) The number of frames to skip between any two actions given by the agent. The action will be repeated for all the skipped frames.

      • visualization_parameters – (VisualizationParameters) The parameters used for visualizing the environment, such as the render flag, storing videos etc.

      • target_success_rate – (float) Stop the experiment if the given target success rate was achieved.

      • seed – (int) A seed to use for the random number generator when running the environment.

      • human_control – (bool) A flag that allows controlling the environment using the keyboard keys.

      • observation_type – (ObservationType) An enum which defines which observation to use. The current options are: Measurements only - a vector of joint torques and similar measurements; Image only - an image of the environment as seen by a camera attached to the simulator; Measurements & Image - both types of observations will be returned in the state using the keys ‘measurements’ and ‘pixels’ respectively.

      • custom_reward_threshold – (float) Allows defining a custom reward that will be used to decide when the agent succeeded in passing the environment.
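In a Coach preset, this environment is normally selected through its parameters class rather than constructed directly. A minimal sketch, assuming ControlSuiteEnvironmentParameters accepts the level in its constructor and mirrors the arguments listed above:

    # Sketch of selecting a Control Suite level in a Coach preset.
    # Assumes ControlSuiteEnvironmentParameters accepts the level in its constructor
    # and exposes frame_skip as an attribute (the usual Coach pattern).
    from rl_coach.environments.control_suite_environment import ControlSuiteEnvironmentParameters

    env_params = ControlSuiteEnvironmentParameters(level='cartpole:swingup')
    env_params.frame_skip = 4  # repeat each agent action for 4 simulator frames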

Blizzard Starcraft II

A popular strategy game which was wrapped with a Python interface by DeepMind.

Website: Blizzard Starcraft II

  • class rl_coach.environments.starcraft2_environment.StarCraft2Environment(level: rl_coach.environments.environment.LevelSelection, frame_skip: int, visualization_parameters: rl_coach.base_parameters.VisualizationParameters, target_success_rate: float = 1.0, seed: Union[None, int] = None, human_control: bool = False, custom_reward_threshold: Union[int, float] = None, screen_size: int = 84, minimap_size: int = 64, feature_minimap_maps_to_use: List = range(0, 7), feature_screen_maps_to_use: List = range(0, 17), observation_type: rl_coach.environments.starcraft2_environment.StarcraftObservationType = StarcraftObservationType.Features, disable_fog: bool = False, auto_select_all_army: bool = True, use_full_action_space: bool = False, **kwargs)[source]
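As with the other environments, a preset would typically select this environment through a parameters class. A minimal sketch, assuming a StarCraft2EnvironmentParameters class that mirrors the constructor arguments above and a mini-game level name such as CollectMineralShards:

    # Sketch of a StarCraft II environment selection in a Coach preset.
    # The parameters class name and the level string follow the usual Coach
    # pattern and are assumptions, not taken from the signature above.
    from rl_coach.environments.starcraft2_environment import StarCraft2EnvironmentParameters

    env_params = StarCraft2EnvironmentParameters(level='CollectMineralShards')
    env_params.screen_size = 84   # matches the screen_size default above
    env_params.minimap_size = 64  # matches the minimap_size default above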

ViZDoom

A Doom-based AI research platform for reinforcement learning from raw visual information.

Website: ViZDoom

  • class rl_coach.environments.doom_environment.DoomEnvironment(level: rl_coach.environments.environment.LevelSelection, seed: int, frame_skip: int, human_control: bool, custom_reward_threshold: Union[int, float], visualization_parameters: rl_coach.base_parameters.VisualizationParameters, cameras: List[rl_coach.environments.doom_environment.DoomEnvironment.CameraTypes], target_success_rate: float = 1.0, **kwargs)[source]
    • Parameters
      • level – (str) A string representing the Doom level to run. This can also be a LevelSelection object. This should be one of the levels defined in the DoomLevel enum. For example, HEALTH_GATHERING.

      • seed – (int) A seed to use for the random number generator when running the environment.

      • frame_skip – (int) The number of frames to skip between any two actions given by the agent. The action will be repeated for all the skipped frames.

      • human_control – (bool) A flag that allows controlling the environment using the keyboard keys.

      • custom_reward_threshold – (float) Allows defining a custom reward that will be used to decide when the agent succeeded in passing the environment.

      • visualization_parameters – (VisualizationParameters) The parameters used for visualizing the environment, such as the render flag, storing videos etc.

      • cameras – (List[CameraTypes]) A list of camera types to use as observation in the state returned from the environment. Each camera should be an enum from CameraTypes, and there are several options like an RGB observation, a depth map, a segmentation map, and a top down map of the environment.

      • target_success_rate – (float) Stop the experiment if the given target success rate was achieved.
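A minimal preset-style sketch, assuming DoomEnvironmentParameters mirrors the constructor arguments above and that level strings are resolved against the DoomLevel enum:

    # Sketch of a ViZDoom environment selection in a Coach preset.
    # The camera enum member name (OBSERVATION) is an assumption.
    from rl_coach.environments.doom_environment import DoomEnvironmentParameters, DoomEnvironment

    env_params = DoomEnvironmentParameters(level='HEALTH_GATHERING')
    env_params.frame_skip = 4
    env_params.cameras = [DoomEnvironment.CameraTypes.OBSERVATION]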

CARLA

An open-source simulator for autonomous driving research.

Website: CARLA

  • class rl_coach.environments.carla_environment.CarlaEnvironment(level: rl_coach.environments.environment.LevelSelection, seed: int, frame_skip: int, human_control: bool, custom_reward_threshold: Union[int, float], visualization_parameters: rl_coach.base_parameters.VisualizationParameters, server_height: int, server_width: int, camera_height: int, camera_width: int, verbose: bool, experiment_suite: carla.driving_benchmark.experiment_suites.experiment_suite.ExperimentSuite, config: str, episode_max_time: int, allow_braking: bool, quality: rl_coach.environments.carla_environment.CarlaEnvironmentParameters.Quality, cameras: List[rl_coach.environments.carla_environment.CameraTypes], weather_id: List[int], experiment_path: str, separate_actions_for_throttle_and_brake: bool, num_speedup_steps: int, max_speed: float, target_success_rate: float = 1.0, **kwargs)[source]
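A minimal preset-style sketch, assuming a CarlaEnvironmentParameters class that exposes the constructor arguments above as attributes:

    # Sketch of a CARLA environment selection in a Coach preset.
    # The parameters class name and the attribute values are assumptions.
    from rl_coach.environments.carla_environment import CarlaEnvironmentParameters

    env_params = CarlaEnvironmentParameters()
    env_params.allow_braking = True    # include braking in the action space
    env_params.num_speedup_steps = 30  # skip the first frames while the car settles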

OpenAI Gym

A library which consists of a set of environments, from games to robotics. Additionally, it can be extended using the API defined by the authors.

Website: OpenAI Gym

In Coach, we support all the native environments in Gym, along with several extensions such as:

  • Roboschool - a set of environments powered by the PyBullet engine, which offer a free alternative to MuJoCo.

  • Gym Extensions - a set of environments that extends Gym for auxiliary tasks (multitask learning, transfer learning, inverse reinforcement learning, etc.)

  • PyBullet - a physics engine that includes a set of robotics environments.

  • class rl_coach.environments.gym_environment.GymEnvironment(level: rl_coach.environments.environment.LevelSelection, frame_skip: int, visualization_parameters: rl_coach.base_parameters.VisualizationParameters, target_success_rate: float = 1.0, additional_simulator_parameters: Dict[str, Any] = {}, seed: Union[None, int] = None, human_control: bool = False, custom_reward_threshold: Union[int, float] = None, random_initialization_steps: int = 1, max_over_num_frames: int = 1, observation_space_type: rl_coach.environments.gym_environment.ObservationSpaceType = None, **kwargs)[source]

    • Parameters
      • level – (str) A string representing the Gym level to run. This can also be a LevelSelection object. For example, BreakoutDeterministic-v0.

      • frame_skip – (int) The number of frames to skip between any two actions given by the agent. The action will be repeated for all the skipped frames.

      • visualization_parameters – (VisualizationParameters) The parameters used for visualizing the environment, such as the render flag, storing videos etc.

      • additional_simulator_parameters – (Dict[str, Any]) Any additional parameters that the user can pass to the Gym environment. These parameters should be accepted by the __init__ function of the implemented Gym environment.

      • seed – (int) A seed to use for the random number generator when running the environment.

      • human_control – (bool) A flag that allows controlling the environment using the keyboard keys.

      • custom_reward_threshold – (float) Allows defining a custom reward that will be used to decide when the agent succeeded in passing the environment. If not set, this value will be taken from the Gym environment definition.

      • random_initialization_steps – (int) The number of random steps that will be taken in the environment after each reset. This is a feature presented in the DQN paper, which improves the variability of the episodes the agent sees.

      • max_over_num_frames – (int) This value will be used for merging multiple frames into a single frame by taking the maximum value for each of the pixels in the frame. This is particularly used in Atari games, where the frames flicker, and objects can be seen in one frame but disappear in the next.

      • observation_space_type – (ObservationSpaceType) This value will be used for generating the observation space. Allows a custom space. Should be one of ObservationSpaceType. If not specified, the observation space is inferred from the number of dimensions of the observation: 1D: Vector space, 3D: Image space if 1 or 3 channels, PlanarMaps space otherwise.
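A minimal preset-style sketch for a Gym level, assuming the GymVectorEnvironment parameters class that Coach presets commonly use for vector-observation environments:

    # Sketch of a Gym environment selection in a Coach preset.
    # GymVectorEnvironment is assumed to be available in rl_coach.environments.gym_environment.
    from rl_coach.environments.gym_environment import GymVectorEnvironment

    env_params = GymVectorEnvironment(level='CartPole-v0')
    env_params.frame_skip = 1                        # no action repetition for CartPole
    env_params.additional_simulator_parameters = {}  # forwarded to the Gym env's __init__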