Core Types
ActionInfo
- class rl_coach.core_types.ActionInfo(action: Union[int, float, numpy.ndarray, List], all_action_probabilities: float = 0, action_value: float = 0.0, state_value: float = 0.0, max_action_value: float = None)[source]
ActionInfo is a class that holds an action and various additional details about it.
- Parameters
action – the action
all_action_probabilities – the probability assigned to the action when it was selected
action_value – the state-action value (Q value) of the action
state_value – the state value (V value) of the state where the action was taken
max_action_value – if the action was selected randomly, this is the value of the action that received the maximum value. If no value is given, the action is assumed to be the action with the maximum value
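A minimal sketch of constructing ActionInfo objects with the constructor arguments listed above; all numeric values are made up for illustration and assume the rl_coach package is installed:

```python
from rl_coach.core_types import ActionInfo

# ActionInfo for a greedily selected discrete action (index 2).
greedy_info = ActionInfo(
    action=2,
    all_action_probabilities=0.85,  # probability assigned to the selected action
    action_value=1.3,               # Q(s, a) estimate of the selected action
    state_value=1.1,                # V(s) estimate of the current state
)

# For an exploratory (randomly selected) action, max_action_value can record
# the value of the action that would have been chosen greedily.
exploratory_info = ActionInfo(action=0, action_value=0.4, max_action_value=1.3)
```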
Batch
- class rl_coach.core_types.Batch(transitions: List[rl_coach.core_types.Transition])[source]
A wrapper around a list of transitions that helps extract batches of parameters from it. For example, one can extract a list of states corresponding to the list of transitions. The class uses lazy evaluation in order to return each of the available parameters.
- Parameters
transitions – a list of transitions to extract the batch from
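A rough sketch of building a Batch from a list of transitions; the single 'observation' key in each state dictionary is illustrative:

```python
import numpy as np

from rl_coach.core_types import Batch, Transition

# Build a few toy transitions, each with a single 'observation' entry in its state dict.
transitions = [
    Transition(
        state={'observation': np.array([0.0, float(i)])},
        action=1,
        reward=0.5,
        next_state={'observation': np.array([0.1, float(i + 1)])},
        game_over=False,
    )
    for i in range(4)
]

batch = Batch(transitions)
print(batch.size)  # the number of transitions in the batch -> 4
```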
actions
(expand_dims=False) → numpy.ndarray[source] If the actions were not converted to a batch before, extract them to a batch and then return the batch
- Parameters
expand_dims – add an extra dimension to the actions batch
Returns
- a numpy array containing all the actions of the batch
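For instance, continuing the Batch sketch above (assuming integer actions):

```python
actions = batch.actions()                      # shape: (batch_size,)
actions_col = batch.actions(expand_dims=True)  # shape: (batch_size, 1)
```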
game_overs
(expand_dims=False) → numpy.ndarray[source] If the game_overs were not converted to a batch before, extract them to a batch and then return the batch
- Parameters
expand_dims – add an extra dimension to the game_overs batch
Returns
- a numpy array containing all the game over flags of the batch
goals
(expand_dims=False) → numpy.ndarray[source] If the goals were not converted to a batch before, extract them to a batch and then return the batch. If the goal was not filled, this will raise an exception
- Parameters
expand_dims – add an extra dimension to the goals batch
Returns
- a numpy array containing all the goals of the batch
info
(key, expand_dims=False) → numpy.ndarray[source] If the given info dictionary key was not converted to a batch before, extract it to a batch and then return the batch. If the key is not part of the keys in the info dictionary, this will raise an exception
- Parameters
key – the key of the info dictionary to extract
expand_dims – add an extra dimension to the info batch
Returns
- a numpy array containing all the info values of the batch corresponding to the given key
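A sketch of extracting an info entry, assuming each transition's info dictionary was filled with a hypothetical 'action_probability' key:

```python
# Raises an exception if 'action_probability' is not a key of the info dictionaries.
action_probs = batch.info('action_probability')
```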
info_as_list
(key) → list[source] Get the info values for the given key and store them internally as a list, if they weren't stored before, then return the list
- Parameters
key – the key of the info dictionary to extract
Returns
- a list containing all the info values of the batch corresponding to the given key
n_step_discounted_rewards
(expand_dims=False) → numpy.ndarray[source] If the n_step_discounted_rewards were not converted to a batch before, extract them to a batch and then return the batch. If the n step discounted rewards were not filled, this will raise an exception
- Parameters
expand_dims – add an extra dimension to the total_returns batch
Returns
- a numpy array containing all the total return values of the batch
next_states
(fetches: List[str], expand_dims=False) → Dict[str, numpy.ndarray][source] Follow the keys in fetches to extract the corresponding items from the next states in the batch, if these keys were not already extracted before. Return only the values corresponding to those keys
- Parameters
fetches – the keys of the state dictionary to extract
expand_dims – add an extra dimension to each of the value batches
Returns
- a dictionary containing a batch of values corresponding to each of the given fetches keys
rewards
(expand_dims=False) → numpy.ndarray[source] If the rewards were not converted to a batch before, extract them to a batch and then return the batch
- Parameters
expand_dims – add an extra dimension to the rewards batch
Returns
- a numpy array containing all the rewards of the batch
shuffle
() → None[source] Shuffle all the transitions in the batch
- Returns
- None
size
- Returns
- the size of the batch
slice
(start, end) → None[source] Keep a slice from the batch and discard the rest of the batch
- Parameters
start – the start index in the slice
end – the end index in the slice
Returns
- None
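The two calls above can be combined into a simple mini-batch selection pattern (a sketch; mini_batch_size is an illustrative name and the batch is assumed to hold enough transitions):

```python
mini_batch_size = 32

batch.shuffle()                  # randomize the order of the transitions in place
batch.slice(0, mini_batch_size)  # keep only the first mini_batch_size transitions
```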
states
(fetches: List[str], expand_dims=False) → Dict[str, numpy.ndarray][source] Follow the keys in fetches to extract the corresponding items from the states in the batch, if these keys were not already extracted before. Return only the values corresponding to those keys
- Parameters
fetches – the keys of the state dictionary to extract
expand_dims – add an extra dimension to each of the value batches
Returns
- a dictionary containing a batch of values corresponding to each of the given fetches keys
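Both states and next_states follow the same pattern; a sketch, assuming each state dictionary in the batch contains an 'observation' key:

```python
obs = batch.states(['observation'])['observation']            # current observations
next_obs = batch.next_states(['observation'])['observation']  # next observations
```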
EnvResponse
- class rl_coach.core_types.EnvResponse(next_state: Dict[str, numpy.ndarray], reward: Union[int, float, numpy.ndarray], game_over: bool, info: Dict = None, goal: numpy.ndarray = None)[source]
An env response is a collection containing the information returned from the environment after a single action has been performed on it.
- Parameters
next_state – The new state that the environment has transitioned into. Assumed to be a dictionary where the observation is located at state[‘observation’]
reward – The reward received from the environment
game_over – A boolean which should be True if the episode terminated after the execution of the action.
info – any additional info from the environment
goal – a goal defined by the environment
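A minimal sketch of constructing an EnvResponse for a single environment step; the observation size and info contents are illustrative:

```python
import numpy as np

from rl_coach.core_types import EnvResponse

response = EnvResponse(
    next_state={'observation': np.zeros(4)},  # new state after the action
    reward=1.0,
    game_over=False,
    info={'lives': 3},                        # any additional environment info
)
```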
Episode
- class rl_coach.core_types.Episode(discount: float = 0.99, bootstrap_total_return_from_old_policy: bool = False, n_step: int = -1)[source]
An Episode represents a set of sequential transitions that ends with a terminal state.
- Parameters
discount – the discount factor to use when calculating total returns
bootstrap_total_return_from_old_policy – should the total return be bootstrapped from the values in the memory
n_step – the number of future steps to sum the reward over before bootstrapping
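A sketch of creating an episode with a discount factor of 0.99, leaving the other constructor arguments at their defaults:

```python
from rl_coach.core_types import Episode

# discount controls the discount factor used when calculating total returns;
# n_step is left at its default value of -1.
episode = Episode(discount=0.99)
```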
get_first_transition
() → rl_coach.core_types.Transition[source] Get the first transition in the episode, or None if there are no transitions available
- Returns
- The first transition in the episode
get_last_transition
() → rl_coach.core_types.Transition[source] Get the last transition in the episode, or None if there are no transitions available
- Returns
- The last transition in the episode
get_transition
(transition_idx: int) → rl_coach.core_types.Transition[source] Get a specific transition by its index.
- Parameters
transition_idx – The index of the transition to get
Returns
- The transition which is stored in the given index
get_transitions_attribute
(attribute_name: str) → List[Any][source] Get the values for some transition attribute from all the transitions in the episode. For example, this allows getting the rewards for all the transitions as a list by calling get_transitions_attribute(‘reward’)
- Parameters
attribute_name – The name of the attribute to extract from all the transitions
Returns
- A list of values from all the transitions according to the attribute given in attribute_name
insert
(transition: rl_coach.core_types.Transition) → None[source] Insert a new transition into the episode. If the game_over flag in the transition is set to True, the episode will be marked as complete.
- Parameters
transition – The new transition to insert into the episode
Returns
- None
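A sketch of filling an episode with transitions; setting game_over=True on the last one marks the episode as complete (the observations and rewards are illustrative):

```python
import numpy as np

from rl_coach.core_types import Episode, Transition

episode = Episode(discount=0.99)
for step in range(3):
    episode.insert(Transition(
        state={'observation': np.array([float(step)])},
        action=0,
        reward=1.0,
        next_state={'observation': np.array([float(step + 1)])},
        game_over=(step == 2),  # the last transition terminates the episode
    ))

print(episode.length())  # -> 3
```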
is_empty
() → bool[source] Check if the episode is empty
- Returns
- A boolean value determining if the episode is empty or not
length
() → int[source] Return the length of the episode, which is the number of transitions it holds.
- Returns
- The number of transitions in the episode
update_discounted_rewards
()[source] Update the discounted returns for all the transitions in the episode. The returns will be calculated according to the rewards of each transition, together with the number of steps to bootstrap from and the discount factor, as defined by n_step and discount respectively when initializing the episode.
- Returns
- None
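Continuing the episode sketch above, the discounted returns can be computed once the episode is complete; get_transitions_attribute is then a convenient way to read values back from the transitions:

```python
# Compute the discounted returns for every transition in the episode,
# using the discount and n_step values the episode was created with.
episode.update_discounted_rewards()

# Read back the per-transition rewards as a plain list.
rewards = episode.get_transitions_attribute('reward')  # -> [1.0, 1.0, 1.0]
```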
Transition
- class rl_coach.core_types.Transition(state: Dict[str, numpy.ndarray] = None, action: Union[int, float, numpy.ndarray, List] = None, reward: Union[int, float, numpy.ndarray] = None, next_state: Dict[str, numpy.ndarray] = None, game_over: bool = None, info: Dict = None)[source]
A transition is a tuple containing the information of a single step of interaction between the agent and the environment. The most basic version should contain the following values: (current state, action, reward, next state, game over). For imitation learning algorithms, if the reward, next state or game over is not known, it is sufficient to store the current state and the action taken by the expert.
- Parameters
state – The current state. Assumed to be a dictionary where the observation is located at state[‘observation’]
action – The current action that was taken
reward – The reward received from the environment
next_state – The next state of the environment after applying the action. The next state should be similar to the state in its structure.
game_over – A boolean which should be True if the episode terminated after the execution of the action.
info – A dictionary containing any additional information to be stored in the transition
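A minimal sketch of constructing transitions; the second one shows the reduced form mentioned above for imitation learning, where only the expert's state and action are stored (all values are illustrative):

```python
import numpy as np

from rl_coach.core_types import Transition

# A full transition from a single environment step.
transition = Transition(
    state={'observation': np.array([0.0, 1.0])},
    action=1,
    reward=0.5,
    next_state={'observation': np.array([0.1, 0.9])},
    game_over=False,
    info={'action_probability': 0.85},  # any additional per-step information
)

# For imitation learning it is sufficient to store the expert's state and action.
demo = Transition(state={'observation': np.array([0.0, 1.0])}, action=1)
```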