ramo.strategy package

Submodules

ramo.strategy.best_response module

ramo.strategy.best_response.calc_best_response(u, player, payoff_matrix, joint_strategy, epsilon=0, global_opt=False, init_strat=None)

Calculate a best response for a given player to a joint strategy.

Parameters:

u (callable) – The utility function for this player.
player (int) – The player to calculate expected returns for.
payoff_matrix (ndarray) – The payoff matrix for the given player.
joint_strategy (List[ndarray]) – A list of each player’s individual strategy.
epsilon (float, optional) – Tolerance parameter to calculate an epsilon best-response strategy. (Default value = 0)
global_opt (bool, optional) – Whether to use a global optimiser or a local one. (Default value = False)
init_strat (ndarray, optional) – The initial guess for the best response. (Default value = None)

Returns:

A best response strategy.

Return type:

ndarray

ramo.strategy.best_response.calc_expected_returns(player, payoff_matrix, joint_strategy)

Calculate the expected return for a player’s actions with a given joint strategy.

Parameters:

player (int) – The player to calculate expected returns for.
payoff_matrix (ndarray) – The payoff matrix for the given player.
joint_strategy (List[ndarray]) – A list of each player’s individual strategy.

Returns:

The expected returns for the given player’s actions.

Return type:

ndarray
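
As an illustration of the computation described above, the following is a minimal two-player sketch, not the library's implementation. It assumes the payoff matrix has shape (actions of player 0, actions of player 1, objectives); the name expected_returns_sketch is hypothetical.

```python
import numpy as np


def expected_returns_sketch(player, payoff_matrix, joint_strategy):
    """Expected vector payoff of each of `player`'s actions (two players only)."""
    payoff_matrix = np.asarray(payoff_matrix, dtype=float)
    opponent = 1 - player
    opponent_strategy = np.asarray(joint_strategy[opponent], dtype=float)
    # Move the opponent's action axis to the front, then average over it
    # using the opponent's mixed strategy as weights.
    per_opponent_action = np.moveaxis(payoff_matrix, opponent, 0)
    return np.tensordot(opponent_strategy, per_opponent_action, axes=1)


# Assumed layout: payoff_matrix[a0, a1] is the payoff vector for that action pair.
payoff_matrix = np.array([[[4, 0], [3, 1]],
                          [[3, 1], [2, 2]]])
joint_strategy = [np.array([0.5, 0.5]), np.array([1.0, 0.0])]
returns_player_0 = expected_returns_sketch(0, payoff_matrix, joint_strategy)
returns_player_1 = expected_returns_sketch(1, payoff_matrix, joint_strategy)
```

The result has one row per action of the chosen player and one column per objective.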

ramo.strategy.best_response.calc_utility_from_joint_strat(u, player, payoff_matrix, joint_strategy)

Calculate the utility from a given joint strategy.

Parameters:

u (callable) – The utility function for this player.
player (int) – The player to calculate expected returns for.
payoff_matrix (ndarray) – The payoff matrix for the given player.
joint_strategy (List[ndarray]) – A list of each player’s individual strategy.

Returns:

The utility from the joint strategy for this player.

Return type:

float

ramo.strategy.best_response.objective(strategy, expected_returns, u)

The objective function in an MONFG under SER.

Implements the objective function for a player in a multi-objective normal-form game (MONFG) under the scalarised expected returns (SER) criterion. In a best-response calculation, players aim to maximise their utility.

Parameters:

strategy (ndarray) – The player’s strategy.
expected_returns (ndarray) – The expected returns given all other players’ strategies.
u (callable) – The utility function of this agent.

Returns:

The value on the objective with the provided arguments.

Return type:

float
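
Under SER, the utility function is applied to the expected payoff vector rather than to each sampled payoff. The sketch below shows that idea under the assumption that expected_returns has one row per action and one column per objective. The names ser_utility and product_utility are hypothetical, and the sketch does not reproduce the library's sign convention.

```python
import numpy as np


def ser_utility(strategy, expected_returns, u):
    """Apply the utility function to the expected payoff vector of a strategy."""
    expected_vector = np.asarray(strategy) @ np.asarray(expected_returns)
    return u(expected_vector)


def product_utility(vector):
    """A nonlinear example utility: the product of the two objectives."""
    return vector[0] * vector[1]


expected_returns = np.array([[4.0, 0.0],
                             [0.0, 4.0]])
pure_utility = ser_utility(np.array([1.0, 0.0]), expected_returns, product_utility)
mixed_utility = ser_utility(np.array([0.5, 0.5]), expected_returns, product_utility)
```

With this nonlinear utility the mixed strategy scores higher than either pure strategy, which is why best responses in MONFGs under SER generally need an optimiser rather than a simple argmax over actions.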

ramo.strategy.best_response.optimise_policy(expected_returns, u, epsilon=0, global_opt=False, init_strat=None, guesses=1)

Optimise a policy given a utility function.

When global_opt=True, the function is optimised with the SHGO algorithm. With the default simplicial sampling method, SHGO is proven to converge to the global optimum in the general case where \(f(x)\) is non-continuous, non-convex and non-smooth [1]. We currently use the non-default Sobol sampling method instead, because there is a bug in the default method and Sobol sampling has proven more reliable in practice.

When using a local optimiser, the function is only guaranteed to find a local optimum. By default it will use Sequential Least Squares Programming (SLSQP).

References

Parameters:

expected_returns (ndarray) – The expected returns from the player’s actions.
u (callable) – The player’s utility function.
epsilon (float, optional) – Allow epsilon approximate solutions. (Default value = 0)
global_opt (bool, optional) – Whether to use a global optimiser or a local one. We use the sampling method ‘sobol’ by default as we found better empirical results with it than with ‘simplicial’. The drawback is that simplicial has much better theoretical convergence guarantees. (Default value = False)
init_strat (ndarray, optional) – An initial guess for the optimal policy. (Default value = None)
guesses (int, optional) – The number of starting guesses to try. (Default value = 1)

Returns:

Whether the optimisation was successful, the optimised strategy and utility from this strategy.

Return type:

Tuple[bool, ndarray, float]
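
The local-optimiser path can be pictured with the sketch below, which maximises the utility over the probability simplex using SciPy's SLSQP method. It is an assumption-laden illustration rather than the library's code: the name local_best_strategy is hypothetical, and the handling of epsilon and guesses is left out.

```python
import numpy as np
from scipy.optimize import minimize


def local_best_strategy(expected_returns, u, init_strat=None):
    """Locally maximise u(strategy @ expected_returns) over mixed strategies."""
    expected_returns = np.asarray(expected_returns, dtype=float)
    num_actions = expected_returns.shape[0]
    if init_strat is None:
        # Assumed default: start from the uniform strategy.
        init_strat = np.full(num_actions, 1.0 / num_actions)

    def negative_utility(strategy):
        # SciPy minimises, so the utility is negated.
        return -u(strategy @ expected_returns)

    result = minimize(
        negative_utility,
        init_strat,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * num_actions,
        constraints={"type": "eq", "fun": lambda s: np.sum(s) - 1.0},
    )
    return result.success, result.x, -result.fun


expected_returns = np.array([[1.0, 0.0],
                             [0.0, 1.0]])
success, strategy, utility = local_best_strategy(
    expected_returns,
    lambda vector: vector[0] * vector[1],
    init_strat=np.array([0.9, 0.1]),
)
```

The returned triple mirrors the documented return type: a success flag, the optimised strategy and the utility of that strategy.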

ramo.strategy.operations module

ramo.strategy.operations.enumerate_supports(num_actions, min_size=1, max_size=None)

Enumerate the set of all supports, with a minimum and maximum size, for a number of actions.

Parameters:

num_actions (int) – The number of possible actions for a player.
min_size (int, optional) – The minimum size of each support. (Default value = 1)
max_size (int, optional) – The maximum size of each support. (Default value = None)

Returns:

A list of supports.

Return type:

List[Tuple[int]]
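
A support enumeration of this kind can be sketched with itertools.combinations. The function name below is hypothetical. Two choices are assumptions and may differ from the library: the ordering of the result (by size, then lexicographically) and treating max_size=None as "up to num_actions".

```python
from itertools import combinations


def enumerate_supports_sketch(num_actions, min_size=1, max_size=None):
    """List every subset of actions whose size lies in [min_size, max_size]."""
    if max_size is None:
        # Assumption: no maximum means supports may contain every action.
        max_size = num_actions
    supports = []
    for size in range(min_size, max_size + 1):
        supports.extend(combinations(range(num_actions), size))
    return supports


all_supports = enumerate_supports_sketch(3)
pairs_only = enumerate_supports_sketch(3, min_size=2, max_size=2)
```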

ramo.strategy.operations.expand_support(support, num_actions)

Return all possible supports which also include a base support.

Parameters:

support (ndarray) – A base support.
num_actions (int) – The number of actions available to choose from.

Returns:

A list of expanded supports.

Return type:

List[ndarray]

ramo.strategy.operations.expand_support_non_support(sup_non_sup)

Return all possible support-non support tuples which also include a base support.

Parameters:

sup_non_sup (Tuple[Tuple[int]]) – A base support-non support.

Returns:

A list of expanded support and non-supports.

Return type:

List[Tuple[Tuple[int]]]

ramo.strategy.operations.get_non_support(strat, tol=1e-15)

Get the actions which are not in the support.

Parameters:

strat (ndarray) – A strategy array.
tol (float, optional) – The tolerance to count action probabilities still as zero. (Default value = 1e-15)

Returns:

The actions that are not in the support of the strategy.

Return type:

List[int]

ramo.strategy.operations.get_support(strat, tol=1e-15)

Get the actions which are in the support of a strategy.

Parameters:

strat (ndarray) – A strategy array.
tol (float, optional) – The tolerance to count action probabilities still as zero. (Default value = 1e-15)

Returns:

An array of actions that are in the support.

Return type:

ndarray
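
The relation between a strategy, its support and its non-support can be sketched as follows. The helper names are hypothetical, and comparing with a strict "greater than tol" is an assumption about how the tolerance is applied.

```python
import numpy as np


def support_sketch(strat, tol=1e-15):
    """Indices of actions played with probability above the tolerance."""
    return np.flatnonzero(np.asarray(strat) > tol)


def non_support_sketch(strat, tol=1e-15):
    """Indices of actions whose probability counts as zero."""
    return np.flatnonzero(np.asarray(strat) <= tol)


strat = np.array([0.5, 0.0, 0.5, 1e-20])
in_support = support_sketch(strat)
out_of_support = non_support_sketch(strat)
```

Every action index appears in exactly one of the two results, so together they partition the action set.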

ramo.strategy.operations.make_action_from_pure_strat(strat)

Make an action from a pure strategy.

Parameters:

strat (ndarray) – A strategy.

Returns:

The closest matching action to this strategy.

Return type:

int

ramo.strategy.operations.make_joint_strat(player_id, player_strat, opp_strat)

Make joint strategy from the opponent strategy and player strategy.

Parameters:

player_id (int) – The id of the player.
player_strat (ndarray) – The strategy of the player.
opp_strat (List[ndarray]) – A list of the strategies of all other players.

Returns:

A list of strategies with the player’s strategy at the correct index.

Return type:

List[ndarray]
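
One plausible reading of "at the correct index" is that the player's strategy is inserted at position player_id among the opponents' strategies. The sketch below illustrates that reading with a hypothetical helper and plain lists standing in for arrays.

```python
def make_joint_strat_sketch(player_id, player_strat, opp_strat):
    """Insert the player's strategy at its own index among the opponents'."""
    joint_strat = list(opp_strat)  # Copy so the caller's list is not modified.
    joint_strat.insert(player_id, player_strat)
    return joint_strat


opponents = [[0.5, 0.5], [0.1, 0.9]]
joint = make_joint_strat_sketch(1, [1.0, 0.0], opponents)
```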

ramo.strategy.operations.make_joint_strat_from_flat(flat_strat, player_actions)

Make a joint strategy from a flat joint strategy.

Parameters:

flat_strat (ndarray) – A joint strategy as a flat array.
player_actions (Tuple[int]) – A tuple with the number of actions per player.

Returns:

A list of individual strategies.

Return type:

List[ndarray]
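
Splitting a flat joint strategy can be sketched with NumPy, assuming the individual strategies are concatenated in player order. The function name is hypothetical.

```python
import numpy as np


def joint_strat_from_flat_sketch(flat_strat, player_actions):
    """Cut a flat array into one strategy per player."""
    # Cumulative action counts give the boundaries between players.
    split_points = np.cumsum(player_actions)[:-1]
    return np.split(np.asarray(flat_strat), split_points)


flat = np.array([0.5, 0.5, 0.2, 0.3, 0.5])
individual = joint_strat_from_flat_sketch(flat, (2, 3))
```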

ramo.strategy.operations.make_joint_strat_from_profile(joint_action, player_actions)

Makes a joint strategy from an action profile.

Parameters:

joint_action (ndarray) – An array with the action per player.
player_actions (Tuple[int]) – The number of actions per player.

Returns:

A list of strategies.

Return type:

List[ndarray]

ramo.strategy.operations.make_profile_from_pure_joint_strat(joint_strat)

Make an action profile from a pure joint strategy.

Parameters:

joint_strat (List[ndarray]) – A list of pure strategies.

Returns:

An action profile.

Return type:

ndarray

ramo.strategy.operations.make_strat_from_action(action, num_actions)

Turn an action into a strategy representation.

Parameters:

action (int) – An action.
num_actions (int) – The number of possible actions.

Returns:

A pure strategy as a numpy array.

Return type:

ndarray
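
The conversion between actions and pure strategies amounts to a one-hot encoding and its inverse. The sketch below uses hypothetical helper names; taking the argmax as the "closest matching action" is an assumption.

```python
import numpy as np


def strat_from_action_sketch(action, num_actions):
    """One-hot vector that plays `action` with probability one."""
    strat = np.zeros(num_actions)
    strat[action] = 1.0
    return strat


def action_from_pure_strat_sketch(strat):
    """Assumed inverse: the index of the most probable action."""
    return int(np.argmax(strat))


pure_strat = strat_from_action_sketch(2, 4)
recovered_action = action_from_pure_strat_sketch(pure_strat)
```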

ramo.strategy.operations.normalise_joint_strat(joint_strat)

Normalise all individual strategies in a joint strategy.

Parameters:

joint_strat (List[ndarray]) – A list of individual strategies.

Returns:

A joint strategy with each individual strategy normalised.

Return type:

List[ndarray]

ramo.strategy.operations.normalise_strat(strat)

Normalise a strategy to sum to one.

Parameters:

strat (ndarray) – A strategy array.

Returns:

The same strategy as a probability vector.

Return type:

ndarray
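
Normalisation can be sketched as dividing by the sum of the entries. The function name is hypothetical, and the sketch assumes non-negative entries with a positive sum; how the library treats an all-zero array is not stated here.

```python
import numpy as np


def normalise_strat_sketch(strat):
    """Rescale non-negative weights so that they sum to one."""
    strat = np.asarray(strat, dtype=float)
    return strat / np.sum(strat)


normalised = normalise_strat_sketch([2.0, 1.0, 1.0])
```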

ramo.strategy.operations.supports_diff(support1, support2)

Take the difference of two supports, defined as support1 \ support2.

Parameters:

support1 (List[int]) – The first support.
support2 (List[int]) – The second support.

Returns:

The actions in support1 that are not in support2.

Return type:

List[int]
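
The set difference can be sketched in a few lines of plain Python. The function name is hypothetical, and keeping the order of support1 is an assumption.

```python
def supports_diff_sketch(support1, support2):
    """Actions of support1 that do not occur in support2, in original order."""
    excluded = set(support2)
    return [action for action in support1 if action not in excluded]


difference = supports_diff_sketch([0, 1, 2, 3], [1, 3])
```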

ramo.strategy.operations.totally_mixed_supports(player_actions)

Generate the totally mixed supports.

Note

Totally mixed supports are supports which assign a positive probability to each action. Here, this means the returned dictionary maps every player to a tuple of all their actions.

Parameters:

player_actions (Tuple[int]) – A tuple of actions in the support.

Returns:

A dictionary of the totally mixed supports for each player.

Return type:

Dict[int, Tuple[int]]
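
The structure of the returned dictionary can be sketched as follows. This assumes that player_actions holds the number of actions per player, as in the other functions of this module, and that players are keyed by their index; the function name is hypothetical.

```python
def totally_mixed_supports_sketch(player_actions):
    """Map each player index to the tuple of all of that player's actions."""
    return {
        player: tuple(range(num_actions))
        for player, num_actions in enumerate(player_actions)
    }


mixed_supports = totally_mixed_supports_sketch((2, 3))
```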

ramo.strategy.strategies module

ramo.strategy.strategies.softmax_strategy(theta)

Take a softmax over an array of parameters.

Parameters:

theta (ndarray) – An array of policy parameters.

Returns:

A probability distribution over actions as a policy.

Return type:

ndarray
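
A softmax over policy parameters can be sketched as below. The function name is hypothetical. Subtracting the maximum before exponentiating is a common numerical-stability step added here as an assumption; it does not change the result.

```python
import numpy as np


def softmax_sketch(theta):
    """Turn real-valued parameters into a probability distribution."""
    theta = np.asarray(theta, dtype=float)
    # Shifting by the maximum avoids overflow in the exponential.
    exponentials = np.exp(theta - np.max(theta))
    return exponentials / np.sum(exponentials)


uniform_policy = softmax_sketch([0.0, 0.0])
large_parameter_policy = softmax_sketch([1000.0, 1000.0])
```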
