Loss Functions¶
This is a collection of custom keras-compatible loss functions that are used throughout this package.
Note
These functions generally require the TensorFlow backend.
Value Losses¶
These loss functions can be applied to learning a value function. Most of the losses are actually already provided by keras. The value-function losses included here are minor adaptations of the available keras losses.
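
To give an idea of what such a minor adaptation can look like, here is a minimal, hypothetical sketch of a keras-compatible value loss: a Huber-style loss wrapped in the standard (y_true, y_pred) signature so that it can be passed to model.compile like any built-in loss. The name huber_loss and the delta parameter are illustrative only and are not part of this package's API.

    import tensorflow as tf

    def huber_loss(delta=1.0):
        """Keras-compatible Huber-style loss with a configurable threshold (illustrative only)."""
        def loss(y_true, y_pred):
            err = tf.abs(y_true - y_pred)
            quadratic = tf.minimum(err, delta)  # quadratic region: |err| <= delta
            linear = err - quadratic            # linear region: |err| > delta
            return tf.reduce_mean(0.5 * tf.square(quadratic) + delta * linear)
        return loss

    # usage: pass it to keras like any built-in loss
    # model.compile(optimizer='adam', loss=huber_loss(delta=1.0))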
Policy Losses¶
The way policy losses are implemented is slightly different from value losses
due to their non-standard structure. A policy loss is implemented as a method
on updateable policy objects (see below). If you need to implement a
custom policy loss, you can override the policy_loss_with_metrics()
method; a sketch of such an override is given at the end of this section.

BaseUpdateablePolicy.policy_loss_with_metrics(self, Adv, A=None)[source]¶
This method constructs the policy loss as a scalar-valued Tensor, together with a dictionary of metrics (also scalars).
This method may be overridden to construct a custom policy loss and/or to change the accompanying metrics.
Parameters:
Adv : 1d Tensor, shape: [batch_size]
    A batch of advantages.
A : nd Tensor, shape: [batch_size, …]
    A batch of actions taken under the behavior policy. For some choices of policy loss, e.g. update_strategy='sac', this input is ignored.
Returns:
loss, metrics : (Tensor, dict of Tensors)
    The policy loss along with some metrics, which is a dict of type {name <str>: metric <Tensor>}. The loss and each of the metrics (dict values) are scalar Tensors, i.e. Tensors with ndim=0. The loss is passed to a keras Model using train_model.add_loss(loss). Similarly, each metric in the metric dict is passed to the model using train_model.add_metric(metric, name=name, aggregation='mean').
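
If you do override policy_loss_with_metrics(), the override only needs to return a scalar loss Tensor together with a metrics dict in the format described above. Below is a minimal, hypothetical sketch of a REINFORCE-style override. The attribute self.dist and its log_proba / entropy methods are assumptions made purely for illustration; substitute whatever your updateable policy object actually exposes for computing log-probabilities and entropy.

    from tensorflow.keras import backend as K

    # BaseUpdateablePolicy is the class documented above; import it from
    # wherever this package exposes it.
    class MyUpdateablePolicy(BaseUpdateablePolicy):
        def policy_loss_with_metrics(self, Adv, A=None):
            # log pi(A|s) under the current policy; `self.dist.log_proba` is a
            # placeholder for however your policy exposes log-probabilities
            log_pi = self.dist.log_proba(A)

            # vanilla policy-gradient surrogate: maximize E[Adv * log pi(A|s)]
            loss = -K.mean(Adv * log_pi)

            # report the mean policy entropy as an extra (scalar) metric
            metrics = {'policy/entropy': K.mean(self.dist.entropy())}
            return loss, metrics

The returned loss is then added to the training model via train_model.add_loss(loss), and each metric via train_model.add_metric(metric, name=name, aggregation='mean'), as described in the Returns section above.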