Loss Functions

This is a collection of custom Keras-compatible loss functions that are used throughout this package.


These functions generally require the TensorFlow backend.

Value Losses

These loss functions can be used to learn a value function. Most of the losses needed for this are already provided by Keras, so the value-function losses included here are minor adaptations of the available Keras losses.
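Which Keras losses the package adapts is not spelled out here. As a familiar example of a loss that Keras already provides and that is often used for value-function learning, the sketch below writes out the arithmetic of the Huber loss in plain Python. The function name huber_loss, the delta default, and the sample numbers are illustrative assumptions, not this package's implementation.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Framework-free sketch of the Huber loss, averaged over the batch.

    y_true, y_pred: equal-length lists of floats (hypothetical stand-ins
    for the target and predicted values of a value function).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    losses = []
    for target, pred in zip(y_true, y_pred):
        err = abs(target - pred)
        if err <= delta:
            # Quadratic regime for small errors.
            losses.append(0.5 * err ** 2)
        else:
            # Linear regime for large errors, which limits the effect of outliers.
            losses.append(delta * (err - 0.5 * delta))
    return sum(losses) / len(losses)


# Illustrative batch: one small error, one exact prediction, one large error.
targets = [1.0, 0.0, 3.0]
predictions = [1.5, 0.0, 0.0]
value_loss = huber_loss(targets, predictions)
```

In a real model the same computation would be expressed with Tensor operations so that Keras can differentiate it.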

Policy Losses

Policy losses are implemented slightly differently from value losses because of their non-standard structure. A policy loss is implemented as a method on updateable policy objects (documented below). To implement a custom policy loss, override the policy_loss_with_metrics() method.

BaseUpdateablePolicy.policy_loss_with_metrics(self, Adv, A=None)

This method constructs the policy loss as a scalar-valued Tensor, together with a dictionary of metrics (also scalars).

This method may be overridden to construct a custom policy loss and/or to change the accompanying metrics.

Adv : 1d Tensor, shape: [batch_size]

A batch of advantages.

A : nd Tensor, shape: [batch_size, …]

A batch of actions taken under the behavior policy. For some choices of policy loss, e.g. update_strategy='sac', this input is ignored.

loss, metrics : (Tensor, dict of Tensors)

The policy loss along with a dict of metrics of type {name <str>: metric <Tensor>}. The loss and each of the metrics (dict values) are scalar Tensors, i.e. Tensors with ndim=0.

The loss is passed to a keras Model using train_model.add_loss(loss). Similarly, each metric in the metric dict is passed to the model using train_model.add_metric(metric, name=name, aggregation='mean').
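The return contract described above (a scalar loss plus a dict of named scalar metrics) can be sketched without any framework. The example below uses a vanilla policy-gradient surrogate, minus the batch mean of Adv times log pi(a|s), as an assumed choice of loss. The function name, the log_pi argument, and the metric names are hypothetical. A real override would build the same quantities from Tensors inside the policy object.

```python
import math


def sketch_policy_loss_with_metrics(adv, log_pi):
    """Plain-Python stand-in for the (loss, metrics) return contract.

    adv:    list of floats, one advantage per batch element.
    log_pi: list of floats, log-probability of each action taken
            (hypothetical input; the real method receives Adv and A).
    """
    if len(adv) != len(log_pi):
        raise ValueError("adv and log_pi must have the same batch size")
    batch_size = len(adv)

    # Vanilla policy-gradient surrogate: minimise -mean(Adv * log pi(a|s)).
    loss = -sum(a * lp for a, lp in zip(adv, log_pi)) / batch_size

    # Every metric is a scalar, keyed by a string name.
    metrics = {
        "policy/loss": loss,
        "policy/mean_advantage": sum(adv) / batch_size,
    }
    return loss, metrics


# Illustrative batch of three transitions.
adv = [1.0, -0.5, 2.0]
log_pi = [math.log(0.5), math.log(0.25), math.log(0.8)]
loss, metrics = sketch_policy_loss_with_metrics(adv, log_pi)
```

Returning the metrics as a name-to-scalar dict is what lets each entry be registered on the model individually under its own name.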