Differentiable Probability Distributions

keras_gym.proba_dists.CategoricalDist Differentiable implementation of a categorical distribution.
keras_gym.proba_dists.NormalDist Implementation of a normal distribution.
class keras_gym.proba_dists.CategoricalDist(logits, boltzmann_tau=0.2, name='categorical_dist', random_seed=None)[source]

Differentiable implementation of a categorical distribution.

Parameters:
logits : 2d Tensor, dtype: float, shape: [batch_size, num_categories]

A batch of logits \(z\in \mathbb{R}^n\) with \(n=\) num_categories.

boltzmann_tau : float, optional

The Boltzmann temperature used in generating the near one-hot propensities in sample(). A smaller value yields samples closer to deterministic, one-hot encoded variates; a larger value gives better numerical stability. A good value for \(\tau\) offers a sensible trade-off between these two desired properties.

name : str, optional

Name scope of the distribution.

random_seed : int, optional

To get reproducible results.
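
A minimal construction sketch (assuming a TensorFlow backend, which keras-gym builds on; the logit values below are illustrative only):

    import tensorflow as tf
    from keras_gym.proba_dists import CategoricalDist

    # a batch of two logit vectors, shape: [batch_size=2, num_categories=3]
    logits = tf.constant([[1.0, 0.5, -0.2],
                          [0.0, 0.0, 0.0]])
    dist = CategoricalDist(logits, boltzmann_tau=0.2, random_seed=42)

    sample = dist.sample()    # near one-hot, shape: [2, 3]
    entropy = dist.entropy()  # shape: [2]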

cross_entropy(self, other)[source]

Compute the cross-entropy of a probability distribution \(p_\text{other}\) relative to the current probability distribution \(p_\text{self}\), symbolically:

\[\text{CE}[p_\text{self}, p_\text{other}]\ =\ -\sum p_\text{self}\,\log p_\text{other}\]
Parameters:
other : probability dist

The other probability dist must be of the same type as self.

Returns:
cross_entropy : 1d Tensor, shape: [batch_size]

The cross-entropy.
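
In the categorical case this reduces to a sum over the class propensities. A minimal NumPy sketch of the formula above, independent of the keras-gym implementation:

    import numpy as np

    def categorical_cross_entropy(p_self, p_other, eps=1e-12):
        # CE[p_self, p_other] = -sum_i p_self(i) * log p_other(i), per batch row
        return -np.sum(p_self * np.log(p_other + eps), axis=-1)

    p = np.array([[0.7, 0.2, 0.1]])
    q = np.array([[0.5, 0.3, 0.2]])
    print(categorical_cross_entropy(p, q))  # shape: [batch_size]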

entropy(self)[source]

Compute the entropy of the probability distribution, symbolically:

\[\text{H}[p_\text{self}]\ =\ -\sum p_\text{self}\,\log p_\text{self}\]

Returns:
entropy : 1d Tensor, shape: [batch_size]

The entropy of the probability distribution.
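
Equivalently, the entropy is the cross-entropy of the distribution with itself, \(\text{H}[p] = \text{CE}[p, p]\). A NumPy sketch of the categorical case:

    import numpy as np

    def categorical_entropy(p, eps=1e-12):
        # H[p] = -sum_i p(i) * log p(i), per batch row
        return -np.sum(p * np.log(p + eps), axis=-1)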

kl_divergence(self, other)[source]

Compute the Kullback-Leibler divergence of a probability distribution \(p_\text{other}\) relative to the current probability distribution \(p_\text{self}\), symbolically:

\[\text{KL}[p_\text{self}, p_\text{other}]\ =\ -\sum p_\text{self}\, \log\frac{p_\text{other}}{p_\text{self}}\]
Parameters:
other : probability dist

The other probability dist must be of the same type as self.

Returns:
kl_divergence : 1d Tensor, shape: [batch_size]

The KL-divergence.
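
Note the identity \(\text{KL}[p, q] = \text{CE}[p, q] - \text{H}[p]\). A NumPy sketch of the categorical case, independent of the keras-gym implementation:

    import numpy as np

    def categorical_kl(p_self, p_other, eps=1e-12):
        # KL[p_self, p_other] = sum_i p_self(i) * (log p_self(i) - log p_other(i))
        return np.sum(p_self * (np.log(p_self + eps) - np.log(p_other + eps)),
                      axis=-1)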

log_proba(self, x)[source]

Compute the log-probability associated with specific variates.

Parameters:
x : nd Tensor, shape: [batch_size, …]

A batch of specific variates.

Returns:
log_proba : 1d Tensor, shape: [batch_size]

The log-probabilities.
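
For (near) one-hot variates x, the log-probability is the log-softmax of the logits weighted by x. A NumPy sketch, whose stability shift is a common convention that keras-gym may or may not apply in the same form:

    import numpy as np

    def categorical_log_proba(logits, x_onehot):
        # log p(x) = sum_i x_i * log softmax(logits)_i, per batch row
        z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
        log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
        return np.sum(x_onehot * log_p, axis=-1)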

sample(self)[source]

Sample from the probability distribution. In order to return a differentiable sample, this method uses the Gumbel-softmax reparametrization outlined in [ArXiv:1611.01144].

Returns:
sample : 2d Tensor, shape: [batch_size, num_categories]

The sampled variates. The returned values are near one-hot encoded versions of deterministic variates.
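
The Gumbel-softmax trick perturbs the logits with Gumbel(0, 1) noise and pushes them through a temperature-scaled softmax. A NumPy sketch of that trick (the keras-gym source may differ in details such as the stability shift):

    import numpy as np

    def gumbel_softmax_sample(logits, tau=0.2, seed=None):
        rng = np.random.default_rng(seed)
        u = rng.uniform(1e-12, 1.0, size=logits.shape)
        g = -np.log(-np.log(u))                   # Gumbel(0, 1) noise
        z = (logits + g) / tau                    # temperature-scaled perturbed logits
        z = z - z.max(axis=-1, keepdims=True)     # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)  # near one-hot propensities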

class keras_gym.proba_dists.NormalDist(mu, logvar, name='normal_dist', random_seed=None)[source]

Implementation of a normal distribution.

Parameters:
mu : 2d Tensor, dtype: float, shape: [batch_size, n]

A batch of vectors of means \(\mu\in\mathbb{R}^n\).

logvar : 2d Tensor, dtype: float, shape: [batch_size, n]

A batch of vectors of log-variances \(\log(\sigma^2)\in\mathbb{R}^n\).

name : str, optional

Name scope of the distribution.

random_seed : int, optional

To get reproducible results.
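
A minimal construction sketch (assuming a TensorFlow backend; the values are illustrative only):

    import tensorflow as tf
    from keras_gym.proba_dists import NormalDist

    # a batch of two 2-dimensional Gaussians, shape: [batch_size=2, n=2]
    mu = tf.constant([[0.0, 1.0],
                      [0.5, -0.5]])
    logvar = tf.zeros_like(mu)  # unit variance: log(sigma^2) = 0
    dist = NormalDist(mu, logvar, random_seed=42)

    sample = dist.sample()          # shape: [2, 2]
    log_p = dist.log_proba(sample)  # shape: [2]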

cross_entropy(self, other)[source]

Compute the cross-entropy of a probability distribution \(p_\text{other}\) relative to the current probability distribution \(p_\text{self}\), symbolically:

\[\text{CE}[p_\text{self}, p_\text{other}]\ =\ -\sum p_\text{self}\,\log p_\text{other}\]
Parameters:
other : probability dist

The other probability dist must be of the same type as self.

Returns:
cross_entropy : 1d Tensor, shape: [batch_size]

The cross-entropy.
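
For two diagonal Gaussians this has the closed form \(\text{CE}[p, q] = \text{H}[p] + \text{KL}[p, q]\). A NumPy sketch of that closed form, independent of the keras-gym internals:

    import numpy as np

    def normal_cross_entropy(mu_p, logvar_p, mu_q, logvar_q):
        # CE[p, q] = 0.5 * sum_j (log(2*pi) + logvar_q
        #                         + (var_p + (mu_p - mu_q)^2) / var_q)
        var_p, var_q = np.exp(logvar_p), np.exp(logvar_q)
        return 0.5 * np.sum(
            np.log(2 * np.pi) + logvar_q + (var_p + (mu_p - mu_q) ** 2) / var_q,
            axis=-1)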

entropy(self)[source]

Compute the entropy of the probability distribution, symbolically:

\[\text{H}[p_\text{self}]\ =\ -\sum p_\text{self}\,\log p_\text{self}\]

Returns:
entropy : 1d Tensor, shape: [batch_size]

The entropy of the probability distribution.
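
For a diagonal Gaussian the entropy has the closed form \(\text{H} = \tfrac12\sum_j\left(1 + \log(2\pi) + \log\sigma_j^2\right)\). A NumPy sketch:

    import numpy as np

    def normal_entropy(logvar):
        # H = 0.5 * sum_j (1 + log(2*pi) + logvar_j), per batch row
        return 0.5 * np.sum(1.0 + np.log(2 * np.pi) + logvar, axis=-1)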

kl_divergence(self, other)[source]

Compute the Kullback-Leibler divergence of a probability distribution \(p_\text{other}\) relative to the current probability distribution \(p_\text{self}\), symbolically:

\[\text{KL}[p_\text{self}, p_\text{other}]\ =\ -\sum p_\text{self}\, \log\frac{p_\text{other}}{p_\text{self}}\]
Parameters:
other : probability dist

The other probability dist must be of the same type as self.

Returns:
kl_divergence : 1d Tensor, shape: [batch_size]

The KL-divergence.
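
Between two diagonal Gaussians the KL-divergence also has a closed form. A NumPy sketch of the standard formula, not lifted from the keras-gym source:

    import numpy as np

    def normal_kl(mu_p, logvar_p, mu_q, logvar_q):
        # KL[p, q] = 0.5 * sum_j (logvar_q - logvar_p - 1
        #                         + (var_p + (mu_p - mu_q)^2) / var_q)
        var_p, var_q = np.exp(logvar_p), np.exp(logvar_q)
        return 0.5 * np.sum(
            logvar_q - logvar_p - 1.0 + (var_p + (mu_p - mu_q) ** 2) / var_q,
            axis=-1)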

log_proba(self, x)[source]

Compute the log-probability associated with specific variates.

Parameters:
x : nd Tensor, shape: [batch_size, …]

A batch of specific variates.

Returns:
log_proba : 1d Tensor, shape: [batch_size]

The log-probabilities.
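
For a diagonal Gaussian the log-density factorizes over the n dimensions. A NumPy sketch of the formula:

    import numpy as np

    def normal_log_proba(x, mu, logvar):
        # log N(x | mu, sigma^2) = -0.5 * sum_j (log(2*pi) + logvar_j
        #                                        + (x_j - mu_j)^2 / var_j)
        return -0.5 * np.sum(
            np.log(2 * np.pi) + logvar + (x - mu) ** 2 / np.exp(logvar),
            axis=-1)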

sample(self)[source]

Sample from the (multivariate) normal distribution.

Returns:
sample : 2d Tensor, shape: [batch_size, n]

The sampled normally-distributed variates.
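
Differentiable normal samples are conventionally drawn with the reparametrization trick, \(x = \mu + \sigma\odot\epsilon\) with \(\epsilon\sim\mathcal{N}(0, 1)\). A NumPy sketch of that trick (keras-gym may differ in details):

    import numpy as np

    def normal_sample(mu, logvar, seed=None):
        rng = np.random.default_rng(seed)
        eps = rng.standard_normal(mu.shape)     # eps ~ N(0, 1)
        return mu + np.exp(0.5 * logvar) * eps  # x = mu + sigma * eps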