Distributions
Distributions are used to select randomized actions during sampling and, for some algorithms, to compute likelihoods and related values for training. Typically, the distribution is owned by the agent. This page documents the implemented distributions and some of their methods; see the code for details.

class rlpyt.distributions.base.Distribution
Base distribution class. Not all subclasses implement all methods.

sample(dist_info)
Generate random sample(s) from the distribution(s) described by dist_info.

kl(old_dist_info, new_dist_info)
Compute the KL divergence of two distributions at each datum; should maintain leading dimensions (e.g. [T, B]).
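To illustrate the per-datum contract, here is a minimal NumPy sketch for the categorical case, assuming the distribution info reduces to a probability array whose trailing dimension indexes actions. The helper name categorical_kl and the eps guard are hypothetical, not rlpyt API; the sketch computes KL(old || new) and keeps the leading [T, B] dimensions.

```python
import numpy as np


def categorical_kl(old_prob, new_prob, eps=1e-8):
    """KL(old || new) over the trailing (action) dimension.

    Hypothetical helper: leading dimensions such as [T, B] are preserved,
    so an input of shape [T, B, A] yields an output of shape [T, B].
    eps guards against log(0).
    """
    old_prob = np.asarray(old_prob, dtype=np.float64)
    new_prob = np.asarray(new_prob, dtype=np.float64)
    log_ratio = np.log(old_prob + eps) - np.log(new_prob + eps)
    return np.sum(old_prob * log_ratio, axis=-1)


T, B, A = 3, 2, 4
old = np.full((T, B, A), 1.0 / A)                 # uniform policy
new = np.tile([0.7, 0.1, 0.1, 0.1], (T, B, 1))    # peaked policy
kl = categorical_kl(old, new)
print(kl.shape)  # (3, 2): one KL value per [T, B] datum
```

Reducing only over the last axis is what keeps the output aligned with the [T, B] layout of the training batch.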

mean_kl(old_dist_info, new_dist_info, valid)
Compute the mean KL divergence over a data batch, possibly ignoring data marked as invalid.
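The masked mean can be sketched as follows, assuming valid is an array of 0/1 flags with the same shape as the per-datum values (for example, marking time steps that fall after an episode ended). The name valid_mean is used for illustration only; this is not necessarily rlpyt's implementation.

```python
import numpy as np


def valid_mean(values, valid=None):
    """Mean of values, counting only entries where valid is nonzero.

    Hypothetical helper: with valid=None every entry counts. If no entry is
    valid the result is undefined (division by zero), so callers should
    guard against an all-invalid batch.
    """
    values = np.asarray(values, dtype=np.float64)
    if valid is None:
        return float(values.mean())
    valid = np.asarray(valid, dtype=np.float64)
    return float((values * valid).sum() / valid.sum())


# Per-datum KL values shaped [T=2, B=2]; the last entry is marked invalid
# and is excluded from the masked mean.
kl = np.array([[0.1, 0.2], [0.3, 4.0]])
valid = np.array([[1, 1], [1, 0]])
print(valid_mean(kl))          # mean over all four entries
print(valid_mean(kl, valid))   # mean over the three valid entries
```

Dividing by valid.sum() rather than the batch size keeps the mean unbiased with respect to the valid data only.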

log_likelihood(x, dist_info)
Compute the log-likelihood of samples x under the distributions described in dist_info (i.e. they can have the same leading dimensions, [T, B]).
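For a categorical distribution, the log-likelihood amounts to indexing the probability of each sampled action. The NumPy sketch below assumes integer samples x shaped like the leading dimensions of a probability array; the helper name is hypothetical.

```python
import numpy as np


def categorical_log_likelihood(x, prob, eps=1e-8):
    """Log-probability of integer samples x under categorical probabilities prob.

    Hypothetical helper: x has the leading shape of prob (e.g. [T, B]) and
    prob has one extra trailing dimension over actions (e.g. [T, B, A]).
    """
    x = np.asarray(x)
    prob = np.asarray(prob, dtype=np.float64)
    # Pick prob[..., x] for each datum, then drop the singleton trailing axis.
    selected = np.take_along_axis(prob, x[..., None], axis=-1)[..., 0]
    return np.log(selected + eps)


prob = np.array([[[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]]])   # [T=1, B=2, A=3]
x = np.array([[0, 1]])                                    # [T=1, B=2]
logli = categorical_log_likelihood(x, prob)
print(logli.shape)  # (1, 2)
```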

likelihood_ratio(x, old_dist_info, new_dist_info)
Compute the likelihood ratio of samples x under the new distributions over the old distributions (usually new_dist_info is the variable for differentiation); should maintain leading dimensions.
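Policy-gradient methods such as PPO typically consume this ratio. A common way to form it is as the exponential of a log-likelihood difference, which avoids dividing small probabilities directly. The categorical NumPy sketch below assumes hypothetical helper names; in an autodiff framework, new_prob would be the tensor carrying gradients.

```python
import numpy as np


def categorical_log_likelihood(x, prob, eps=1e-8):
    """log prob[..., x], preserving leading dimensions (hypothetical helper)."""
    selected = np.take_along_axis(np.asarray(prob, dtype=np.float64),
                                  np.asarray(x)[..., None], axis=-1)[..., 0]
    return np.log(selected + eps)


def likelihood_ratio(x, old_prob, new_prob):
    """pi_new(x) / pi_old(x), formed as exp of a log-likelihood difference."""
    return np.exp(categorical_log_likelihood(x, new_prob)
                  - categorical_log_likelihood(x, old_prob))


old_prob = np.array([[0.5, 0.5], [0.2, 0.8]])   # [B=2, A=2]
new_prob = np.array([[0.6, 0.4], [0.4, 0.6]])
x = np.array([0, 0])                            # action taken in each datum
ratio = likelihood_ratio(x, old_prob, new_prob)
print(ratio)  # approximately [1.2, 2.0]
```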

entropy(dist_info)
Compute the entropy of the distributions contained in dist_info; should maintain any leading dimensions.
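For the categorical case, entropy is -sum(p * log p) over the action dimension. A short NumPy sketch (hypothetical helper name) showing that the leading dimensions survive the reduction:

```python
import numpy as np


def categorical_entropy(prob, eps=1e-8):
    """Entropy -sum(p * log p) over the trailing dimension (hypothetical helper)."""
    prob = np.asarray(prob, dtype=np.float64)
    return -np.sum(prob * np.log(prob + eps), axis=-1)


prob = np.array([[[0.25, 0.25, 0.25, 0.25],     # uniform: maximal entropy, log(4)
                  [1.0, 0.0, 0.0, 0.0]]])       # deterministic: zero entropy
entropy = categorical_entropy(prob)              # shape [T=1, B=2]
```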

perplexity(dist_info)
Exponential of the entropy; may be useful for logging.
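Perplexity can be read as an effective number of equally likely choices: a uniform distribution over A actions has perplexity A, and a deterministic one has perplexity 1. A small NumPy check of that interpretation for the categorical case (the helper name is hypothetical):

```python
import numpy as np


def categorical_perplexity(prob, eps=1e-12):
    """exp(entropy) over the trailing dimension (hypothetical helper)."""
    prob = np.asarray(prob, dtype=np.float64)
    entropy = -np.sum(prob * np.log(prob + eps), axis=-1)
    return np.exp(entropy)


uniform = categorical_perplexity(np.full(6, 1.0 / 6))     # close to 6
peaked = categorical_perplexity([1.0, 0.0, 0.0])         # close to 1
two_way = categorical_perplexity([0.5, 0.5, 0.0, 0.0])   # close to 2
```

This is why perplexity is handy in logs: it is on the scale of "number of actions" rather than nats.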

mean_entropy(dist_info, valid=None)
Mean entropy over a data batch, optionally ignoring data marked as invalid. Subclasses can override this in case a more sophisticated mean is needed (e.g. internally ignoring select parts of the action space).

mean_perplexity(dist_info, valid=None)
Mean perplexity (exponential of the entropy) over a data batch, optionally ignoring data marked as invalid; may be useful for logging.


class rlpyt.distributions.discrete.DiscreteMixin(dim, dtype=torch.long, onehot_dtype=torch.float)
Conversions to and from one-hot.

to_onehot(indexes, dtype=None)
Convert from integer indexes to one-hot, preserving leading dimensions.

from_onehot(onehot, dtype=None)
Convert from one-hot to integer indexes, preserving leading dimensions.
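NumPy stand-ins for the two conversions, showing how leading dimensions are preserved: one-hot encoding adds a trailing axis of size dim, and decoding removes it. The function names mirror the methods above, but the dtype defaults here (float32 for one-hot, int64 for indexes) are choices of this sketch.

```python
import numpy as np


def to_onehot(indexes, dim, dtype=np.float32):
    """Integer indexes of shape [...] -> one-hot array of shape [..., dim]."""
    indexes = np.asarray(indexes)
    onehot = np.zeros(indexes.shape + (dim,), dtype=dtype)
    # Write a 1 at position indexes[...] along the new trailing axis.
    np.put_along_axis(onehot, indexes[..., None], 1, axis=-1)
    return onehot


def from_onehot(onehot, dtype=np.int64):
    """One-hot array of shape [..., dim] -> integer indexes of shape [...]."""
    return np.argmax(onehot, axis=-1).astype(dtype)


actions = np.array([[0, 2], [1, 1]])     # [T=2, B=2]
onehot = to_onehot(actions, dim=3)       # [T=2, B=2, 3]
restored = from_onehot(onehot)           # back to [T=2, B=2]
```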


class rlpyt.distributions.categorical.Categorical(dim, dtype=torch.long, onehot_dtype=torch.float)
Bases: rlpyt.distributions.discrete.DiscreteMixin, rlpyt.distributions.base.Distribution
Multinomial distribution over a discrete domain.

sample(dist_info)
Sample from torch.multinomial over the trailing dimension of dist_info.prob.
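torch.multinomial accepts only 1-D or 2-D probability inputs, so sampling with leading dimensions such as [T, B] requires flattening them first and reshaping afterwards. The NumPy sketch below reaches the same kind of result with inverse-CDF sampling over the trailing dimension; it is a stand-in technique rather than the rlpyt code path, and the helper name is hypothetical.

```python
import numpy as np


def categorical_sample(prob, rng):
    """Draw one index per datum from prob[..., A] by inverse-CDF sampling."""
    prob = np.asarray(prob, dtype=np.float64)
    cdf = np.cumsum(prob, axis=-1)
    # One uniform draw per datum, scaled by the total mass in case prob
    # is not exactly normalized.
    u = rng.random(prob.shape[:-1] + (1,)) * cdf[..., -1:]
    # The sampled index is the number of CDF entries <= u.
    idx = np.sum(cdf <= u, axis=-1)
    # Guard against float round-off pushing idx one past the last action.
    return np.minimum(idx, prob.shape[-1] - 1)


rng = np.random.default_rng(0)
prob = np.tile([0.2, 0.5, 0.3], (1000, 4, 1))   # [T=1000, B=4, A=3]
samples = categorical_sample(prob, rng)          # [T=1000, B=4]
freq = np.bincount(samples.ravel(), minlength=3) / samples.size
```

Using "<=" rather than "<" ensures an action with zero probability is never selected, even when the uniform draw lands exactly on a CDF boundary.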


class rlpyt.distributions.epsilon_greedy.EpsilonGreedy(epsilon=1, **kwargs)
Bases: rlpyt.distributions.discrete.DiscreteMixin, rlpyt.distributions.base.Distribution
For epsilon-greedy exploration from state-action Q-values.

sample(q)
Input can be shaped [T, B, Q] or [B, Q]; a vector epsilon of length B applies across the batch dimension (the same epsilon for all T).

set_epsilon(epsilon)
Assign the value for epsilon (can be a vector).
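The broadcasting rule above (a length-B epsilon shared across all T) falls out naturally if epsilon is compared against an array of actions whose last axis is B. A NumPy sketch under that assumption; the helper name is hypothetical, and exploratory actions are drawn uniformly, so they may coincide with the greedy action.

```python
import numpy as np


def epsilon_greedy_sample(q, epsilon, rng):
    """Epsilon-greedy actions from Q-values shaped [T, B, Q] or [B, Q].

    epsilon may be a scalar or a length-B vector; a vector applies per batch
    element and is shared across all T, because it broadcasts against the
    trailing B axis of the [T, B] (or [B]) action array.
    """
    q = np.asarray(q, dtype=np.float64)
    greedy = np.argmax(q, axis=-1)                       # [T, B] or [B]
    batch = greedy.shape[-1]
    eps = np.broadcast_to(np.asarray(epsilon, dtype=np.float64), (batch,))
    explore = rng.random(greedy.shape) < eps             # per-datum coin flip
    random_actions = rng.integers(0, q.shape[-1], size=greedy.shape)
    return np.where(explore, random_actions, greedy)


rng = np.random.default_rng(0)
q = rng.standard_normal((500, 2, 4))                     # [T=500, B=2, Q=4]
actions = epsilon_greedy_sample(q, epsilon=[0.0, 1.0], rng=rng)
```

Here batch element 0 always acts greedily and batch element 1 always explores, which is a quick way to confirm the per-B semantics.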


class rlpyt.distributions.epsilon_greedy.CategoricalEpsilonGreedy(z=None, **kwargs)
Bases: rlpyt.distributions.epsilon_greedy.EpsilonGreedy
For epsilon-greedy exploration from a distributional (categorical) representation of state-action Q-values.

sample(p, z=None)
Input p is shaped [T, B, A, P] or [B, A, P], where A is the number of actions and P the number of atoms. Optional input z is the domain of atom values, shaped [P]. A vector epsilon of length B applies across the batch dimension.

set_z(z)
Assign the vector of bin locations (the distributional domain).
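A common way to act greedily on a categorical return distribution is to reduce it to each action's expected value over the atoms, then apply ordinary epsilon-greedy. The NumPy sketch below assumes that reduction; it is not necessarily how rlpyt computes the greedy action, and the helper name is hypothetical.

```python
import numpy as np


def categorical_epsilon_greedy_sample(p, z, epsilon, rng):
    """Epsilon-greedy actions from atom probabilities p shaped [..., A, P].

    z holds the P atom values. Each action's Q-value is taken as the
    expectation sum_j p[..., a, j] * z[j]; epsilon (scalar or length-B
    vector) then applies exactly as in plain epsilon-greedy.
    """
    p = np.asarray(p, dtype=np.float64)
    z = np.asarray(z, dtype=np.float64)
    q = np.sum(p * z, axis=-1)                           # [..., A]
    greedy = np.argmax(q, axis=-1)
    eps = np.broadcast_to(np.asarray(epsilon, dtype=np.float64), greedy.shape[-1:])
    explore = rng.random(greedy.shape) < eps
    random_actions = rng.integers(0, q.shape[-1], size=greedy.shape)
    return np.where(explore, random_actions, greedy)


z = np.linspace(-1.0, 1.0, 3)                  # P=3 atoms at -1, 0, 1
p = np.array([[[0.8, 0.1, 0.1],                # action 0: expected value -0.7
               [0.1, 0.1, 0.8]]])              # action 1: expected value  0.7
rng = np.random.default_rng(0)
action = categorical_epsilon_greedy_sample(p, z, epsilon=0.0, rng=rng)
```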


class rlpyt.distributions.gaussian.Gaussian(dim, std=None, clip=None, noise_clip=None, min_std=None, max_std=None, squash=None)
Multivariate Gaussian with independent variables (diagonal covariance). The standard deviation can be provided as a scalar or as a value per dimension; otherwise it is taken from dist_info (where it may be learnable), which is expected to hold one value per dimension. Noise clipping and sample clipping are optional during sampling but are not accounted for in the formulas (e.g. entropy). Clipping of the standard deviation is optional and is accounted for in the formulas. Squashing of samples to squash * tanh(sample) is optional and is accounted for in the log_likelihood formula but not in the entropy.
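The constructor arguments interact when the standard deviation is resolved. A NumPy sketch of one plausible resolution order, under stated assumptions: a fixed std (scalar or per-dimension) takes priority and is broadcast to the action shape; otherwise log_std from dist_info is used and, if min_std / max_std are given, clipped in log space. The helper resolve_std and the log-space clipping are assumptions for illustration.

```python
import numpy as np


def resolve_std(mean, log_std=None, std=None, min_std=None, max_std=None):
    """Return (std, log_std) shaped like mean, under this sketch's assumptions.

    A fixed std takes priority and is broadcast to mean's shape; otherwise
    log_std (as from dist_info) is used and, if min_std / max_std are given,
    clipped in log space before exponentiating.
    """
    mean = np.asarray(mean, dtype=np.float64)
    if std is not None:
        std = np.broadcast_to(np.asarray(std, dtype=np.float64), mean.shape)
        return std, np.log(std)
    log_std = np.broadcast_to(np.asarray(log_std, dtype=np.float64), mean.shape)
    lo = -np.inf if min_std is None else np.log(min_std)
    hi = np.inf if max_std is None else np.log(max_std)
    log_std = np.clip(log_std, lo, hi)
    return np.exp(log_std), log_std


mean = np.zeros((5, 2))                                   # [B=5, dim=2]
std_fixed, _ = resolve_std(mean, std=[0.1, 0.3])          # per-dimension fixed std
std_learned, _ = resolve_std(mean, log_std=np.full((5, 2), 3.0), max_std=2.0)
```

Because the clipped std is what the formulas then use, clipping it stays consistent with entropy and log-likelihood, unlike clipping of noise or samples.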

entropy(dist_info)
Uses self.std unless that is None, in which case it uses log_std from dist_info. Not implemented for squashing.
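For a diagonal Gaussian, the entropy depends only on the standard deviations: H = sum_i (log sigma_i + 0.5 * log(2 * pi * e)). A NumPy sketch of that closed form (hypothetical helper name), with no squashing correction:

```python
import numpy as np


def diag_gaussian_entropy(log_std):
    """Entropy of a diagonal Gaussian, summed over the trailing dimension.

    H = sum_i (log sigma_i + 0.5 * log(2 * pi * e)); leading dims are kept.
    The mean does not enter, and no squashing correction is applied.
    """
    log_std = np.asarray(log_std, dtype=np.float64)
    return np.sum(log_std + 0.5 * np.log(2.0 * np.pi * np.e), axis=-1)


log_std = np.log(np.array([[1.0, 1.0], [0.5, 2.0]]))   # [B=2, dim=2]
entropy = diag_gaussian_entropy(log_std)
```

Both rows give the same entropy, since log(0.5) + log(2) = 0 matches log(1) + log(1).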

log_likelihood(x, dist_info)
Uses self.std unless that is None, in which case it uses log_std from dist_info. When squashing: instead of the numerically risky arctanh, it assumes parameter x is the pre-squash action; see sample_loglikelihood() below.
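Without squashing, the log-likelihood of a diagonal Gaussian is -(sum_i (log sigma_i + 0.5 * z_i^2) + 0.5 * D * log(2 * pi)), with z the standardized residual and D the action dimension. A NumPy sketch of that formula (hypothetical helper name):

```python
import numpy as np


def diag_gaussian_log_likelihood(x, mean, log_std):
    """log N(x; mean, diag(exp(log_std)**2)), summed over the trailing dimension."""
    x, mean, log_std = (np.asarray(a, dtype=np.float64) for a in (x, mean, log_std))
    z = (x - mean) / np.exp(log_std)                       # standardized residual
    dim = x.shape[-1]
    return -(np.sum(log_std + 0.5 * z ** 2, axis=-1) + 0.5 * dim * np.log(2.0 * np.pi))


x = np.array([[0.5, -1.0]])
mean = np.array([[0.0, 0.0]])
log_std = np.log(np.array([[1.0, 2.0]]))
logli = diag_gaussian_log_likelihood(x, mean, log_std)     # shape [1]
```

Working from log_std directly avoids taking the log of a possibly tiny std.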

sample_loglikelihood(dist_info)
Special method for use with the SAC algorithm, which returns a new sampled action and its log-likelihood for training use. It temporarily turns OFF squashing so that log_likelihood can be computed on the non-squashed sample, then restores squashing and applies it to the sample before output.
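Squashing requires a change-of-variables correction: if u is the pre-squash Gaussian sample and y = squash * tanh(u), then log p(y) = log p(u) - sum_i log(squash * (1 - tanh(u_i)^2)). Scoring u before squashing sidesteps arctanh entirely. The NumPy sketch below follows that order (sample u, score it, then squash); the function name, the eps constant, and the reparameterized form u = mean + std * noise are assumptions, and rlpyt's own correction term may differ in detail.

```python
import numpy as np


def sample_squashed_with_log_likelihood(mean, log_std, squash, rng, eps=1e-6):
    """Return (squash * tanh(u), log p of that squashed sample) for u ~ N(mean, std^2)."""
    mean = np.asarray(mean, dtype=np.float64)
    log_std = np.asarray(log_std, dtype=np.float64)
    # Reparameterized pre-squash sample.
    u = mean + np.exp(log_std) * rng.standard_normal(mean.shape)
    # Log-likelihood of the pre-squash sample (diagonal Gaussian).
    z = (u - mean) / np.exp(log_std)
    logli = -(np.sum(log_std + 0.5 * z ** 2, axis=-1)
              + 0.5 * mean.shape[-1] * np.log(2.0 * np.pi))
    # Change of variables: subtract log |dy/du| = log(squash * (1 - tanh(u)^2)).
    logli -= np.sum(np.log(squash * (1.0 - np.tanh(u) ** 2) + eps), axis=-1)
    return squash * np.tanh(u), logli


rng = np.random.default_rng(0)
action, logli = sample_squashed_with_log_likelihood(
    mean=np.zeros((4, 2)), log_std=np.zeros((4, 2)), squash=1.0, rng=rng)
```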

sample(dist_info)
Generate random samples using torch.normal around dist_info.mean. Uses self.std unless it is None, in which case it uses dist_info.log_std.
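Noise clipping and sample clipping act at different stages of sampling. A NumPy sketch of one natural ordering, assuming symmetric bounds [-noise_clip, noise_clip] on the perturbation and [-clip, clip] on the result (similar in spirit to TD3-style target policy smoothing). The helper name and the symmetric bounds are assumptions; as stated in the class description, neither clip is reflected in the formulas.

```python
import numpy as np


def gaussian_sample(mean, std, rng, noise_clip=None, clip=None):
    """Sample mean + noise, optionally clipping the noise and then the sample."""
    mean = np.asarray(mean, dtype=np.float64)
    noise = np.asarray(std, dtype=np.float64) * rng.standard_normal(mean.shape)
    if noise_clip is not None:
        noise = np.clip(noise, -noise_clip, noise_clip)    # bound the perturbation
    sample = mean + noise
    if clip is not None:
        sample = np.clip(sample, -clip, clip)              # bound the final sample
    return sample


rng = np.random.default_rng(0)
mean = np.full((1000, 3), 0.9)
sample = gaussian_sample(mean, std=0.5, rng=rng, noise_clip=0.2, clip=1.0)
```

With mean 0.9, noise_clip 0.2 and clip 1.0, every sample lands in [0.7, 1.0].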

set_clip(clip)
Set the sample clipping value, or None to turn OFF.

set_squash(squash)
Input the multiplicative factor for squash * tanh(sample) (usually 1), or None to turn OFF.

set_noise_clip(noise_clip)
Set the noise clipping value, or None to turn OFF.

set_std(std)
Input a value, which can be the same shape as the action space or broadcastable up to that shape, or None to turn OFF and use dist_info.log_std in other methods.
