Bernoulli Distribution - Mean, Variance, Entropy
The Bernoulli distribution is the probability distribution of a single binary random variable.
Let $x \in \left\lbrace0,1\right\rbrace$ be a binary random variable. Since $x$ is discrete, its probability mass function (pmf) can be parameterized as follows:
\begin{align}
p(x=1 \mid \theta) &= \theta \\
p(x=0 \mid \theta) &= 1 - \theta
\end{align}
where $0 \leq \theta \leq 1$. This means that $x$ takes the value $1$ with probability $\theta$ and the value $0$ with probability $1-\theta$.
The distribution can also be written compactly as: \begin{align} \mathrm{Bern}(x \mid \theta) = \theta^x (1-\theta)^{1-x} \end{align} Setting $x=1$ recovers $p(x=1 \mid \theta) = \theta$ and setting $x=0$ recovers $p(x=0 \mid \theta) = 1-\theta$, so the two forms agree.
Proposition 1 The Bernoulli distribution is normalized.
Proof. \begin{align} \sum_{x\, \in \,\left\lbrace0,1\right\rbrace} p(x \mid \theta) &= p(x=0 \mid \theta) + p(x=1 \mid \theta) \\ &= 1-\theta + \theta \\ &= 1 \end{align} □
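As a quick numerical sanity check of the compact form and of Proposition 1, here is a minimal Python sketch (the helper name `bern_pmf` is our own, not standard):

```python
def bern_pmf(x, theta):
    """Bernoulli pmf: theta**x * (1 - theta)**(1 - x) for x in {0, 1}."""
    return theta**x * (1 - theta)**(1 - x)

theta = 0.3
probs = {x: bern_pmf(x, theta) for x in (0, 1)}
print(probs)                # {0: 0.7, 1: 0.3}
print(sum(probs.values()))  # 1.0 (up to floating point), matching Proposition 1
```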
Proposition 2 The mean of a Bernoulli distributed binary random variable $x$ is $\theta.$
Proof. \begin{align} \mathbb{E}[x] &= \sum_{x\, \in \,\left\lbrace0,1\right\rbrace} x\, p(x \mid \theta)\\ &= 0 \cdot p(x=0 \mid \theta) + 1 \cdot p(x=1 \mid \theta) \\ &= \theta \end{align} □
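A short simulation confirms this (a sketch assuming NumPy; a Bernoulli($\theta$) draw is the same as a Binomial($1, \theta$) draw):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
theta = 0.3
# Bernoulli(theta) samples, drawn as Binomial(n=1, p=theta)
samples = rng.binomial(n=1, p=theta, size=100_000)
print(samples.mean())  # approx 0.3, the mean theta from Proposition 2
```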
Proposition 3 The variance of a Bernoulli distributed binary random variable $x$ is $\theta\, (1-\theta).$
Proof. \begin{align} \mathrm{var}[x] &= \mathbb{E}[(x - \mathbb{E}[x])^2] \\ &= \mathbb{E}[(x - \theta)^2] \\ &= \sum_{x\, \in \,\left\lbrace0,1\right\rbrace} (x-\theta)^2 \, p(x \mid \theta) \\ &= \theta^2\, p(x=0 \mid \theta) + (1-\theta)^2\, p(x=1 \mid \theta) \\ &= \theta^2\, (1-\theta) + (1-\theta)^2\, \theta \\ &= (1-\theta)\, [\theta^2 + (1-\theta)\,\theta] \\ &= (1-\theta)\, (\theta^2 + \theta -\theta^2) \\ &= \theta \, (1-\theta) \end{align} □
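Equivalently, since $x \in \left\lbrace0,1\right\rbrace$ implies $x^2 = x$, we have $\mathbb{E}[x^2] = \mathbb{E}[x] = \theta$, and so $\mathrm{var}[x] = \mathbb{E}[x^2] - \mathbb{E}[x]^2 = \theta - \theta^2 = \theta\,(1-\theta)$. The same simulation sketch as above (NumPy assumed) confirms the value numerically:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
theta = 0.3
samples = rng.binomial(n=1, p=theta, size=100_000)
print(samples.var())        # approx 0.21
print(theta * (1 - theta))  # 0.21, the closed form from Proposition 3
```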
Proposition 4 The entropy $\mathbf{H}[x]$ of a Bernoulli distributed binary random variable $x$ is given by: \begin{align} \mathbf{H}[x] = -\theta \ln \theta - (1-\theta) \ln (1-\theta). \end{align}
Proof. \begin{align} \mathbf{H}[x] &= -\sum_{x\, \in \,\left\lbrace0,1\right\rbrace} p(x \mid \theta) \ln p(x \mid \theta)\\ &= - p(x = 0 \mid \theta) \ln p(x = 0 \mid \theta) - p(x = 1 \mid \theta) \ln p(x = 1 \mid \theta) \\ &= - (1-\theta) \ln (1-\theta) - \theta \ln \theta \end{align} □
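Because the natural logarithm is used, the entropy is measured in nats; dividing by $\ln 2$ converts it to bits. A minimal sketch (NumPy assumed; the helper name `bern_entropy` is our own) evaluates the formula and shows that it peaks at $\theta = 0.5$:

```python
import numpy as np

def bern_entropy(theta):
    """Entropy in nats of Bernoulli(theta), valid for 0 < theta < 1."""
    return -theta * np.log(theta) - (1 - theta) * np.log(1 - theta)

for theta in (0.1, 0.5, 0.9):
    print(theta, bern_entropy(theta))
# 0.1 -> 0.325...,  0.5 -> 0.693... (= ln 2, the maximum),  0.9 -> 0.325...
```

By the convention $0 \ln 0 = 0$, the entropy vanishes at $\theta = 0$ and $\theta = 1$, where the outcome is deterministic.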