Rotation 1, Week 1: Getting the theory

This week I could not derive all of the equations, but I got the gist of the theory by working out a few of them; all of them are well summarized in the paper. The paper is very readable for anyone who has taken a course in statistics, information theory, or systems science. Statistics and information theory are weak spots of mine, so I'll concentrate on them here to clear my own head.

Two key concepts are used in the paper: the Shannon entropy H(X) of a random variable X with a probability mass function (pmf) p(x), and the mutual information between two random variables.

The Shannon entropy is defined as $H(X) = -\sum_{x} p(x) \log_2 p(x)$.

If one uses $\log_2$, then $H(X)$ is measured in bits, which is the most common convention. If we flip two fair coins, we can use 2 bits to encode the outcome: 00 (HH), 01 (HT), 10 (TH), and 11 (TT). Let's call this the normal encoding. In this case $H(X) = -4 \cdot \left(\frac{1}{4} \log_2 \frac{1}{4}\right) = 2$. If $H(X)$ were always equal to the number of bits required by the normal encoding, it would not be of great help. Usually $H(X)$ is less than the number of bits required to encode the outcome when the probability distribution is not uniform. Intuitively, if some outcomes occur more often than others, it makes sense that we can encode the outcome using fewer bits on average.
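
To make this concrete, here is a minimal sketch in Python (my own illustration, not code from the paper) that computes $H(X)$ for the uniform two-coin pmf and for a skewed pmf over the same four outcomes:

```python
import math

def shannon_entropy(pmf):
    """Shannon entropy H(X) in bits of a probability mass function."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Two fair coins: HH, HT, TH, TT are equally likely.
uniform = [0.25, 0.25, 0.25, 0.25]
print(shannon_entropy(uniform))   # 2.0 bits, same as the normal encoding

# A skewed pmf over the same four outcomes.
skewed = [0.7, 0.1, 0.1, 0.1]
print(shannon_entropy(skewed))    # ~1.36 bits, fewer than 2
```

The skewed case is exactly the situation where a cleverer encoding (short codewords for frequent outcomes) beats the fixed 2-bit normal encoding on average.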

The next concept is the mutual information between two random variables $X_1$ and $X_2$. It is defined in terms of entropy: $I(X_1;X_2) = H(X_1) - H(X_1|X_2)$. It “measures the degree to which knowledge of one variable reduces entropic uncertainty in another, regardless how their outcome may correlate” (emphasis mine). The paper establishes a relation between $I(X_1;X_2)$ and $H(X)$.
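
Continuing the same sketch (again my own illustration, not from the paper), the equivalent identity $I(X_1;X_2) = H(X_1) + H(X_2) - H(X_1,X_2)$ makes the computation straightforward from a joint pmf:

```python
import math

def shannon_entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def mutual_information(joint):
    """I(X1;X2) = H(X1) + H(X2) - H(X1,X2) for a joint pmf given as a 2D list."""
    p1 = [sum(row) for row in joint]          # marginal pmf of X1
    p2 = [sum(col) for col in zip(*joint)]    # marginal pmf of X2
    p12 = [p for row in joint for p in row]   # flattened joint pmf
    return shannon_entropy(p1) + shannon_entropy(p2) - shannon_entropy(p12)

# Independent fair coins: knowing X2 says nothing about X1.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))   # 0.0 bits

# X2 is a perfect copy of X1: knowing X2 removes all uncertainty in X1.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))       # 1.0 bit
```

The two extremes match the quoted description: mutual information is zero when knowledge of one variable does not reduce uncertainty in the other, and equals $H(X_1)$ when it removes that uncertainty completely.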

Any upper bound on the mutual information $I(X_1;X_2)$ of a channel that takes $X_1$ as input and emits $X_2$ as output puts an upper bound on the channel capacity $C$. In a system with such a channel, this in turn gives a lower bound on the mean-squared estimation error $E[(X_1 - \hat{X}_1)^2]$. Here $\hat{X}_1$ is an “estimator”: an arbitrary function of the discrete signal time series, while the $X_1$ dynamics at equilibrium are described by a stochastic differential equation.
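
For my own reference, one standard way such a lower bound arises (a textbook estimation-theory result for continuous variables, not necessarily the exact statement in the paper) uses the conditional differential entropy $h(X_1|X_2) = h(X_1) - I(X_1;X_2)$:

$$E\left[(X_1 - \hat{X}_1)^2\right] \ge \frac{1}{2\pi e}\, 2^{2 h(X_1|X_2)} = \frac{1}{2\pi e}\, 2^{2\left(h(X_1) - I(X_1;X_2)\right)},$$

so any upper bound on $I(X_1;X_2)$, and hence on the channel capacity $C$, forces a lower bound on the mean-squared error of any estimator $\hat{X}_1$ built from $X_2$.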
