Uncertainty
Binary Numbers
A decimal number: ![]()
A binary number: ![]()
Example


![]()
In general, to represent
(k>0)
in binary requires a
binary digits,
where
denotes the
least integer that is larger than or equal to x.
Example:
,
,
A box of balls:

A ball is randomly drawn. What is the color of the ball?
Black of course!
Before viewing the ball:
bit (nary
digi) uncertainty.
After viewing the ball:
bit information.
A box of balls:

A ball is randomly drawn. What is the color of the ball?
Before viewing the ball:
bits
uncertainty.
The ball is !
After viewing the ball:
bits
information.
![]()
where N is the number of colors.
A box of balls:

A ball is randomly drawn. What is the color of the ball?
Before viewing the ball:
bits uncertainty?
The ball is almost always
!
After viewing the ball:
We get less information because we are almost certain the ball will be black beforehand.
We define the amount of information gained after observing the event
, which
occurs with probability
, as
![]()
The choice of b determines the unit of the information.
When b=2, the unit is bit. We will use
for
from now
on.
Recall the definition of expectation for a function.
H(X) can also be expressed as
follows
![]()
Example
Let C denote a random variable C in Demo 1 whose probability
distribution is
given by the table below.
Compute the uncertainty of C.



The average length of the binary code:
![]()
Example
Let C denote a random variable C in Demo 2 whose probability
distribution is
given by the table below.
Compute the uncertainty of C.



The average length of the binary code:
![]()
Example
Suppose that X has only two possible values
and
and that
so that
. Then the uncertainty of X in bits, provided that
0 < p < 1, is
![]()
is called the binary entropy
function.
Entropy of a source

H(U) is then the entropy of the d.m.s., which is a measure of average information contend per source symbol.
Uncertainty of a vector-valued r.v.
Because there is no mathematical distinction between discrete random
variables and discrete vectors (we might have
, the uncertainty of
a random vector is denoted as
and according to the definition
of uncertainty
![]()
(H(X,Y) is also used in the literature. We will always use the former notation.)
If we adopt the convention
when p=0, equations
(4) and (2) can be rewritten as
![]()
and
![]()
IT-Inequality
![]()
with equality if and only if x=1.
Conditional uncertainty given an event
Very often we shall be interested in
the behavior of one random variable when another
random variable is specified.
The following definition is the natural
generalization of uncertainty to this
situation.

Conditional uncertainty given a r.v.


When X and Y are statistically independent, both inequalities above hold with equality. This proves the theorem.
Notice that
it follows from Theorem 5
and 6
that, when X has L possible values,
![]()
with equality if and only both X and Y are statistically
independent and
for all x.
It follows trivially from Theorem 5
and the definition of H(X|Y)
![]()
with equality if and only if for every y such that
there is an x such that
. In other words,
H(X|Y)=0 when and only when ``the value of Y essentially
determines the value of X''.
Note that ![]()
Coding a Digital Information Source
- to efficiently
represent data generated by a discrete source
Figure 4: A variable-length coding scheme.
We take the smallness of
,
the average code-word length, as the measure of
goodness of the coding scheme.
If
is the
code-word for
and
is the length of this code-word,
then
![]()


Table 1: Illustrating the definition of a prefix-free
code: Code II is a prefix-free code.

Efficiency and redundancy
Since the entropy H(U) represents a fundamental limit
on the average number of binary digits necessary to
represent a discrete memoryless source, it is
appropriate to define the coding efficiency of
the code as
![]()
then represent the redundancy of the code.
Source Coding Algorithms
Extension of a d.m.s.
Sometimes it is useful to consider blocks rather than
individual source symbols of a d.m.s. We may view each such
block as a super-symbol being produced by an extended source
with source symbols of the form
.
Figure 5: Extension of a d.m.s.:
.
Accordingly (refer to Theorem 6),

Huffman coding (binary)
Example


Figure 6: An example of the Huffman encoding algorithm.


Huffman coding applied to an extended d.m.s.

Let
and we have
.
Figure 7: Another example of the Huffman encoding algorithm.

If we apply Huffman encoding algorithm to the original source

Shannon-Fano coding
Shannon-Fano coding.

Lempel-Ziv coding
A sequence emitted by the source
is parsed into segments according some rules.
Each segment is then represented by an integer
and a binary digit. The integer is determined
by the position of a relevant prefix of the
segment in an ordered list of binary sequences,
which is build adaptively.
Let
denotes
.
Lempel-Ziv coding.
![]()

Channel Capacity
Discrete memoryless channels
A discrete memoryless channel (DMC or d.m.c)
is a statistical model
which describes the medium through which the
symbols flow to the receiver. The channel is
discrete when the input alphabets
and the output
alphabets
are finite, i.e., J and K are positive integers.
It is memoryless when the
current output depends on only the current input,
i.e.
![]()
where
is the i-th input
and
the j-th
output.
A DMC is specified by the complete set of
conditional (transition)
probabilities
for all
and
,
which is commonly shown as a diagram like in
Fig 8.
Figure 8: Discrete memoryless channel. (a) A general diagram. (b)
A binary symmetric channel (BSC).
The channel matrix

where
for
and
,
is also a convenient way to describe a channel.
Note that each and every row must add to
one which implies that there is no all-zero row in
a channel matrix.
Example The channel matrix for the BSC in (b) of
Fig. 8:
![]()
:
When properly rearranged, all rows in the
channel matrix are the same. In other
words,
every row is a permutation of another
row.
:
When properly rearranged, all columns (except
possible all-zero columns)
in the
channel matrix are the same, i.e., every
non-all-zero columns is a permutation of
another non-all-zero columns.
:
Both uniformly
dispersive and uniformly focusing.
Example A BSC is a strongly symmetric channel.
The binary erasure channel in Fig. 9
is uniformly dispersive but not uniformly
focusing.
Figure 9: A binary erasure channel (BEC).
Mutual information

Mutual information is a difference of two
uncertainties, which represents the uncertainty about
X that is resolved by observing Y.
Properties of mutual information
Figure 10: Illustrating the relations between I(X;Y), H(XY),
H(X), H(Y), H(Y|X) and H(X|Y).
Channel capacity of a DMC
In the case of a discrete memoryless channel, the mutual
information,
I(X;Y), between input X and output Y represents the the
uncertainty about the channel input that is resolved by
observing the channel output.
Notice that I(X;Y) depends on the input probability distribution (i.e., the probability distribution of X), which determines the output probability distribution (i.e., the probability distribution of Y).

It is straightforward to show that

Note that K is the number of non-all-zero columns.
Example Binary symmetric channel:
![]()
setsize#2ptxxxxxx
setsize#2ptsplain #1#2#3#1<17#1<20
#1<24#1<29
#1<34#1<41
#3
#1#2#3
@#125<@@25
setsize#2pt
@ pt
@@ pt
#3
where h(p) is
the binary entropy function.
Example The channel matrix:
Example The channel matrix:
Decomposition of a channel
A DMC is symmetric when the channel can be decomposed
into L strongly symmetric channels, whose output letters
are disjoint, with selection
probabilities of ,
, , i.e., the channel
matrix can be written as a sum of L channel matrices of
strongly symmetric channels with appropriate coefficients of
,
and
where
is a channel matrix for a
strongly symmetric DMC () and
where non-all-zero columns in the channel matrices
are disjoint.
It is fairly easy to test a DMC for symmetry and
decompose it into strongly symmetric channels.
Note that a symmetric DMC is necessarily uniformly dispersive.
Example Channel capacity of a BEC.
Figure 11: Decomposition of a BEC into two strongly symmetric DMCs
with appropriate selection probabilities.
Special channels
Each column in the channel can have at most one
non-zero element.
I(X;Y) = H(X) (i.e., no source
entropy is lost in the transmission)
when
``Y essentially
determines X''.
Each row in the channel matrix has one
non-zero element.
I(X;Y)= H(Y) when
``X essentially
determines Y''.
Information rate and channel capacity per unit time
The information rate R of the source
where is the rate at which the source emits symbols
and H(U) is the source entropy.
The channel capacity per unit time
where is the transmission rate of the channel.
Channel Coding Theorem
Shannon-Hartley law
The channel capacity of a continuous
band-limited power-limited
additive white Gaussian noise (AWGN) channel
where S/N is the signal-to-noise ratio (SNR)
at the channel output.
If the channel bandwidth B is fixed, then the channel output signal is also band-limited and is characterized by sampling values at the Nyquist rate 2B.
If we use the channel at a rate ,
the channel capacity per unit time
where is the noise power spectral density of the AWGN
channel.
Trade-Offs between S/N Ratio and Bandwidth
Figure 12: Normalized channel capacity versus channel SNR.
Fundamental limit of a communication system
Define the spectral rate
Since
the bit energy
, defined as the per-bit share of
all energy in the message,
and
we have
Therefore
or
For the infinite bandwidth, the limiting value
is called the Shannon limit.
is plotted in the following.
Figure 13: Bandwidth-efficiency diagram.
Assume that in a BPSK system equals 0.1 (-10 dB), which
results for a BSC.
According to the channel coding theorem,
Suppose H(U) = 1 bit/sym. It follows that
If we use the channel n times for k bits of the message from
the source and ,
we would be able to build a reliable system.
We have then
Therefore