Anno accademico di offerta: 
Responsabile della didattica: 
Settore scientifico disciplinare: 
Semestre dell'insegnamento: 
Primo Semestre
Lingua di insegnamento: 


Obiettivi formativi

L’obiettivo del corso è fornire allo studente la capacità di comprendere
ed applicare le regole di base della teoria dell'informazione, e in particolare:
- il significato "fisico" delle principali grandezze entropia e informazione mutua e le loro inter-relazioni
- il concetto di sequenze tipiche e il loro ruolo nella compressione dei dati
- il concetto di capacità di canale

Le capacità di applicare le conoscenze sopra elencate risultano
essere in particolare:
- progettare ed analizzare codici di compressione dei dati
- calcolare la capacità di un canale di trasmissione


Entry-level courses covering: Probability theory and stochastic processes; Fourier analysis in continuous and discrete time; Fourier analysis of linear time-invariant systems. A short guide to review background material can be found
here . Video lectures of the preparatory course held in September 2017 can be found here . To get prep-course userid and password, please send an email to the instructor.

Contenuti dell'insegnamento

1.1 Definition of basic information theory quantities: entropy, mutual information
1.2 Basic theorems, relations and inequalities
1.3 Sufficient statistics
2.1 Asymptotic equipartition property (AEP) and typical set
2.2 Entropy rate
2.3 Discrete-time Markov chains (DTMC)
2.4 AEP theorem for stationary ergodic sources
3.1 Data Compression: definitions, examples,
3.2 Kraft inequality, Noiseless coding theorem
3.3 Huffman and Lempel-Ziv codes
4.1 Channel capacity: definitions, examples
4.2 Typical-sequence dcoding. Jointly typical set and its properties.
4.2 Proof Channel Coding Theorem
4.3 Joint source-channel coding theorem
5.1 Differential entropy: definition, examples
6.1 Mutual information for discrete X and continuous Y: examples
6.2 Capacity of additive Gaussian channel, Shannon Capacity formula
6.3 Parallel Gaussian channels
6.4 Capacity of discrete-time additive Gaussian channel with memory.
6.5 Capacity of continuous-time additive Gaussian channel with memory. Water pouring.

Programma esteso

Syllabus (every class = 2 hours)

Course organization, objectives, textbooks, exam details. Sneaky preview of the course, motivations, applications. Assigned Reading of Ch.1 of textbook. Physical justification and definition of entropy. Examples of entropy calculation. Up to sec. 2.1.

Definition of joint and conditional entropy, eample 2.2.1. Relative entropy, mutual information and their relation. Chain rules for PMFs and entropy.

Relative conditional entropy, conditional mutual information, chain rules for D and I. Inequalities for D and I. max and min of H, H(X|Y)<=H(X) and generalizations. Convex functions. Jensen's inequality, examples.

first hour: logsum inequality, convexity of D, concavity of H. concavity of I in p(x) and convexity in p(y|x). Exercise: mixing increases entropy. second hour: Definition of Markov chain and first properties for 3 random variables (RV) X,Y,Z. Data processesng inequality. Counterexample.

first hour: sufficient statistics: definition in terms of mutual information. Examples: number of successes in repeated trials; sample mean in estimation of common mean in a vector of independent gaussian RVs. Sufficient statistics and hypothesis testing: factorization theorem. Second hour: Fano inequality. Exercise 2.32.

Exercises 2.5, 2.4, 2.27, 2.30 (after brief introduction to the method of Lagrange multipliers), 2.21.

Ch 3 asymptotic equipartition property (AEP): introduction. Probability theory refresher: i.p. convergence, Chebychev inequality, Weak law of large numbers, AEP. Typical set and properties. Example with binary sequences.

first hour: relation among Typical set and high-probability sets. Theorem 3.3.1. second hour: Problem solving: Exercises 3.8, 3.9. Ch 4: Entropy rates: introduction. Definition of discrete-time stochastic process and stationarity.

first hour: introduction to discrete-time Markov chains (DTMC): transition matrix, updatge law (Chapman-Kolmogorov), stationary distribution. Two-state example: state diagram, evolution towards limit distribution. Evaluation with flux balancing. second hour: Entropy rates H and H'. H=H' for stationary processes. Statement of AEP theorem for stationary ergodic sources (Shannon/Breiman/McMillan). Explicit evaluation of H for DTMC. Examples.

Doubly-stochastic matrices and uniform steady-state distribution. Connections with entropy as defined in statistical thermodynamics: DTMC on microstates with doubly-stochastic transition matrix. Entropy increases towards steady-state distribution entropy. Example 4 (eq 4.50-4.52). Hidden Markov models (HMM): entropy rate.

Problem solving: Ex. 4.1 mixing increases entropy. Conditions for observable Y in a HMM sian una DTMC. examples where Y is not a DTMC. Point a. of Ex. 4.18 on Entropy Rate of stationary but not ergodic process.

Problem solving: first hour: points b, c of Ex. 4.18. second hour: Es 4.10 entropy rate of a second order markov process: study of hidden markov chain. Ex. 4.6.

Ch 5: Data compression. Examples of codes. Kraft inequality. Search of optimal codes with Lagrange multipliers method. Noiseless coding theorem.

Comments on first Shannon Theorem: when p not dyadic. Quasi-optimal Shannon Codes. Shannon super-codes are asymptotically optimal. Extra cost on minimal code length when using a PMF that differs from true PMF. McMillan Theorem: every uniquely decodable theorem satisfies Kraft inequality. Introduction to Huffman codes: examples 1, 2.

Huffman codes: example 3 (dummy symbols), Exercise 5.32, example 5.73 (set of different optimal codelengths). Competitive optimality of Shannon code. Proof of optimality of Huffman code.

first hour: Optimal compression of Markov sources. Description of Lempel-Ziv algorithm for universal compression. second hour: Channel capacity: introduction, definition of discrete memoryless channel (DMC), examples of capacity computation: ideal channel, noisy channel with disjoint outputs, noisy typewriter.

Capacity of bynary symmetric channel (BSC), binary erasure channel (BEC). Symmetric, weakly-symmetric and Gallager-symmetric channels. Convexity of C on convex set of input PMFs. Hints to numerical techniques to evaluate max I.

Introduction to proof of II Shannon Theorem. Channel Coding, ideas on typical-sequence dcoding. Jointly typical set and its properties. Average and maximum error probability, achievable rate and operative channel capacity. Statement of II Shannon theorem.

first hour: proof of direct part of II Shannon theorem. second hour: proof of converse part of II Shannon theorem.

first hour: joint source-channel coding theorem. second hour: exercises on channel capacity: 7.8, 7.9 (Z channel) 7.3 (memory increases capacity) 7.12 (unused symbol). Ex 7.23 assigned as homework.

Differential entropy (Ch 9): definition, examples (uniform, Gaussian); AEP, properties of Typical set. 2^h=edge of Typical set. Joint and conditional diff. entropy. Ex: multivariate Gaussian. Relative Entropy and mutual information. Inequalities. Hadamard. Shift and change of scale. multivariate Gaussian maximizes entropy at given covariance matrix.

Mutual information for discrete X and continuous Y. Ex: Evaluation for PAM signal with equially likely symbols on discrete-time memoryless additive gaussian channel (DTMAGC). Capacity of DTMAGC. Sampling Theorem and Shannon Capacity formula. Gaussian additive noise is a worst case for capacity.

Parallel Gaussian channels: capacity. discrete time additive gaussian channels (DTAGC) with memory: capacity. Introduction to Toeplitz matrices and Toeplitz distribution theorem. DTAGC capacity evaluation (water-pouring).

continuous-time additive gaussian channel (CTAGC) with memory: capacity evaluation of CTAGC using equivalent input noise, Karhunen-Loeve basis and continuous-time Toeplitz distribution theorem. Examples.


[1] T. M. Cover, J. A. Thomas, "Elements of Information Theory". John Wiley and Sons, 1991.

Testi Secondari:

[2] R. Blahut, "Principles and Practice of Information Theory". Addison-Wesley, 1988.
[3] J. Cioffi, "Ch. 8: Fundamental Limits of Coding and Sequences",

Metodi didattici

Didattica frontale 42 ore. Esercitazioni 6 ore.

COVID emergency:
All Classes will be held online on Teams in the following Virtual Classroom.

The Team name of the course is "INFORMATION THEORY-DIA-LM-1-COMMUNICATION ENGINEERING". The Team registration code is 1z57r1y.

Classes will be held in the virtual classroom on:

Monday 12:00-14:00; Tuesday 12:00-13:00 and 15:00-16:00

Teaching will work as follows. BEFORE every class, it is YOUR RESPONSIBILITY to watch on your own the corresponding
and use the corresponding
class notes.
ID and password to access the videos/slides will be communicated to you in class on the first lecture.
At the scheduled Teams Videomeeting the instructor will go over the main concepts of the videolecture, provide extra examples, and answer the questions you hopefully will have collected along with your doubts/curiosities.

Modalità verifica apprendimento


Oral only, to be scheduled on an individual basis. When ready, please contact the instructor by email
alberto.bononi[AT] by specifying the requested date. The exam consists of solving some proposed exercises and explaining theoretical details connected with them, for a total time of about 1 hour. You can have a summary of important formulas written in A SINGLE A4 sheet to consult if you so wish.

NOTE: The exam may be split into two distinct parts and scheduled on different days at the student's request: Part 1 Data Compression; Part 2 Channel Coding.

NOTE: even if you register on ESSE3 for an exam, please send email to alberto.bononi[AT] to inform the instructor directly. The exam will take place online on Teams.


We will be in video-connection on microsoft Teams. Once we agree on a time/date, you will get an invitation: to connect click on the link at the botton of the invitation email on the day of the exam.

At the beginning, you are requested to show at 360 degrees the room where you are taking the exam. You must be alone in the room, with only paper and pen and your allowed A4 summary sheet.

How the exam will proceed depends on whether you have just pen and paper (and a cell phone), or if you have an electronic device where you can write on by hand (eg a tablet)

CASE A) paper and pen:

I will provide you the text of the exam on my shared screen. You can copy that out on your paper sheet.

You are requested to look into the camera when you reply to me, or into your paper sheet when you write.

You are requested to tilt the camera of your PC such that the sheet of paper on which you write is clearly visible and readable by me.
This will allow me to follow your work and guide you towards the solution.

Sometimes you will be asked to take pictures of your work and send it by email/through the Teams chat. Please make sure to reduce the resolution of your camera, so that pictures are more compressed. Or please download the app
Adobe Scan to take compressed pictures, so that you can transmit them easily even on a low bandwidth connection.

CASE B) you have a tablet and can write on it

in that case I will send you a link to an electronic whiteboard
(I use which we will use as a shared paper sheet to work out your exam, exactly as we used to do in my office.

IMPORTANT NOTICE: Getting help on the internet from others on solving your problem is considered as a delinquent behavior and may lead to your withdrawal from the exam and to possible further sanctions.

Altre informazioni

1) Ricevimento

Monday 15:00-17:00 (Scientific Complex, Building 2, floor 2, Room 2/19T).

COVID emergency:
Meet me on the
"LMCE 20-21" Team Virtual Classroom (same as where Lecture 0 and prep-courses review took place). Please send an email so that we can schedule an appointment.

2) Sito web del corso:

To get userid and password, please send an email to
from your account