Compressed Sensing: A Python demo

I am having trouble with HTML format of blog post. Here is draft PDF PDF.

I recommend . This is much better written.

Can you solve $latex Ax=b$ for $latex x$ when number of rows are less than number of columns in (A) i.e there are more variables than equations? Such a system is called underdetermined system. For examples, give me 3 numbers whose average is 3. Or solve (x/3+y/3+z/3=3). You are right! There are many solutions to this problem!

If one adds more containts then one can hope to get a unique solution. Give me a solution which has minimum length (minimize (L_2)-norm)? Answer is 3,3,3 (hopefully). Give me a solution which is most sparse (least number of non-zero entries)? Answer is 9,0,0 (or 0,0,9).

Many signals are sparse (in some domain). Images are sparse in Fourier/Wavelets domain; neural acitivity is sparse in time-domain, activity of programming is sparse in spatial-domain. Can we solve under-determined (Ax=b) for sparse (x). Formally,

\min_x L_0(x), \quad \text{st} \quad Ax = b where (L_0) is the 0-norm i.e. number of non-zero entries.

This problem is hard to solve. (L_0) is not a convex function. In this case, one uses convex relaxation trick and replace the non-convex function with convex one.

\min_x L_1(x), \quad \text{st} \quad Ax = b

It is now well-understood that under certain assumptions on (A), we can recover (x) by solving this tractable convex-optimization problem rather than the original NP hard problem. Greedy strategy based solvers also exists which works even faster (though, according to Terry Tao blog, they do not guarantee the same performance as interior-point method based).

Somewhat formal statement

If (x_N) (size (N)) is (S-sparse) then an under-determined system (A_{kN}x_{N} =b_k) — where (b) is vector of (k) observations — can be solved exactly given that (A) (mesurement matrix) has following properties:

  • \left\Vert y\right\Vert^2 \sim = \left\Vert x \right\Vert ^2 where \(y = A x\) for any sparse signal \(x\) i.e application of measurements matrix does not change the length of sparse signal \(x\) ‘very much’. See Restricted Isometric Property in sec. 3.1 .

Lets there is a sparse signal (x) of length (N). Its sparse. We take a dot-product of x with a random vector (A_i) of size N i.e. (b_i = A_i * x). We make (k) such measurements. We can put all k (A_i) into a matrix (A) and represent the process by a linear system:

\begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_k \end{bmatrix} * x = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{bmatrix} \qquad(1)

Note that each (A_i) has the dimension (1 \times N). We can rewrite eq. 1 as following:

A x = b\qquad(2) where dimension of (A) are (k \times N) and (k << N). This is an under-determined system with the condition that (x) is sparse. Can we recover (x) (of size N) from b (of size k (<<N))?

Figure fig. 1 shows a real system. Compressed sensing says that (x) can be recovered by solving the following linear program.

\min_x \left\Vert x \right\Vert_1 \; \text{given} \; Ax = b\qquad(3)

A demonstration using Python

Data for fig. 2 is generated by script ./ and for fig. 3, data is generated by script ./ Both of these scripts also generate similar figures.

Dependencies In addition to scipy, numpy, and matplotlib, we also need pyCSalgo. It is available via pip: pip install pyCSalgo --user.

Code is available at github


  1. input : sparse signal x of size N
  2. Generate a random matrix (measurement matrix) \(A\) of size \(k\times N\).
  3. Make \(k\) measurements of \(x\) using \(A\). That is \(b = A x\). Note that each measurement of \(x\) (i.e. entry of \(b\)) is some linear combination of values of \(x\) given by \(A_i x\) where \(A_i\) is ith row.
  4. Now find \(x\) by solving \(\min_x \left\Vert x \right\Vert_1 \text{given}\; Ax=b\).

For step 4, we are using function l1eq_pd from Python library pyCSalgo. This is reimplementation of l1magic routine written in Matlab by Candes. When (k >> 2S) where (S) is the sparsity of (x), we can recover (x) quite well. In practice, (k \ge 4S) almost always gives exact result.

There are other algorithms available which works faster; they are based on greedy strategy. We are not discussing them here.

Sparse in time domain


Figure 1: x is the sparse signal of length 256. Using measurements matrix A, we made 124 random measurements of \(x\) namely b. The sparse solution of eq. 2 is the solution of $\min_x \left\Vert x \right\Vert_1 \text{given}\; Ax =b$.

Sparse in Fourier domain


Figure 3: CS solution when signal is sparse in Forier domain. Note that we took 200 samples for a singal of size 2000. The recovery is pretty good.

With error in measurements

This example shows that some error is added to measurements. The reconstructed signal is no longer ‘exact’ enough. But still, given that x was binary signal, we can recover x by thresholding function.


Mathematical definitions

Restricted isometric property (RIP)

Restricted isometric property (RIP) of a matrix (A) is defined as the following.

Given (\delta_s \in (0,1)), for any sub-matrix (A_s) of (A), and for any sparse vector (y), if following holds

(1-\delta_s) \left\Vert y \right\Vert^2 \leq \left\Vert A_s y \right\Vert ^2 \leq (1+\delta_s) \left\Vert y \right\Vert ^2

then matrix (A) is said to satisfy (s-)restricted isometric property with restricted isometric constant (\delta_s). Note that a matrix A with such a property does not change the length of signal (x) ‘very much’. This enables us to sense two different sparse signal (x_1) and (x_2) such that (A x_1) and (A x_2) are almost likely to be different.


To learn about compressed sensing, I recommend following articles

There are many many other great articles and papers on this subject. There are some dedicated webpages also.


Benchmark ODE solver: GSL V/s Boost Odeint library

For our neural simulator, MOOSE, we use GNU Scientific Library (GSL) for random number generation, for solving system of non-linear equations, and for solving ODE system.

Recently I checked the performance of GSL ode solver V/s Boost ode solver; both using Runge-Kutta 4. The boost-odeint outperformed GSL by approximately by a factor of 4. The numerical results were same. Both implementation were compiled with -O3 switch.

Below are the numerical results. In second subplot, a molecule is changing is concentration ( calcium ) and on the top, another molecule is produced (CaMKII). This network has more than 30 reactions and approximately 20 molecules.

GSL took approximately 39 seconds to simulate the system for 1 year. While Boost-odeint took only 8.6 seconds. (This turns out to be a corner case)

Update: If I let both solvers choose the step size by themselves for given accuracy, boost usually outperforms the GSL solver by a factor of 1.2x to 2.9x. These tests were done during development of a signaling network which has many many reactions. Under no circumstances I have tested, BOOSE ODE solver was slower than GSL solver.

Computing value of pi using Monte-Carlo sampling

I used Haskell to do this. Code is here I did not use standard System.Random library to generate random number but mwc-random package. I can’t say if speed is better or worse.

I wanted to see how fast we converge to the value of pi. On x-axis, we have number of samples (N), and for each N, we compute the value of pi 50 times. This way we can talk about mean and variance in computed pi (which we plot on y-axis). It seems that for N >= 1000, we get a reasonable approximation of pi with low variance.

For a 2-D sphere (aka circle: x_1^2 + x_2 ^ 2 = 1 ), The convergence is following.


For a 3-D sphere, (that is x_1 ^2 + x_2 ^2 + x_3 ^3 = 1), I get the following:


Now I am bit more ambitious, I use a 10-D sphere:

As you can see, we converge to a a reasonable value of pi with low variance but convergence is not improving monotonically as in previous cases. The mean starts from a very low value and get to a respectable value only after N > 1000 (in low-dimensions cases we were swinging around 3.1415). The answer: we need to sample minimum number of points from hyper-cube to get some reasonable estimate.


How variance decreases with N (when N is significantly large, 100 repeats of each computation)? Following plots starts from 2D and goes onto 10 dimensions.


Problem with GINaC library

While modeling a rat’s neuron axon, I needed to do some symbolic computation. I first used sympy. It failed to return any solution. I tried playing with its option but I did not pursue it too much.

Then, I went to wolfram alpha and pasted my equation there. Click me to see answer.

Then I thought of using GiNaC library of C++. It was some work but I managed to write a small program. This program is available here. The file axon.cpp is the main file and compartment.cpp is the file which creates this equation. Run ./ to compile and run the program. It is a cmake based project.

To my surprise, GiNaC failed to compute a solution.

-- Configuring done
-- Generating done
-- Build files have been written to: /home/dilawar/Work/GITHUB/courses/NeuroCourse/Assignments/Second/Problem01/cpp
Scanning dependencies of target Axon
[100%] Building CXX object CMakeFiles/Axon.dir/axon.cpp.o
Linking CXX executable Axon
[100%] Built target Axon
And the equation is (0.015915494309189535082+(3.1830988618379066628)*r)^(-1)*(3.5014087480216975145E-4-(0.12095775674984045018)*r)
 Puting wolfram answer back -4.2814382506543584682E-4
terminate called after throwing an instance of 'GiNaC::pole_error'
  what():  power::eval(): division by zero
./ line 2: 24873 Aborted                 (core dumped) ./Axon

This line in output computes the expression by putting the wolfram answer. Ideally it should be zero but it is not the case.

Puting wolfram answer back -4.2814382506543584682E-4

When I change this value and compute the expression again, I see large changes in output i.e. the function changes rapidly around its solution. I am not sure what I am doing wrong with this library. If everything is fine with my program, then capabilities of this library are rather limited and I’d recommend not using it for any serious lab work.

Coding with float

Some people trust computer to a degree that they will accept its computation as correct when algorithm is right and program has no bug. Some mistrust it to an extent that they wont trust any calculation returned by it.

There are serious limitations. Not only one can not represent 10/3 accurately as finite decimal representation, but one can’t represent some finite decimal representation such as 0.1 correctly in binary format [1]. See [2] for a detailed commentary on issues involved with computation with real numbers.

These issues have led many to explore alternatives of computation [3]. Till people come up with a nice model to compute real numbers, we can find ways to deal with them on computer.

On Windows machine, try computing \sqrt{4} - 2 or 2- \sqrt{4} in its inbuilt calculator. If you find its answer surprising and think of it as bug like this guy you should read [1].

Reference [1] explains why and when x=x+1.0 is true and when and why x=x is false when dealing with floating point numbers. Its a great reference for electrical engineering students who wants to implement a processor with floating point units in some HDL.

A rule of thumb is that when adding two floating point numbers, their relative values make a huge difference and one should be prepared for surprises. One can start his research into this by this wiki article. Ordering matters in such cases. Adding a float a into float b can be very different from adding b into a. Here is Haskell code which proves it. It computes the function f(x) = (2- x) - (2^3 - \frac{x^3}{12}) . One can translate it into other language also. This also serve as a beauty of Haskell syntax.

-- This is type signature of function. It says that function calc takes an
-- argument of type Float and returns a Float 
calc :: Float -> Float

-- This is the definition of function. Quite readable.
calc x = (x - (x ^ 3) / 12)

-- Haskell can infer the type by itself. So we can choose not to write a type
-- declaration. This function subtract f(x) from f(2).
calc2 x = (calc 2) - (calc x)

-- This function takes a list of Float and apply function calc2 on all of its
-- elements and add them up. In python, equivalent is
-- def calcList1(l):
--    newlist = []
--    for a in l:
--       newlist.append(calc2(a))
--    return (0.0 + sum(newlist))
calcList1 :: [Float] -> Float
calcList1 l = foldl (+) 0.0 (map calc2 l)

-- This is like the previous function but sum is done in reverse order. This is
-- not entirely correct. we are using foldr (+) and foldl (+) function to sum a
-- list. While the foldr adds first to the last, foldl adds from last to first.
-- Again this is not entirely correct. 

calcList2 :: [Float] -> Float
calcList2 l = foldr (+) 0.0 (map calc2 l)

-- Here we are applying the same function on the element of list but adding them
-- in different order. The input step construct a list l where each element is
-- step size away from the previos one.
-- e.g. [0.0,0.5..2.0] is [0.0, 0.5, 1.0, 1.5, 2.0]

test1 :: Float -> Float
test1 step = (calcList1 l) - (calcList2 l) 
        l = [0.0,step..2.0]

Let’s see now what happens when we run this script filename.hs in ghci filename.hs

*Main> test1 0.1
*Main> test1 0.01
*Main> test1 0.001
*Main> test1 0.0001

One expects 0.0 but this is not what is happening here. Life with floating point numbers is not easy, and it can be terribly confusing. If one is writing simulator, it pays to be aware of potential traps.


  1. What every computer scientist should know about floating-point arithmatic
  2. How are real number specified in computation
  3. Continuum Computing – The Next Step in the Theory of Computing