A CSV reader based on the Haskell cassava library: performance

I implemented my own CSV reader using the cassava library, because the reader from the MissingH library was taking too long (~17 seconds) for a file with 43,200 lines. I compared the result with the python-numpy and python-pandas CSV readers. Below is a rough comparison.

cassava (ignoring '#' comments)         3.3 sec
cassava (no support for ignoring '#')   2.7 sec
numpy loadtxt                           > 10 sec
pandas read_csv                         1.5 sec

As is obvious, pandas does really well at reading CSV files. I was hoping that my CSV reader would do better, but it didn't. Still, it beats the Parsec-based reader hands down.
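
For reference, the pandas side of the comparison was roughly the following call (the exact file name and options in the benchmark are not listed in the post, so this is an assumed, minimal reconstruction using an in-memory file):

```python
import io
import pandas as pd

# A tiny stand-in for the 43,200-line file used in the benchmark.
csv_text = "# a comment line\n1,2.5\n3,4.5\n"

# comment='#' makes read_csv skip '#' lines natively,
# i.e. the "ignore #" case in the table above.
df = pd.read_csv(io.StringIO(csv_text), comment="#", header=None)
print(df.shape)  # → (2, 2)
```

The `comment` parameter is what makes the pandas comparison fair against the "ignore #" cassava variant.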

The code is here https://github.com/dilawar/HBatteries/blob/master/src/HBatteries/CSV.hs

Posted in Haskell, Uncategorized

Thresholding a numpy array, well sort of

Here is a test case:

>>> import numpy as np
>>> a = np.array([0.0, 1, 2, 0.2, 0.0, 0.0, 2, 3])

I want to turn all non-zero elements of this array to 1. I can do it using np.where and numpy indexing.

>>> a[np.where(a != 0)] = 1
>>> a
array([ 0.,  1.,  1.,  1.,  0.,  0.,  1.,  1.])

Note: np.where returns the indices where the condition is true. For example, if you want to change all zeros to -1:

>>> a[np.where(a == 0)] = -1.0

That's it. Check out np.clip as well.
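
For completeness, the same thresholding can be done with a boolean mask directly, without np.where; and np.clip bounds values rather than replacing them (a minor stylistic alternative, not a change to the method above):

```python
import numpy as np

a = np.array([0.0, 1, 2, 0.2, 0.0, 0.0, 2, 3])

# Boolean-mask indexing: equivalent to a[np.where(a != 0)] = 1
a[a != 0] = 1
print(a)  # → [0. 1. 1. 1. 0. 0. 1. 1.]

# np.clip limits values to a range instead of replacing them.
b = np.array([-2.0, 0.5, 3.0])
print(np.clip(b, 0.0, 1.0))  # clipped into [0, 1]
```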

Posted in Python, Uncategorized

C++11 library to write std::vector to numpy `npy` file (format 2)

Here is the use scenario: you want to write/append your STL vector to a numpy binary file which you can then process with python-numpy/scipy.

Another scenario: your application generates a huge amount of data. If it is not written to disk, you will end up consuming so much RAM that the kernel will kill the application. You occasionally have to flush the data to a file to keep going.


The quick approach is to write to a text file, taking care with the formatting of the double datatype: boost::lexical_cast<std::string>(value) is convenient if you are already using Boost, sprintf is the fastest, stringstreams are comparatively slow, and std::to_string does not offer very high precision. Unfortunately, text files are bulky, and implementing your own binary format is too much work.

The numpy format is quite minimal: http://docs.scipy.org/doc/numpy/neps/npy-format.html. It has two versions. There is already a pretty good library, cnpy (https://github.com/rogersce/cnpy), which supports version 1 of the format and can also write the zipped `npz` format. The header size is limited in version 1, since the header length is stored in only 2 bytes.


I needed the ability to write a bigger header (for numpy structured arrays), so I wrote this: https://github.com/dilawar/cnpy2. It writes/appends data to a given file in version 2 of the numpy file format. See the README.md file for more details. It only supports writing the npy format; that's all! Unfortunately, only numpy >= 1.9 supports version 2, so to make this library a bit more useful, version 1 support is also planned.
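
On the Python side, a version-2 file can be produced and inspected with numpy's own format module, which is a quick way to sanity-check what a version-2 header looks like (this uses numpy >= 1.9 and `numpy.lib.format` directly; it is not part of cnpy2):

```python
import numpy as np
import numpy.lib.format as npf

arr = np.arange(5, dtype=np.float64)

with open("test_v2.npy", "wb") as f:
    # Force version (2, 0) of the npy format: the header length
    # field is 4 bytes here, versus 2 bytes in version 1.
    npf.write_array(f, arr, version=(2, 0))

with open("test_v2.npy", "rb") as f:
    major, minor = npf.read_magic(f)   # parse the magic string and version
    f.seek(0)
    back = npf.read_array(f)           # full round-trip read

print(major, minor)               # → 2 0
print(np.array_equal(arr, back))  # → True
```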

Posted in Library

Safe level of mercury in soil, water and food

The following are verbatim excerpts from a book chapter (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3096006/).


The World Health Organization guideline value for inorganic mercury vapor is 1 μg/m3 as an annual average.227 A tolerable concentration is 0.2 μg/m3 for long-term inhalation exposure to elemental mercury vapor, and a tolerable intake of total mercury is 2 μg/kg body weight per day.70


Food and Agriculture Organization/World Health Organization Codex Alimentarius Commission guideline levels for methylmercury in fish are 0.5 mg/kg for non-predatory fish and 1 mg/kg for predatory fish (such as shark, swordfish, tuna, pike and others).228


For methylmercury, the Joint Food and Agriculture Organization/World Health Organization Expert Committee on Food Additives (JECFA) set in 2004 a tolerable weekly intake of 1.6 μg/kg body weight per week to protect the developing fetus from neurotoxic effects.229 JECFA230 confirmed this provisional tolerable weekly intake level, taking into account that adults might not be as sensitive as the developing fetus, in 2003 (JECFA/61/SC http://www.who.int/ipcs/food/jecfa/summaries/en/summary_61.pdf) and 2006 (JECFA/67/SC http://www.who.int/ipcs/food/jecfa/summaries/summary67.pdf).231,232


The United Nations Environment Programme Global Mercury Assessment quotes, for soil, preliminary critical limits of 0.07-0.3 mg/kg total mercury content in organic soils to prevent ecological effects.4


Posted in Uncategorized

Benchmarking ODE solvers: GSL vs. Boost odeint

For our neural simulator, MOOSE, we use the GNU Scientific Library (GSL) for random number generation, for solving systems of non-linear equations, and for solving ODE systems.

Recently I compared the performance of the GSL ODE solver with the Boost odeint solver, both using Runge-Kutta 4. Boost odeint outperformed GSL by approximately a factor of 4. The numerical results were the same. Both implementations were compiled with the -O3 switch.
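
Both solvers were run with the classic fixed-step Runge-Kutta 4 scheme. For reference, one RK4 step looks like this (a generic Python sketch of the scheme, not the benchmark code):

```python
import math

def rk4_step(f, t, y, dt):
    """One classic Runge-Kutta 4 step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt * k1 / 2)
    k3 = f(t + dt / 2, y + dt * k2 / 2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6

# Sanity check on dy/dt = y, whose exact solution at t = 1 is e.
y, t, dt = 1.0, 0.0, 0.01
while t < 1.0 - 1e-12:
    y = rk4_step(lambda t, y: y, t, y, dt)
    t += dt
print(abs(y - math.e) < 1e-8)  # → True
```

Both GSL and Boost odeint provide this scheme ready-made; the benchmark measures only how efficiently each library drives these same arithmetic steps.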

Below are the numerical results. In the second subplot, a molecule (calcium) changes its concentration, and in the top one, another molecule (CaMKII) is produced. This network has more than 30 reactions and approximately 20 molecules.

GSL took approximately 39 seconds to simulate the system for 1 year, while Boost odeint took only 8.6 seconds!


Posted in Biological systems, Numerical computation, Uncategorized

Multi-dimensional root finding using the Newton-Raphson method

In short, it does the same thing as the GNU GSL `gnewton` solver: finding roots of a system of non-linear equations.

The standard caveats apply when finding the multi-dimensional roots.

It is a templated class (must be compiled with -std=c++11) and uses the boost::numeric::ublas library.

The code is available here:


The header file contains the class; the file `test_root_finding.cpp` shows how to use it.
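
The iteration at the heart of the method is short. Here is a generic Python/numpy sketch of multi-dimensional Newton-Raphson (an illustration of the algorithm, not a port of the C++ class; the example system is made up):

```python
import numpy as np

def newton_raphson(f, jacobian, x0, tol=1e-10, max_iter=50):
    """Find x such that f(x) = 0, given the Jacobian of f."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            break
        # Solve J(x) * dx = -f(x); cheaper and more stable
        # than inverting the Jacobian explicitly.
        dx = np.linalg.solve(jacobian(x), -fx)
        x = x + dx
    return x

# Example system: intersection of the circle x^2 + y^2 = 2
# with the line x = y; a root is (1, 1).
f = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 2.0, x[0] - x[1]])
J = lambda x: np.array([[2 * x[0], 2 * x[1]], [1.0, -1.0]])
root = newton_raphson(f, J, [2.0, 0.5])
print(np.round(root, 6))  # → [1. 1.]
```

The standard caveats mentioned above apply here too: convergence depends on the starting point, and the Jacobian must be non-singular along the iteration path.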


Posted in Uncategorized

Performance of random number generators (Mersenne Twister)

I compared 4 implementations of the Mersenne Twister (mt19937, 32-bit) algorithm: the C++11 standard template library <random>, Boost, the GNU Scientific Library (GSL), and our own implementation in the MOOSE simulator. The code is here (you can customize the code to run it on your platform; you have to disable the MOOSE-related code). The -O3 flag was passed to the gcc-4.8 compiler, on OpenSUSE Leap 42.1.

I generated random numbers in a tight loop of size N. Numbers were stored in a pre-initialized vector. For the benchmarking, I subtracted the time taken to put N values into a std::vector. I ran the benchmark 5 times for each implementation. Here is a sample run:

Wrote 1000000 sampled to sample_numbers.csv file
GSL random number generator: mt19937
MOOSE=3135507266, STL=2357136044,BOOST=2357136044,GSL=4293858116
MOOSE=1811477324, STL=2546248239,BOOST=2546248239,GSL=699692587
MOOSE=2095834071, STL=3071714933,BOOST=3071714933,GSL=1213834231
MOOSE=258599318, STL=3626093760,BOOST=3626093760,GSL=4068197670
MOOSE=1470212236, STL=2588848963,BOOST=2588848963,GSL=994957275
MOOSE=3009017253, STL=3684848379,BOOST=3684848379,GSL=2082945813
MOOSE=2280525416, STL=2340255427,BOOST=2340255427,GSL=4112332215
MOOSE=2165689929, STL=3638918503,BOOST=3638918503,GSL=3196767107
MOOSE=551388967, STL=1819583497,BOOST=1819583497,GSL=2319469851
MOOSE=368560217, STL=2678185683,BOOST=2678185683,GSL=3178073856
N = 100
 Baseline (vector storage time) ends. Time 2.46e-07
 MOOSE start.. ends. Time 1.309e-06
 STL starts .. ends. Time 1.6e-07
 BOOST starts .. ends. Time 1.54e-07
 GSL starts .. ends. Time 5.07e-07
N = 1000
 Baseline (vector storage time) ends. Time 2.26e-07
 MOOSE start.. ends. Time 6.63e-06
 STL starts .. ends. Time 3.883e-06
 BOOST starts .. ends. Time 2.3e-06
 GSL starts .. ends. Time 5.844e-06
N = 10000
 Baseline (vector storage time) ends. Time 1.213e-06
 MOOSE start.. ends. Time 6.1885e-05
 STL starts .. ends. Time 4.4721e-05
 BOOST starts .. ends. Time 2.4108e-05
 GSL starts .. ends. Time 6.3407e-05
N = 100000
 Baseline (vector storage time) ends. Time 1.3876e-05
 MOOSE start.. ends. Time 0.000616378
 STL starts .. ends. Time 0.000445375
 BOOST starts .. ends. Time 0.00023736
 GSL starts .. ends. Time 0.000630819
N = 1000000
 Baseline (vector storage time) ends. Time 0.000281644
 MOOSE start.. ends. Time 0.00591533
 STL starts .. ends. Time 0.00426978
 BOOST starts .. ends. Time 0.00222824
 GSL starts .. ends. Time 0.00612132
N = 10000000
 Baseline (vector storage time) ends. Time 0.00418973
 MOOSE start.. ends. Time 0.0574826
 STL starts .. ends. Time 0.0411453
 BOOST starts .. ends. Time 0.0204587
 GSL starts .. ends. Time 0.0590613
N = 100000000
 Baseline (vector storage time) ends. Time 0.0465758
 MOOSE start.. ends. Time 0.569934
 STL starts .. ends. Time 0.399875
 BOOST starts .. ends. Time 0.195795
 GSL starts .. ends. Time 0.598316
N = 1000000000
 Baseline (vector storage time) ends. Time 0.458415
 MOOSE start.. ends. Time 5.69154
 STL starts .. ends. Time 4.09148
 BOOST starts .. ends. Time 2.02454
 GSL starts .. ends. Time 5.85985
[100%] Built target run_randnum_benchmark
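
The baseline-subtraction methodology used above can be sketched in Python (an analogue of the C++ benchmark loop, using CPython's own Mersenne Twister; the function names and N are illustrative):

```python
import random
import time

def time_it(fn, n):
    t0 = time.perf_counter()
    fn(n)
    return time.perf_counter() - t0

def baseline(n):
    # Only store pre-computed values: the vector-storage cost
    # that gets subtracted from each measurement.
    out = [0] * n
    for i in range(n):
        out[i] = i

def mt_fill(n):
    out = [0] * n
    rng = random.Random(42)  # CPython's random module is mt19937-based
    for i in range(n):
        out[i] = rng.getrandbits(32)

n = 100000
t_base = time_it(baseline, n)
t_mt = time_it(mt_fill, n)
# The generator's cost is the difference, not the raw loop time.
print(t_mt - t_base > 0)
```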

Here is the summary: the Boost random number generator is approximately 2x faster than the C++ STL <random> implementation, and approximately 3x faster than GSL (which is about as fast as the MOOSE generator).


MOOSE, C++11 <random> (STL), Boost, and GSL compared for the mt19937 random number generator. Run times for N = 100 to 10^9 values, plotted on a log scale.


Comparison of speed: GSL and MOOSE are equally fast, and the slowest. Boost is the fastest (around 3x faster than MOOSE or GSL, and 2x faster than STL).

Posted in Algorithms, Uncategorized