Performance of “sorting dictionary by values” in python2, python3 and pypy

The script is hosted here http://github.com/dilawar/playground/raw/master/Python/test_dict_sorting.py . It is based on the work of https://writeonly.wordpress.com/2008/08/30/sorting-dictionaries-by-value-in-python-improved/

My script has been changed to accommodate python3 (iteritems is gone and replaced by items — not sure whether it is a fair replacement). For method names and how they are implemented, please refer to script or the blog post.

Following chart shows the comparison. PyPy does not boost up the performance for simple reason that dictionary sorted is not large enough. I’ve put it here just for making a point and PyPy can slow thing down on small size computation.

The fastest method is sbv6 which is based on PEP-0265 https://www.python.org/dev/peps/pep-0265/ is the fastest. Python3 always performing better than python2.

sort_dict_python

 

 

Writing Maxima expression to text file in TeX format (for LaTeX)

You want to write an Maxima expression to a file which can be read by other application e.g. LaTeX.

Lets say the expression is sys which contains variable RM. You first want to replace RM by R_m .  Be sure to load mactex-utilities if you have matrix. Without loading this module, the tex command generates TeX output, not LaTeX.

load( "mactex-utilities" )$ 
sys : RM * a / b * log( 10 )$
texput( RM, "R_m")$
sysTex : tex( sys, false)$
with_stdout( "outout.txt", display( sysTex ) )$

Other methods such as stringout, save and write put extra non-TeX characters in file.

I get the following in file outout.txt after executing the above.

{{\log 10\,R_m\,a}\over{b}}

A csv reader based on Haskell-cassava library : performance

I implemented my own csv reader using cassava library. The reader from missingh library was taking too long (~ 17 seconds) for a file with 43200 lines. I compared the result with python-numpy and python-pandas csv reader. Below is rough comparison.

cassava (ignore #) 3.3 sec
cassava (no support for ignoring #) 2.7 sec
numpy loadtxt > 10 sec
pandas read_csv 1.5 sec

As obvious, pandas does really well at reading csv file. I was hoping that my csv reader would do better but it didn’t. But it still beats the parsec based reader hands down.

The code is here https://github.com/dilawar/HBatteries/blob/master/src/HBatteries/CSV.hs

Thresholding numpy array, well sort of

Here is a test case

>>> import numpy as np
>>> a = np.array( [ 0.0, 1, 2, 0.2, 0.0, 0.0, 2, 3] )

I want to turn all non-zero elements of this array to 1. I can do it using np.where and numpy indexing.

>>>  a[ np.where( a != 0 ) ] = 1
>>> a
array([ 0.,  1.,  1.,  1.,  0.,  0.,  1.,  1.])

note: np.where returns the indices where the condition is true. e.g. if you want to change all 0 to -1.

>>> a[ np.where[ a == 0] ] = -1.0

That’s it. Checkout np.clip as well.

Safe level of mercury in soil, water and food

Following are verbatim from book chapter ( http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3096006/ ) .

Air

The World Health Organization guideline value for inorganic mercury vapor is 1 μg/m3 as an annual average.227 A tolerable concentration is 0.2 μg/m3 for long-term inhalation exposure to elemental mercury vapor, and a tolerable intake of total mercury is 2 μg/kg body weight per day.70

Fish

Food and Agriculture Organization/World Health Organization Codex Alimentarius—Commission guideline levels for methylmercury in fish 0.5 mg/kg for predatory fish (such as shark, swordfish, tuna, pike and others) is 1 mg/kg.228

Food

For methylmercury, the Joint Food and Agriculture Organization/World Health Organization Expert Committee on Food Additives (JECFA) set in 2004 a tolerable weekly intake of 1.6 μg/kg body weight per week to protect the developing fetus from neurotoxic effects.229 JEFCA230 confirmed this provisional tolerable weekly intake level, taking into account that adults might not be as sensitive as the developing fetus, in 2003 (JECFA/61/SC http://www.who.int/ipcs/food/jecfa/summaries/en/summary_61.pdf) and 2006 (JECFA/67/SC http://www.who.int/ipcs/food/jecfa/summaries/summary67.pdf).231,232

Soil

United Nations Environment Programme Global Mercury Assessment quotes for soil, preliminary critical limits to prevent ecological effects due to mercury in organic soils with 0.07-0.3 mg/kg for the total mercury content in soil.4

 

Benchmark ODE solver: GSL V/s Boost Odeint library

For our neural simulator, MOOSE, we use GNU Scientific Library (GSL) for random number generation, for solving system of non-linear equations, and for solving ODE system.

Recently I checked the performance of GSL ode solver V/s Boost ode solver; both using Runge-Kutta 4. The boost-odeint outperformed GSL by approximately by a factor of 4. The numerical results were same. Both implementation were compiled with -O3 switch.

Below are the numerical results. In second subplot, a molecule is changing is concentration ( calcium ) and on the top, another molecule is produced (CaMKII). This network has more than 30 reactions and approximately 20 molecules.

GSL took approximately 39 seconds to simulate the system for 1 year. While Boost-odeint took only 8.6 seconds. (This turns out to be a corner case)

Update: If I let both solvers choose the step size by themselves for given accuracy, boost usually outperforms the GSL solver by a factor of 1.2x to 2.9x. These tests were done during development of a signaling network which has many many reactions. Under no circumstances I have tested, BOOSE ODE solver was slower than GSL solver.