Performance of “sorting dictionary by values” in python2, python3 and pypy

The script is hosted here http://github.com/dilawar/playground/raw/master/Python/test_dict_sorting.py . It is based on the work of https://writeonly.wordpress.com/2008/08/30/sorting-dictionaries-by-value-in-python-improved/

My script has been changed to accommodate python3 (iteritems is gone and replaced by items; I am not sure whether it is a fair replacement). For method names and how they are implemented, please refer to the script or the blog post.

The following chart shows the comparison. PyPy does not improve the performance, for the simple reason that the dictionary being sorted is not large enough. I’ve included it here just to make the point that PyPy can slow things down on small computations.

The fastest method is sbv6, which is based on PEP-0265 https://www.python.org/dev/peps/pep-0265/ . Python3 always performs better than python2.
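
For reference, here is a minimal sketch of the idiom noted in PEP 265, sorting the items with operator.itemgetter as the key. The function name sort_by_value and the sample dictionary are made up for illustration, and sbv6 in the script may differ in detail.

```python
from operator import itemgetter


def sort_by_value(d, reverse=False):
    """Return the (key, value) pairs of d ordered by value."""
    # d.items() works on both python2 and python3. On python2 it builds
    # a list first, whereas iteritems() returned an iterator.
    return sorted(d.items(), key=itemgetter(1), reverse=reverse)


scores = {"a": 3, "b": 1, "c": 2}
print(sort_by_value(scores))  # [('b', 1), ('c', 2), ('a', 3)]
```

Passing reverse=True gives the largest values first.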

sort_dict_python

 

 

Writing Maxima expression to text file in TeX format (for LaTeX)

You want to write a Maxima expression to a file which can be read by another application, e.g. LaTeX.

Let’s say the expression is sys, which contains the variable RM. You first want to replace RM by R_m. Be sure to load mactex-utilities if you have a matrix. Without loading this module, the tex command generates TeX output, not LaTeX.

load( "mactex-utilities" )$ 
sys : RM * a / b * log( 10 )$
texput( RM, "R_m")$
sysTex : tex( sys, false)$
with_stdout( "outout.txt", display( sysTex ) )$

Other methods such as stringout, save and write put extra non-TeX characters in the file.

I get the following in the file outout.txt after executing the above.

{{\log 10\,R_m\,a}\over{b}}

Image stabilization using OpenCV

This application deals with videos of neural recordings. In such recordings, feature sizes are small. On top of that, the recordings are quite noisy, and animal head movements introduce sharp shakes. An out-of-the-box video stabilizer may not work very well on such recordings. Though there are quite a lot of ImageJ plugins for this kind of work, I haven’t compared their performance with this application. This application is hosted here https://github.com/dilawar/video_stabilizer and a demo video is available on YouTube here https://www.youtube.com/watch?v=vGjIFvzOOQ8 .

The basic principle, in summary:

0. Collect all frames in a list/vector.

1. Use a bilateral filter to smooth each frame. A bilateral filter smooths the image without distorting the edges (well, to a certain extent).

2. Calculate the optical flow between the previous frame and the current frame. This is a proxy for movement. Construct a transformation from it and store these transformations in a vector. The OpenCV function `goodFeaturesToTrack` does almost all the work for us.

3. Take the average of these transformations and apply it to each frame of the original recording; this corrects the motion.

A csv reader based on Haskell-cassava library : performance

I implemented my own csv reader using the cassava library. The reader from the MissingH library was taking too long (~17 seconds) for a file with 43200 lines. I compared the result with the python-numpy and python-pandas csv readers. Below is a rough comparison.

cassava (ignore #) 3.3 sec
cassava (no support for ignoring #) 2.7 sec
numpy loadtxt > 10 sec
pandas read_csv 1.5 sec

Clearly, pandas does really well at reading csv files. I was hoping that my csv reader would do better, but it didn’t. It still beats the parsec-based reader hands down.

The code is here https://github.com/dilawar/HBatteries/blob/master/src/HBatteries/CSV.hs
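
The Python side of such a comparison could be timed roughly as follows. This is a sketch on synthetic data, not the file or script used for the numbers above: the helper names make_csv and timed, the three-column layout and the single '#' comment line are assumptions, and the timings will differ from machine to machine.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def make_csv(path, rows=43200, cols=3):
    """Write a synthetic numeric csv with one '#' comment line on top."""
    data = np.random.default_rng(0).random((rows, cols))
    # savetxt prefixes the header text with '# '.
    np.savetxt(path, data, delimiter=",", header="synthetic test data")
    return data


def timed(reader, path):
    """Return (result, seconds) for a single call of reader(path)."""
    start = time.perf_counter()
    result = reader(path)
    return result, time.perf_counter() - start


path = os.path.join(tempfile.mkdtemp(), "test.csv")
expected = make_csv(path)

arr, t_numpy = timed(lambda p: np.loadtxt(p, delimiter=",", comments="#"), path)
frame, t_pandas = timed(lambda p: pd.read_csv(p, comment="#", header=None), path)

print("numpy loadtxt   %.3f sec" % t_numpy)
print("pandas read_csv %.3f sec" % t_pandas)
```

A single call is a crude measurement; repeating each reader a few times and taking the best time would be fairer.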

Thresholding numpy array, well sort of

Here is a test case

>>> import numpy as np
>>> a = np.array( [ 0.0, 1, 2, 0.2, 0.0, 0.0, 2, 3] )

I want to turn all non-zero elements of this array to 1. I can do it using np.where and numpy indexing.

>>> a[ np.where( a != 0 ) ] = 1
>>> a
array([ 0.,  1.,  1.,  1.,  0.,  0.,  1.,  1.])

Note: np.where returns the indices where the condition is true. For example, to change all 0 to -1:

>>> a[ np.where( a == 0 ) ] = -1.0

That’s it. Check out np.clip as well.
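
A few equivalent spellings, shown on copies of the same test array so the original stays untouched. The variable names b, c and d are just for illustration. Note that np.clip only bounds values, so it does not do the same job here: 0.2 stays 0.2.

```python
import numpy as np

a = np.array([0.0, 1, 2, 0.2, 0.0, 0.0, 2, 3])

# Boolean mask indexing: same effect as a[np.where(a != 0)] = 1.
b = a.copy()
b[b != 0] = 1

# Three-argument np.where builds a new array and leaves `a` alone.
c = np.where(a != 0, 1.0, a)

# np.clip limits values to the range [0, 1]; 0.2 is already inside it.
d = np.clip(a, 0, 1)

print(b)
print(c)
print(d)
```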

C++11 library to write std::vector to numpy `npy` file (format 2)

Here is the use scenario: You want to write/append your STL vector to a numpy binary file which you can process using python-numpy/scipy.

Another scenario: your application generates a huge amount of data. If it is not written to disk, you will end up consuming so much RAM that the kernel will kill the application. Every so often you have to flush the data to a file to keep going.

 

The quick approach is to write to a text file with proper formatting of the double datatype. For the formatting, boost::lexical_cast is convenient if you are already using boost, sprintf is the fastest, stringstreams are comparatively slow, and std::to_string does not have very high precision. Unfortunately text files are bulky, and implementing a binary format is too much work.

The numpy format is quite minimal http://docs.scipy.org/doc/numpy/neps/npy-format.html . It has two versions. There is already a pretty good library which supports version 1 of the numpy format https://github.com/rogersce/cnpy . It can also write to the `npz` format (zipped). The size of the header is limited in version 1.

 

I needed the ability to write bigger headers (numpy structured arrays), so I wrote this https://github.com/dilawar/cnpy2 . It writes/appends data to a given file in numpy file format version 2. See the README.md file for more details. It only supports writing to the npy format. That’s all! Unfortunately, only numpy >= 1.9 supports version 2, so to make this library a bit more useful, support for version 1 is also planned.
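
On the Python side, the format version of a file can be checked with numpy.lib.format. The sketch below does not use cnpy2 at all: it writes a small structured array in version 2 with numpy itself into an in-memory buffer, then reads the version back from the header. The field names time and value are made up for illustration.

```python
import io

import numpy as np
from numpy.lib import format as npy_format

# A small structured array; the field names are hypothetical.
records = np.array([(0.0, 1.5), (0.1, 2.5)],
                   dtype=[("time", "<f8"), ("value", "<f8")])

buf = io.BytesIO()
# np.save picks the format version itself; write_array lets us force 2.0.
npy_format.write_array(buf, records, version=(2, 0))

buf.seek(0)
version = npy_format.read_magic(buf)  # (major, minor) from the file header
buf.seek(0)
loaded = np.load(buf)

print(version)
print(loaded["value"])
```

The same read_magic call on a file opened in binary mode is a quick way to confirm which version a writer produced.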