Performance of “sorting dictionary by values” in python2, python3 and pypy

The script is hosted here . It is based on the work of

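I changed the script to run on python3: iteritems is gone there, so I replaced it with items. I am not sure whether it is a fair replacement, although python3's items returns a view rather than a list, which makes it closer to python2's iteritems than to python2's items. For the method names and how they are implemented, please refer to the script or the blog post.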

The following chart shows the comparison. PyPy does not improve performance here, for the simple reason that the dictionary being sorted is not large enough. I have included it only to make a point: PyPy can slow things down on small computations.

The fastest method is sbv6, which is based on PEP-0265. In these runs, python3 always performed better than python2.
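For readers without the script at hand, here is a minimal sketch of what sorting a dictionary by values can look like. The function names below are made up for illustration and are not the sbv methods from the script; the last one only follows the value-first pair idiom in the spirit of PEP 265 and may differ from the real sbv6.

```python
import timeit
from operator import itemgetter


def by_itemgetter(d):
    # Sort (key, value) pairs on the value, largest first.
    return sorted(d.items(), key=itemgetter(1), reverse=True)


def by_lambda(d):
    # Same idea, with a lambda instead of itemgetter.
    return sorted(d.items(), key=lambda kv: kv[1], reverse=True)


def by_swapped_pairs(d):
    # Build (value, key) pairs, sort them, then swap back.
    # Note: ties on value are broken by key here, unlike the two above.
    pairs = [(v, k) for k, v in d.items()]
    pairs.sort(reverse=True)
    return [(k, v) for v, k in pairs]


def benchmark(d, number=100):
    # Hypothetical helper: time each approach on the same dictionary.
    funcs = (by_itemgetter, by_lambda, by_swapped_pairs)
    return {f.__name__: timeit.timeit(lambda: f(d), number=number) for f in funcs}


if __name__ == "__main__":
    big = {i: (i * 7919) % 10007 for i in range(10000)}
    for name, seconds in benchmark(big).items():
        print(name, seconds)
```

The timings depend on the interpreter and on the size of the dictionary, so treat any numbers from this sketch as a rough indication only.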




Thresholding numpy array, well sort of

Here is a test case

>>> import numpy as np
>>> a = np.array( [ 0.0, 1, 2, 0.2, 0.0, 0.0, 2, 3] )

I want to turn all non-zero elements of this array to 1. I can do it using np.where and numpy indexing.

>>> a[ np.where( a != 0 ) ] = 1
>>> a
array([ 0.,  1.,  1.,  1.,  0.,  0.,  1.,  1.])

Note: np.where returns the indices where the condition is true. For example, to change all 0 to -1:

>>> a[ np.where( a == 0 ) ] = -1.0

That’s it. Check out np.clip as well.
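A few equivalent ways to do the same thing, as a small sketch. The variable names are only for illustration. It also shows why np.clip is only "sort of" thresholding: it bounds the values but leaves 0.2 alone.

```python
import numpy as np

a = np.array([0.0, 1, 2, 0.2, 0.0, 0.0, 2, 3])

# Boolean mask indexing: no np.where needed for in-place assignment.
b = a.copy()
b[b != 0] = 1

# Three-argument np.where builds a new array instead of modifying a.
c = np.where(a != 0, 1.0, 0.0)

# The comparison itself is already a 0/1 array once it is cast to float.
d = (a != 0).astype(a.dtype)

# np.clip only limits values to the range [0, 1]; 0.2 stays 0.2.
e = np.clip(a, 0, 1)

print(b)
print(e)
```

The first three give the same array; which one to use is mostly a matter of whether you want to modify the array in place.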

Comparison of various matplotlib/pylab plotting styles, versions 1.4 and 1.5

Global font size is set to 8 in all figures.


Style ‘ggplot’ is my first preference, but its overlapping histograms do not have good contrast. So I settled for ‘seaborn-darkgrid’.
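Here is a rough sketch of how such a comparison figure can be produced: pick a style, set the global font size to 8, and draw two overlapping histograms. The helper names and the random data are made up for illustration. The style names available depend on the matplotlib version (newer releases ship the seaborn styles under names such as seaborn-v0_8-darkgrid), so the sketch picks the first name that exists.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so this runs without a display
import matplotlib.pyplot as plt
import numpy as np


def pick_style(preferred):
    # Return the first style in `preferred` that this matplotlib ships.
    for name in preferred:
        if name in plt.style.available:
            return name
    return "default"


def overlapping_hist(style, out):
    # Draw two overlapping histograms in `style` and save them as PNG.
    # `out` may be a file name or a writable binary file object.
    rng = np.random.RandomState(0)
    a = rng.normal(0.0, 1.0, 1000)
    b = rng.normal(1.0, 1.0, 1000)
    with plt.style.context(style):
        # Set the font size after the style is applied, since a style
        # sheet may define its own font size.
        plt.rcParams["font.size"] = 8
        fig, ax = plt.subplots()
        ax.hist(a, bins=30, alpha=0.5, label="a")
        ax.hist(b, bins=30, alpha=0.5, label="b")
        ax.set_title(style)
        ax.legend()
        fig.savefig(out, format="png")
        plt.close(fig)


if __name__ == "__main__":
    style = pick_style(["seaborn-darkgrid", "seaborn-v0_8-darkgrid", "ggplot"])
    overlapping_hist(style, "hist_" + style + ".png")
```

The alpha value controls how well the two histograms can be told apart where they overlap, which is where the styles differ most visibly.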

Search files on linux containing a given pattern

The linux utility grep is great. But somehow I do not like it: there is too much to remember and its regex format sucks.

I wrote a python script which does something similar. This is how one should use the script:

./ path pattern_to_search [file_pattern]

The first argument, path, is the directory in which we want to search. The script searches recursively inside this directory. The second argument, pattern_to_search, is the regular expression we want to search for in each file. It uses the regex syntax of the python re library, and . matches newlines as well.

The third argument, file_pattern, is optional. It is a regular expression for the filename: only files whose names match it are considered. For example, if I want to search python files (extension py) for Pool( followed by Adaptor, I do the following:

./ . "Pool\(.*?Adaptor"  .*py

And voila, it prints the path of each matching file and the line number at which the match was found. If more than one match was found, each line number is appended to the filename.
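The core of such a search fits in a few lines. This is only a sketch under my own assumptions, not the actual script: the function name search, the output format and the use of re.search on the bare filename are made up for illustration, and it assumes python3.

```python
import os
import re
import sys


def search(path, pattern, file_pattern=None):
    # Map each matching file under `path` to the line numbers at which
    # a match of `pattern` starts.
    regex = re.compile(pattern, re.DOTALL)  # DOTALL: '.' also matches newlines
    name_regex = re.compile(file_pattern) if file_pattern else None
    hits = {}
    for root, _dirs, files in os.walk(path):  # os.walk recurses into subdirs
        for name in files:
            if name_regex and not name_regex.search(name):
                continue
            full = os.path.join(root, name)
            try:
                with open(full, errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file, skip it
            # Line number = newlines before the match start, plus one.
            lines = [text.count("\n", 0, m.start()) + 1
                     for m in regex.finditer(text)]
            if lines:
                hits[full] = lines
    return hits


if __name__ == "__main__" and len(sys.argv) >= 3:
    for filename, lines in sorted(search(*sys.argv[1:4]).items()):
        print(filename + ":" + ",".join(str(n) for n in lines))
```

Because the whole file is read as one string, a pattern like Pool\(.*?Adaptor can match even when Pool( and Adaptor are on different lines. The reported line is the one where the match begins.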


The game of “Bulls and Cows” : A python implementation

Recently, a friend taught me a simple game called bulls and cows. The game goes like this. There are two players, A and B. A thinks of a 4-letter word and asks B to guess it. Assume that A thought of “link” and B guesses “lean”. The bulls are the letters which are common to both words at the same position; for example, “l” is a bull in “link” and “lean”. The cows are the letters which are common to both words but are not bulls. The letters “l” and “n” are common to both, but “l” is a bull, so “n” is the only cow. The guess “lean” therefore has one bull and one cow. Player A tells B the number of bulls and cows. This helps B in making the next guess, and the process goes on till B finds the word.

I wrote a python program to play this game on your computer. I am not sure if it will work on windows. It will definitely work on Linux. Check out the repository for the implementation of the game.

I will also write an algorithm which finds A's word in the minimum number of steps.
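The scoring rule can be sketched in a few lines. This is an illustration, not the code from the repository: the function name score is made up, and counting repeated letters with multiplicity is my own assumption, since the description only covers words with distinct letters.

```python
from collections import Counter


def score(secret, guess):
    # Return (bulls, cows) for `guess` against `secret`.
    if len(secret) != len(guess):
        raise ValueError("words must have the same length")
    # Bulls: same letter at the same position.
    bulls = sum(s == g for s, g in zip(secret, guess))
    # Letters shared by both words, counted with multiplicity.
    common = sum((Counter(secret) & Counter(guess)).values())
    # Cows: shared letters that are not bulls.
    return bulls, common - bulls


print(score("link", "lean"))  # (1, 1)
```

For “link” and “lean” the shared letters are “l” and “n”; “l” sits at the same position in both words, so the result is one bull and one cow.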