C++11 library to write std::vector to numpy `npy` file (format 2)

Here is the use scenario: you want to write or append your STL vector to a numpy binary file, which you can then process in Python with numpy/scipy.

Another scenario: your application generates a huge amount of data. If it is not written to disk, the application will end up consuming so much RAM that the kernel kills it. You have to flush the data to a file periodically to keep going.

The quick approach is to write to a text file with proper formatting of the double datatype. boost::lexical_cast<std::string>(x) is convenient if you are already using boost, sprintf is the fastest, stringstreams are comparatively slow, and std::to_string does not have very high precision. Unfortunately, text files are bulky, and implementing your own binary format is too much work.
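To make the precision point concrete, here is a minimal sketch of the sprintf-style route. It assumes IEEE-754 doubles, for which 17 significant digits are enough to read the same value back. The names `format_double` and `parse_double` are made up for this illustration and do not come from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: format a double with 17 significant digits,
// which is enough to round-trip an IEEE-754 binary64 value through text.
std::string format_double(double x)
{
    char buf[32];  // "%.17g" needs at most 24 characters plus the terminator
    const int n = std::snprintf(buf, sizeof buf, "%.17g", x);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}

// Hypothetical helper: parse the text back, used only to check the round trip.
double parse_double(const std::string& s)
{
    return std::strtod(s.c_str(), nullptr);
}
```

For comparison, std::to_string in C++11 formats a double as if by `%f`, so a small value such as 1e-10 comes out as `0.000000`, while `format_double(1e-10)` keeps all the digits.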

The numpy format is quite minimal: http://docs.scipy.org/doc/numpy/neps/npy-format.html. It has two versions. There is already a pretty good library that supports version 1 of the format, https://github.com/rogersce/cnpy, and it can also write the zipped `npz` format. The size of the header is limited in version 1: its length is stored in two bytes, so it cannot exceed 65535 bytes. Version 2 stores the length in four bytes.
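To show how little there is to the format, here is a rough sketch that writes a one-dimensional std::vector<double> as a version 2.0 `npy` file. It is not the code of either library mentioned here. The function names `npy_v2_header` and `write_npy_v2` are made up, and the sketch assumes a little-endian machine with 8-byte IEEE-754 doubles, because the data is dumped as raw bytes under the dtype `<f8`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: preamble plus header for a 1-D array of n float64 values.
std::string npy_v2_header(std::size_t n)
{
    std::string dict = "{'descr': '<f8', 'fortran_order': False, 'shape': ("
                       + std::to_string(n) + ",), }";
    // Preamble: 6 bytes magic + 2 bytes version + 4 bytes header length.
    const std::size_t preamble = 12;
    // Pad with spaces so that preamble + header (with its final '\n')
    // is a multiple of 16 bytes.
    const std::size_t total = preamble + dict.size() + 1;
    dict.append((16 - total % 16) % 16, ' ');
    dict.push_back('\n');

    std::string out("\x93NUMPY", 6);  // magic string
    out.push_back('\x02');            // major version 2
    out.push_back('\x00');            // minor version 0
    // Header length as a 4-byte little-endian unsigned integer.
    const std::uint32_t len = static_cast<std::uint32_t>(dict.size());
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<char>((len >> (8 * i)) & 0xFF));
    out += dict;
    assert(out.size() % 16 == 0);
    return out;
}

// Hypothetical helper: write header and raw data. Assumes a little-endian host.
bool write_npy_v2(const std::string& path, const std::vector<double>& v)
{
    std::ofstream f(path, std::ios::binary);
    if (!f) return false;
    const std::string h = npy_v2_header(v.size());
    f.write(h.data(), static_cast<std::streamsize>(h.size()));
    f.write(reinterpret_cast<const char*>(v.data()),
            static_cast<std::streamsize>(v.size() * sizeof(double)));
    return static_cast<bool>(f);
}
```

A file written with `write_npy_v2("data.npy", v)` should then load in Python with `numpy.load("data.npy")`, provided the numpy version understands format version 2.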

I needed the ability to write bigger headers (numpy structured arrays), so I wrote https://github.com/dilawar/cnpy2. It writes or appends data to a given file in numpy file format version 2. See the README.md file for more details. It only supports writing to the `npy` format. That’s all! Unfortunately, only numpy >= 1.9 supports version 2, so to make this library a bit more useful, support for version 1 is also planned.
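Appending is the slightly tricky part, because the shape stored in the header has to change every time data is added. One possible way to handle it, sketched below, is to reserve a fixed-size header padded with spaces and rewrite it in place after each append. This is only an illustration of the idea and not necessarily how cnpy2 does it. The names `kHeaderBytes`, `fixed_header` and `append_npy` are made up, and a little-endian machine with 8-byte doubles is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical fixed size of preamble + header, a multiple of 16,
// with spare room so the shape can grow without moving the data.
const std::size_t kHeaderBytes = 128;

// Preamble and header for n float64 values, space-padded to kHeaderBytes.
std::string fixed_header(std::uint64_t n)
{
    std::string dict = "{'descr': '<f8', 'fortran_order': False, 'shape': ("
                       + std::to_string(n) + ",), }";
    assert(12 + dict.size() + 1 <= kHeaderBytes);
    dict.append(kHeaderBytes - 12 - dict.size() - 1, ' ');
    dict.push_back('\n');
    std::string out("\x93NUMPY", 6);  // magic string
    out.push_back('\x02');            // major version 2
    out.push_back('\x00');            // minor version 0
    const std::uint32_t len = static_cast<std::uint32_t>(dict.size());
    for (int i = 0; i < 4; ++i)       // header length, little-endian
        out.push_back(static_cast<char>((len >> (8 * i)) & 0xFF));
    return out + dict;
}

// Append v to the file, creating it first if needed. Assumes any existing
// file was produced by this same function, and a little-endian host.
bool append_npy(const std::string& path, const std::vector<double>& v)
{
    const std::ios::openmode mode =
        std::ios::in | std::ios::out | std::ios::binary;
    std::fstream f(path, mode);
    if (!f) {
        // File does not exist yet: start with a header for an empty array.
        std::ofstream create(path, std::ios::binary);
        const std::string h = fixed_header(0);
        create.write(h.data(), static_cast<std::streamsize>(h.size()));
        create.close();
        f.clear();
        f.open(path, mode);
        if (!f) return false;
    }
    // The current element count follows from the file size.
    f.seekp(0, std::ios::end);
    const std::uint64_t old_n =
        (static_cast<std::uint64_t>(f.tellp()) - kHeaderBytes) / sizeof(double);
    f.write(reinterpret_cast<const char*>(v.data()),
            static_cast<std::streamsize>(v.size() * sizeof(double)));
    // Rewrite the header in place with the new shape.
    const std::string h = fixed_header(old_n + v.size());
    f.seekp(0);
    f.write(h.data(), static_cast<std::streamsize>(h.size()));
    return static_cast<bool>(f);
}
```

Calling `append_npy("data.npy", chunk)` each time the in-memory buffer fills up keeps the file loadable between flushes, which fits the second scenario above.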
