Commit c09a4db9 authored by Michael Krause 🎉

matlab: lots of typos in efficient saving block

parent 58a41d1d
Pipeline #756 passed with stages in 42 seconds
@@ -4,6 +4,9 @@ Welcome to Tardis's documentation!
Changelog
=========
3.5 (07.07.2018)
+ added section about (non-)efficient saving of objects in Matlab
3.4.5 (09.02.2018)
+ major changes with ASHS
@@ -300,12 +300,12 @@ There are two aspects to consider when saving larger objects with Matlab:
With objects larger than 2GB, people unfortunately usually resort to saving
with the Matlab file format v7.3. The default behaviour of Matlab's
:program:`save(.., '-v7.3')` is suboptimal and you might want to save your
objects differently. The following examples highlight why this is necessary and
also why it's not trivial to just recommend a better alternative.

Consider two extreme variants of data: a truly random matrix and a matrix full
of zeros, both with a size of 5GB:

.. code-block:: bash

   R = randn(128,1024,1024,5);
   Z = zeros(128,1024,1024,5);

On a drained Tardis, saving these two objects naively takes **forever**:

.. code-block:: bash
@@ -347,13 +347,13 @@ saving directly to HDF5 could look like this:

   >> tic; h5write('test.h5', '/R', R); toc;
   Elapsed time is 15.225184 seconds.

In comparison to the naive :program:`save()`, we are faster by a factor of 22.
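Note that :program:`h5write` expects the dataset to already exist in the file;
it does not create it for you. A minimal sketch of the preceding step (file and
dataset names as above, default uncompressed layout):

.. code-block:: bash

   >> h5create('test.h5', '/R', size(R));   % no compression, contiguous layout
   >> tic; h5write('test.h5', '/R', R); toc;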
But this is unfortunately not the whole story. One downside to this approach is
connected to the internal structure of HDF5 objects: saving a struct with
multiple, nested objects of different types (think string annotations, integer
arrays and float matrices) is much more tedious. Timothy E. Holy wrote
a wrapper that automatically creates the necessary structures and published
the function :program:`savefast` on `Matlab's fileexchange`_. It has a similar
interface to the original :program:`save()` and can be used as a drop-in
replacement in many cases. Of course, you need to add the function to your
search path first.
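A minimal sketch of using it as a drop-in replacement (the download location is
hypothetical; like :program:`save()`, :program:`savefast` takes the file name
followed by the names of the variables to save):

.. code-block:: bash

   >> addpath('~/matlab/savefast');   % hypothetical location of savefast.m
   >> savefast('test.mat', 'R');      % same call style as save('test.mat', 'R')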
@@ -371,10 +371,10 @@ waste storage space and saturates the disk arrays with only 4 concurrent jobs
In other words, using some kind of compression is necessary, unless you
**know** that you generated random or close-to-random data. To highlight the
differences that come with the compression level, let's look at the 5GB of
zeros stored in Z again. Matlab is a bit faster at compressing and saving with
:program:`save()`, but it still takes 145 seconds. You might be tempted to
combine the :program:`savefast` approach and simply compress with something
fast like :program:`gzip` afterwards. This would actually speed things up
*and* save a lot of space:

.. code-block:: bash
@@ -398,7 +398,7 @@ Using HDF5's low level function you can fine tune the compression level from
level, however, you also need to set a chunk size, probably because compression
is done chunk-wise. Recommending a generic compression level is hard, as it
depends very much on your data. Of course you don't want to waste time
maximizing the compression ratio to gain only a couple of megabytes, but you
also don't want to waste bandwidth by saving overhead data. Consider our zeros
again:

.. code-block:: bash

   >> ls -lh test.h5
   -rw-r--r-- 1 krause domain users 5.1G May 16 18:38 test.h5

Here I picked a chunk size of 1GB, compressing with levels 9, 6, 3, and 0. Not
surprisingly, the optimal value in this group is somewhere in the middle (3):
it only takes 20 seconds to save the data, while still reducing the 5GB file to
23MB.
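If you don't want to go through the low-level H5P interface, the same effect
can be sketched with Matlab's built-in :program:`h5create`, which accepts
``'ChunkSize'`` and ``'Deflate'`` options (a chunk of 128x1024x1024 doubles is
exactly 1GB; the dataset name ``/Z`` is made up for this example):

.. code-block:: bash

   >> h5create('test.h5', '/Z', size(Z), ...
          'ChunkSize', [128 1024 1024 1], 'Deflate', 3);
   >> tic; h5write('test.h5', '/Z', Z); toc;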
.. _`MATLAB Distributed Computing Server`: http://de.mathworks.com/help/mdce/index.html
.. _`Tardis Website`: https://tardis.mpib-berlin.mpg.de/nodes
.. _`Matlab's fileexchange`: https://de.mathworks.com/matlabcentral/fileexchange/39721-save-mat-files-more-quickly