NetCDF-C
NCF-8

Improve compression to GRIB2 levels

    Details

    • Story Points:
      128

      Description

      Implement bit-shaving or another filter to improve netCDF-4 compression so that it is about as good as (or better than) GRIB2.

      This compression will only be available for integer and floating-point types.

      The current candidate for implementation is a C version of the Java wavelet compression implemented by RAL.
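      As a point of reference, here is a minimal sketch of the bit-shaving idea for 32-bit floats (not the planned implementation; the function name and the choice of how many mantissa bits to keep are illustrative). Zeroing the low-order mantissa bits leaves values approximately correct but makes a chunk far more compressible by a lossless filter such as zlib:

        #include <stddef.h>
        #include <stdint.h>
        #include <string.h>

        /* Zero the lowest (23 - nsd_bits) mantissa bits of each IEEE 754 float
         * so a downstream lossless filter (e.g. zlib) compresses better.
         * nsd_bits is the number of mantissa bits to keep, 0 <= nsd_bits <= 23. */
        static void
        bit_shave(float *data, size_t nelems, int nsd_bits)
        {
            uint32_t mask = 0xFFFFFFFFu << (23 - nsd_bits);
            for (size_t i = 0; i < nelems; i++) {
                uint32_t bits;
                memcpy(&bits, &data[i], sizeof(bits)); /* avoid aliasing issues */
                bits &= mask;                  /* keep sign, exponent, top mantissa bits */
                memcpy(&data[i], &bits, sizeof(bits));
            }
        }

      The shaved data would then be written through the existing zlib filter, trading precision below the retained bits for better compression.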

        Activity

        Ed Hartnett added a comment -
        While I should have been doing other things, this afternoon I wrote a compression filter for HDF5. It was pretty easy. A callback function is called for each chunk of data, with some parameters, including a flag indicating whether the chunk is being read (in which case it must be decompressed) or written (in which case it should be compressed). HDF5 filters can process the data in place, but that makes no sense for compression, where a new buffer must be allocated and the final number of bytes in the chunk changes.

        The sample program demonstrates how two parameters may be passed into the filter: the type of the data and the number of significant bits. I have added this as the test h5_test/tst_h_filters.c.

        It should be noted that until HDF5 ships with this kind of filter, as it does with zlib, h5dump will not be able to read the data (though ncdump will).

        My simple filter is called ed_compress_2000 (like jpeg2000, but much better). It compresses the data by 50% with the very computationally easy algorithm of throwing away every other byte of data. I don't know why no one has thought of this method of compression before; it is very fast and predictable. It's true that the data are a little garbled, but that's the way the cookie crumbles. Probably those scientific types are going to say that the ed_compress_2000 algorithm is not good enough for science data. But I like it.

        The program does demonstrate how any compression algorithm may be added to HDF5. It doesn't have to be ed_compress_2000. It could be jpeg2000, which would not be as cool, but might work better for scientists.
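        For orientation, a minimal sketch of the filter-callback shape described above. This is not the actual tst_h_filters.c code; the filter ID, function names, and cd_values layout are illustrative assumptions, and the stub only copies the chunk where a real filter would compress or decompress it:

          #include <hdf5.h>
          #include <stdlib.h>
          #include <string.h>

          #define EXAMPLE_FILTER_ID 32001   /* placeholder ID, for illustration only */

          /* Called once per chunk. H5Z_FLAG_REVERSE set means the chunk is being
           * read (decompress); otherwise it is being written (compress). The
           * cd_values[] array carries user parameters, e.g. cd_values[0] = data
           * type code and cd_values[1] = number of significant bits. */
          static size_t
          example_filter(unsigned int flags, size_t cd_nelmts,
                         const unsigned int cd_values[], size_t nbytes,
                         size_t *buf_size, void **buf)
          {
              void *out;
              (void)flags; (void)cd_nelmts; (void)cd_values; /* a real filter uses these */

              /* A real filter would compress or decompress here; this stub just
               * copies the chunk into a new buffer to show the buffer handoff
               * and size reporting that HDF5 expects. */
              if (!(out = malloc(nbytes)))
                  return 0;                 /* returning 0 signals failure */
              memcpy(out, *buf, nbytes);
              free(*buf);
              *buf = out;
              *buf_size = nbytes;           /* allocated size of the new buffer */
              return nbytes;                /* number of valid bytes now in *buf */
          }

          /* Register the filter and attach it to a dataset creation property list. */
          static herr_t
          install_example_filter(hid_t dcpl)
          {
              const H5Z_class2_t cls = {H5Z_CLASS_T_VERS, EXAMPLE_FILTER_ID, 1, 1,
                                        "example filter", NULL, NULL, example_filter};
              unsigned int cd_values[2] = {0, 12};  /* type code, significant bits */

              if (H5Zregister(&cls) < 0)
                  return -1;
              return H5Pset_filter(dcpl, EXAMPLE_FILTER_ID, H5Z_FLAG_MANDATORY,
                                   2, cd_values);
          }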
        Ed Hartnett added a comment -
        My tst_h_filters program failed with a segfault on mort and buddy for 64-bit builds. I could see no obvious reason for this, so I commented out the test in Makefile.am until after the 4.1.3 release.
        Ed Hartnett added a comment -
        Some good background on compression can be found in this EE Times tutorial series:
        http://www.eetimes.com/design/signal-processing-dsp/4017497/Data-compression-tutorial-Part-1?pageNumber=0
        Ed Hartnett added a comment -
        There seem to be two free-software JPEG2000 implementations:

        OpenJPEG: http://code.google.com/p/openjpeg/
        Jasper: http://www.ece.uvic.ca/~mdadams/jasper/ (official reference implementation)
        Ed Hartnett added a comment -
        Steve has sent a test GRIB2/netCDF pair of files, which I've put in /home/ed/grib:

        * -rw-r--r-- 1 ed ustaff 144382 Jul 6 09:23 RUC252b_20110214_20110214_i00_f003_RUC252b.grb2_noaaportRuc252b_CLWMR.grb2
        * -rw-r--r-- 1 ed ustaff 246610 Jul 6 09:23 RUC252b_20110214_20110214_i00_f003_RUC252b.grb2_noaaportRuc252b_CLWMR.nc
        Russ Rew added a comment -
        A project in NCAR's Research Applications Laboratory (RAL) is developing an HDF5 filter for wavelet compression that will eventually provide a solution for this problem in a way that can be included under the current netCDF license. An architecture, report, and prototype (initially in Java) should be available by September 2012 for us to evaluate.
        Russ Rew added a comment -
        The RAL project is funded through 2012-09-30, when the report is due. It will produce a prototype Java implementation for testing.
        Ward Fisher added a comment -
        This is a test comment, to test Jira notifications.
        Russ Rew added a comment -
        The RAL work by Steve Sullivan resulted in a Java implementation of "Sengcom" (short for scientific and engineering data compression), a new wavelet compression system to be considered as a candidate for netCDF-4 and HDF5 floating-point data. The software is labeled "PROOF OF CONCEPT CODE, FOR RESEARCH PURPOSES ONLY", so it still has gaps and leaves directions for future development. Results of testing the compression and comparing it with GRIB2 JPEG2000 wavelet compression are presented in the report:

          http://www.unidata.ucar.edu/software/netcdf/papers/sengcom.pdf

        In most cases the resulting compression is not as good as GRIB2's, but according to Sullivan it is better than zlib in many cases, and the report's conclusions suggest several possible paths forward to improve the compression.

        The source code is now open source and publicly available from an SVN repository:
          inside UCAR:
            svn co https://subversion.ucar.edu/waveletCompression
          outside UCAR:
            svn co https://proxy.subversion.ucar.edu/waveletCompression


          People

          • Assignee:
            Ward Fisher
          • Reporter:
            Ed Hartnett
          • Watchers:
            0

            Dates

            • Created:
              Updated:
