
Icepatch2 checksum memory usage

When my icepatch2 client calculates the checksum of a large file (in my case 600 MB), the process's memory usage increases by the same 600 MB.
It seems the icepatch2 client loads the entire file into memory in order to calculate its checksum.

Is this the intended behaviour?

The icepatch2 client runs on Windows XP, is based on the Ice 3.0.0 C++ libraries, and was built with Visual C++ .NET 2003.

The Icepatch2Server runs on a Linux server, and on the server side icepatch2calc shows the same behaviour (allocating and using 600 MB of memory when calculating the checksum of a 600 MB file).

In contrast, the Linux command-line utility md5sum(1) allocates 3.5 MB of memory (of which it uses ~500 kB) to compute a checksum of the same file.
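For comparison, md5sum can stay small because a digest can be fed incrementally, so only one fixed-size buffer ever needs to be in memory. A minimal sketch in Python of the streaming approach (hashlib's MD5 stands in here; this is not the digest or code IcePatch2 actually uses, and the 64 kB chunk size is an arbitrary choice):

```python
import hashlib

def file_checksum(path, chunk_size=64 * 1024):
    """Compute an MD5 digest by reading the file in fixed-size chunks,
    so peak memory is bounded by chunk_size, not by the file size."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()
```

With this shape, checksumming a 600 MB file costs no more memory than checksumming a 6 MB one.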

/ric

Comments

  • matthew
    matthew NL, Canada
    The checksum calculation reads the entire file into memory and then calculates the checksum -- so yes, this is the intended behaviour.

    Now as to why: We never really intended for icepatch to be used to distribute such large files. If any of the content of the enormous file changes then icepatch will resend the entire content, not only the changes.

    Since we only intended it to be used to distribute relatively small files, we didn't think reading the entire file into memory was a big problem. Not knowing what this large file contains, or how you intend to manage changes, it's hard for me to give sensible advice on how to solve your problem.
  • matthew wrote:
    If any of the content of the enormous file changes then icepatch will resend the entire content, not only the changes.

    Yes, I've noticed that.
    I'm currently working on a "pre-patch" system that uses xdelta to apply a diff to the large file, and then uses icepatch2 to update all the smaller files, verify the checksums of all files, and provide a fallback mechanism (i.e. downloading the full 600 MB with icepatch2) if the xdelta patching failed.
    matthew wrote:
    Not knowing what this large file contains, or how you intend to manage changes, it's hard for me to give sensible advice on how to solve your problem.

    The full distribution managed by icepatch2 consists of about 500 small files and one honkin' big (600 MB+) graphics & sound database in Qube format.

    And of course, the larger a file is, the higher the probability that a given change in the distribution will affect that specific file... :rolleyes:

    /ric
  • matthew
    matthew NL, Canada
    A thought that occurs to me is that you might want to think about a modification to icepatch to actually catalog and download specific parts of the archive.
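    One way to read that suggestion: split the archive into fixed-size blocks, publish a catalog of per-block checksums, and have the client fetch only the blocks whose checksums differ from the server's. A hypothetical sketch (the 1 MB block size, the function names, and the use of MD5 are all illustrative assumptions, not IcePatch API):

```python
import hashlib

BLOCK_SIZE = 1024 * 1024  # 1 MB blocks; an arbitrary choice

def block_catalog(path):
    """Return a list of per-block MD5 digests for the file at `path`."""
    catalog = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            catalog.append(hashlib.md5(block).hexdigest())
    return catalog

def stale_blocks(local_catalog, server_catalog):
    """Indices of blocks the client must download: blocks whose digests
    differ, plus any trailing blocks only the server has (file grew)."""
    stale = [i for i, (a, b) in enumerate(zip(local_catalog, server_catalog))
             if a != b]
    stale += list(range(len(local_catalog), len(server_catalog)))
    return stale
```

    A small scattered change would then cost one block per touched region instead of the whole archive, at the price of storing and comparing the catalogs.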
  • matthew wrote:
    A thought that occurs to me is that you might want to think about a modification to icepatch to actually catalog and download specific parts of the archive.
    If I follow that path, I'll just end up reimplementing rsync in icepatch. :)

    I tested a few different solutions to see how well they updated a 600MB file with small changes scattered through the file.
    • Icepatch2: sent a new compressed file of about 400 MB.
    • rsync: created and sent a 25 MB compressed patch.
    • xdelta: created a 2.5 MB delta file.

    Rsync is the most elegant solution, since it creates the patch on the fly, computing a diff with src and dst on separate computers while still transmitting only 25 MB over the net.
    xdelta requires strict versioning and pre-calculated patches, but is very effective.

    Perhaps the rsync protocol is an idea for Icepatch3? ;)
    The rsync protocol and diff-method specification is open for anyone to implement.
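    For reference, the core trick in the published rsync algorithm is a weak rolling checksum that can slide a one-block window forward one byte in O(1), so the sender can test for block matches at every offset; candidate matches are then confirmed with a strong checksum. A sketch of the rolling part (this follows the algorithm description, but it is an illustration, not rsync's actual code):

```python
MOD = 1 << 16

def weak_checksum(block):
    """rsync-style weak checksum: a is the byte sum, b weights each
    byte by its distance from the end of the block."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return (b << 16) | a

def roll(checksum, old_byte, new_byte, block_len):
    """Slide the window one byte: drop old_byte, append new_byte, in O(1),
    without touching the other block_len - 1 bytes."""
    a = checksum & 0xFFFF
    b = checksum >> 16
    a = (a - old_byte + new_byte) % MOD
    b = (b - block_len * old_byte + a) % MOD
    return (b << 16) | a
```

    Sliding instead of recomputing is what lets rsync scan a 600 MB file for block matches at every byte offset without quadratic cost.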

    /ric
  • marc
    marc Florida
    Ric wrote:
    Rsync is the most elegant solution, since it creates the patch on the fly, computing a diff with src and dst on separate computers while still transmitting only 25 MB over the net.

    This approach might be elegant, but it is not usable for IcePatch. IcePatch is intended to be used by hundreds to thousands of simultaneous clients (and has been tested with that many). If it calculated individual patches for each client on the fly, it would be far more compute-intensive and couldn't possibly serve that many clients.