Monday, April 21, 2008

Update for April 2008:

I started making beta releases for Xdelta-3.x four years ago. There were a lot of bugs then. Now the bug reports are dwindling, so much so that I've had a chance to work on new features, such as one requested by issue 36. Announcing the xdelta3 merge command. Syntax:
xdelta3 merge -m input0 [-m input1 [-m input2 ... ]] inputN output
This command allows you to combine a sequence of deltas, producing a single delta that combines the effect of applying two or more deltas into one. Since I haven't finished testing this feature, the code is only available in subversion. See here and enjoy.

Thursday, December 06, 2007

Xdelta-3.0t release notes:

Thursday, November 08, 2007

Xdelta-3.0s release notes:
Xdelta-3.0r release notes:
As an example of the new recode command and secondary compression options:
$ ./xdelta3 -S none -s xdelta3.h xdelta3.c OUT  # 53618 bytes
$ ./xdelta3 recode -S djw1 OUT OUT-djw1 # 51023 bytes
$ ./xdelta3 recode -S djw3 OUT OUT-djw3 # 47729 bytes
$ ./xdelta3 recode -S djw6 OUT OUT-djw6 # 45946 bytes
$ ./xdelta3 recode -S djw7 OUT OUT-djw7 # 45676 bytes
$ ./xdelta3 recode -S djw8 OUT OUT-djw8 # 45658 bytes
$ ./xdelta3 recode -S djw9 OUT OUT-djw9 # 45721 bytes
Secondary compression remains off by default. Passing -S djw is equivalent to -S djw6.

Wednesday, July 18, 2007

I'm happy about an e-mail from a manager at Pocket Soft, clarifying what was written in my previous post. Obviously, Pocket Soft deserves recognition here because, commercially speaking, they're the only basis for comparison sent by users. I am posting the entire content here.I've received a steady stream of e-mail regarding Xdelta ever since version 0.13 was released (Oct 12 1997) and this is one of the nicest ones ever. Thanks.

Re: Patents and the Oracles of the world

The threat model is: I will sell a software license excluding you from the (copyright-related) terms of the GPL, giving you unlimited use for a flat fee, but it's done without representations, warranties, liabilities, indemnities, etc. The argument is, your company could be sued over intellectual property rights if any of the following technologies and programs should fall to a claim (although they have never to my knowledge been in doubt): Zlib-1.x, Xdelta-1.x, and the draft-standard RFC 3284 (VCDIFF).

I will say this:But, patents aren't the real issue to me, in this post or the last, it's about support and features. This really is more than "just byte-level differencing at play".

Re: Support

In the unlikely event that you find an Xdelta crash or incorrect result, I'm really interested in fixing it. I keep track of issues. I respond to e-mail, like this one about directory-diff support:Here's a user-submitted perl script for recursive directory diff. (To which the sender replied, "I don't know barely nothing about perl," making two of us.) If you have your own engineers, if Xdelta passes your tests, you probably don't need support. If you're McAfee or the US Navy, you need support. Here's someone who needs support:I'd rather work on my car project than self-registering files, thank you very much. =)

Thursday, June 28, 2007

No new versions posted since March, so a few updates. One user sent a MSDOS .bat script for xdelta1/xdelta3 command-line compatibility, another sent a perl script for recursive directory diff, one user reports good performance for an in-kernel application (sample code), and some feature requests. Given the lack of bug reports, it's about time to take xdelta3 out of beta.

Several of you have requested a feature for supporting in-place updates, allowing you to apply deltas without making a copy of the file, which brings me to another bit of user feedback:And the funny part is, users were saying 5 years ago that Xdelta 1.x beats RTPatch. :)

I have to thank the IETF and previous work in open source (e.g., RFC 1950 – Zlib, RFC 3284 – VCDIFF) for making this possible. Zlib bills itself "A Massively Spiffy Yet Delicately Unobtrusive Compression Library (Also Free, Not to Mention Unencumbered by Patents)", and in fact Zlib inspired Xdelta's API from the start (it's "unobtrusive"). Let's not forget Zlib's other main advantage (it's "unencumbered"). As for the the previous request (in-place updates), interest is strong but patents could become an issue.

Multi-threaded encoding/decoding is another frequent request. The idea is that more CPUs can encode/decode faster by running in parallel over interleaved segments of the inputs. That's future work, and probably a lot of it, but I like the idea.

Xdelta 3.0q has 11,480 downloads. It's you the user that feeds open source, and thanks for the great feedback!

Sunday, March 25, 2007

Xdelta 3.0q features a new MSI installer for Windows.

Thanks to many of you for your feedback on Windows installation (issues 16, 26, 27). Thanks especially to Nikola Dudar for explaining how to do it.

Release 3.0q fixes Windows-specific issues: (1) do not buffer stderr, (2) allow file-read sharing. Thanks for the feedback!

Thanks for the following build tools:

Windows Installer XML (WiX) toolset
Microsoft Visual C++ 2005 Redistributable Package (x86)
Microsoft Visual C++ 2005 Express Edition
Microsoft Platform SDK for Windows Server 2003 R2

Tuesday, February 27, 2007



Plot shows the performance of variable compression-level settings, x (time) and y (compressed size), tested on a specific pair of cached 130MB inputs.

Compression has an inverse relation between time and space performance. The green line is a hyperbola for reference, f(x) = 1.33MB + (30KB*s) / (x - 2.45s). Sample points:

1.49MB in 2.9sec at ~45MB/sec (98.9% compression)
1.34MB in 6.5sec at ~20MB/sec (99.0% compression)

Sunday, February 18, 2007

Re: Xdelta 3.0p (download)

Xdelta-3.x processes a window of input at a time, using a fixed memory budget. This release raises some of the default maximums, for better large-file compression. The default memory settings for large files add up to about 100Mb. (*)

A user wrote to confirm good performance on 3.7Gb WIM files, great!At the other extreme, a developer wrote to ask about using xdelta3 in the Xen kernel. Here's an example of xdelta3 configured for a 4Kb page using 32Kb.

A user writes about code size:Thanks! Xdelta3 is faster and better than Xdelta1 with compression disabled (the -0 flag), thanks to VCDIFF and other improvements.

I'm looking for your ideas on recursive directory diff. One solution uses tar:This approach is better than computing pairwise differences, since data can be copied across files/directories. Pay attention to file order and source buffer size. Microsoft developers, consider using the WIM disk image format.

Thanks for the feedback (file a new issue).

(*) Xdelta-1.x processes the whole input at once, and its data structure takes linear space, (source file size/2)+O(1). Xdelta-1.x reads the the entire source file fully two times (to generate checksums, to match the input), using at least half as much memory as the source file size.

Wednesday, February 07, 2007

Users want speed, especially video gamers. I tested with some of your data: Unreal tournament and Wesnoth patches. These patches save 50-100MB per download.

Xdelta1 remains popular today because of speed, and xdelta3 until now hasn't been as fast (debdev has tests). Xdelta-3.0o has improved compression levels (download).

Over my sample data, the new default compression level (same as the command-line flag -1) is a little faster than, with compression roughly the same as Xdelta-1.x. Compression levels -3, -6, and -9 are also improved.

This release also features Swig support, e.g.,

Re: SVN teaser

SVN 125 has a new XDELTA environment variable for passing flags off the command-line, so you can use xdelta3 in conjunction with tar:This creates/extracts-from a delta-compressed tar file, without using intermediate files.

Thursday, February 01, 2007

Re: Performance

From The Old Joel on Software Forum:Thanks! 5 years later.

Another post in the same thread writes:Longest common subsequence isn't quite the same problem, but it's true that compression performance should be measured in several dimensions: size (of compression), memory usage, and speed. Xdelta3 supports compression levels -1 (fast) through -9 (best).

Virtual memory does not ease the space consideration. Reading from disk is terribly slow, so Xdelta3 avoids seeking backwards in the source file during compression. Read more about xdelta3 memory settings.

Re: SVN 100

If you're keeping up-to-date by subversion, with the xdelta source code, version 100 has a few recent changes: (1) compiles on cygwin (1.x and 3.x), (2) responding to bug report 17.

Sunday, January 28, 2007

Re: Xdelta 1.1.4

An especially grateful user wrote me to say thanks for the open-source software:Thanks! Another writes:Thanks again! =)

This is a maintenence release: Xdelta 1.1.4 remains substantially unchanged since 1999. This release fixes a bug: Compressed data from 32-bit platforms failed to decompress on 64-bit platforms. This is fixed in the decoder (it was a badly-designed "hint", now ignored), so you can now read old 32-bit patches on 64-bit platforms. Patches produced by 1.1.4 are still readable by 1.1.3 on the same platform. Still, Xdelta 1.1.x is losing its edge.

Xdelta3 compresses faster and better, using a standardized data format—VCDIFF, and has no dependence on gzip or bzip2. If using a standardized encoding is not particularly important for you, Xdelta3 supports secondary compression options. Xdelta3 (with the -9 -S djw flags) is comparible in terms of compression, but much faster than bsdiff. Xdelta3 includes a Windows .exe in the official release.

As always, I'm interested in your feedback (file a new issue). Are you compressing gigabyte files with Xdelta3? Have you used dfc-gorilla (by the makers of RTPatch)?

Sunday, January 21, 2007

Xdelta3 has a stream-oriented C/C++ interface. The application program can compress and decompress data streams using methods named xd3_encode_input() and xd3_decode_input(). With a non-blocking API, it's about as easy as programming Zlib.

Read about it here.

Thanks for your feedback (file a new issue).

Monday, January 15, 2007

Release 3.0l (download)

This release raises the instruction buffer size and documents the related performance issue. Problems related to setting -W (input window size) especially small or especially large were fixed: the new minimum is 16KB, the new maximum is 16MB. A regression in the unit test was fixed: the compression-level changes in 3.0k had broken several hard-coded test values.

The encoder has compression-level settings to optimize variously for time and space, such as the width of the checksum function, the number of duplicates to check, and what length is considered good enough. There are 10 parameters (Zlib, by comparision, has 4), but the flag which sets them (-C) is undocumented. I am documenting these and developing experiments to find better defaults.

There's a new page about external compression.

Thanks for your feedback (file a new issue).

Friday, January 12, 2007

Release 3.0k (download)

This is the first release making only performance improvements, not bug fixes. The default source buffer size has increased from 8 to 64 megabytes, and I've written some notes on tuning memory performance for large files. I've been running experiments to find better compression-level defaults. This release has two default compression levels, fast (-1 through -5) and the default slow (-6 through -9), both of which are faster and better than the previous settings. There's more work to do on tuning in both regards, memory and compression level, but this is a starting point.

There is a new wiki on command line syntax. Thanks for your feedback (file a new issue).

Sunday, January 07, 2007

Release 3.0j (download)

The self-test (run by xdelta3 test) now passes. There had been a regression related to external-compression, and several tests had to be disabled on Windows. Also fixes VCDIFF info commands on Windows (e.g., xdelta3 printdelta input) and memory errors in the Python module.

Thanks for your continued feedback (file a new issue). A user reports that xdelta3.exe should not depend on the C++ 8.0 Runtime. I agree—this is written in C. The source release includes a .vcproj file, in case you'd like to try for yourself.

Saturday, December 16, 2006

Thanks for your feedback. (Submit a new report).

Release 3.0i builds with native Windows I/O routines (enabled by /DXD3_WIN32=1) and has been tested on 64 bit files. (Issue closed).

Windows: download
Source: download

Sunday, December 10, 2006

#include <windows.h>

Ladies and Gents,
Version 3.0h runs on Windows.
Please head straight for the latest download of your choice:

Source
Windows x86-32
OSX PPC

I thought I'd share this first and test it later, let you be the judge.

There are not a lot of platform dependencies. The main() routine has helpful options:A call to gettimeofday() had to be replaced:

static long
get_millisecs_now (void)
{
#ifndef WIN32
struct timeval tv;
gettimeofday (& tv, NULL);
return (tv.tv_sec) * 1000L + (tv.tv_usec) / 1000;
#else
SYSTEMTIME st;
FILETIME ft;
__int64 *pi = (__int64*)&ft;
GetLocalTime(&st);
SystemTimeToFileTime(&st, &ft);
return (long)((*pi) / 10000);
#endif
}
The remaining changes were minimal, such as the printf format string for 64bit file offsets. I haven't run a 64bit test on Windows—I was too busy posting this. :-)

Please file issues here or send mail to <josh.macdonald@gmail.com>.

(Thanks to TortoiseSVN for keeping us in sync.)

To: Microsoft
Re: Windows support

Dear sirs,

Thanks for the free downloads!

Microsoft Visual C++ 2005 Express Edition
Microsoft Platform SDK for Windows Server 2003 R2

Wednesday, September 27, 2006

KDE.org asked how to use xdelta3. Like gzip with the additional -s SOURCE. Like gzip, -d means to decompress, and the default is to compress. For output, -c and -f flags behave likewise. Unlike gzip, xdelta3 defaults to stdout (instead of having an automatic extension). Without -s SOURCE, xdelta3 behaves like gzip for stdin/stdout purposes. See also.

Compress examples:

xdelta3 -s SOURCE TARGET > OUT
xdelta3 -s SOURCE TARGET OUT
xdelta3 -s SOURCE < TARGET > OUT

Decompress examples:

xdelta3 -d -s SOURCE OUT > TARGET
xdelta3 -d -s SOURCE OUT TARGET
xdelta3 -d -s SOURCE < OUT > TARGET

Sunday, September 24, 2006

The latest release 3.0g works—finally!—with 64-bit files. xdelta30g.tar.gz (subversion 11)

Xdelta3 has support for secondary compression, part of VCDIFF (RFC 3284) that allows external compression algorithms for the three parts of a VCDIFF window (instruction, address, data). VCDIFF is entirely based on byte codes, not variable-length codes.

Most compression programs are based on Huffman coding, and there's the well-known Burrows–Wheeler transform implemented by bzip2. But there's more to it. I asked Julian Seward and he pointed me at two very interesting articles by D. J. Wheeler.

The decription, and the code. Fascinating.

The algorithm works with multiple Huffman-code tables and several iterations. The input is divided into fixed-length chunks, each chunk assigned to a Huffman table. Each iteration, chunks are assigned to the table which gives the shortest encoding, then the tables are recomputed according to the chunks they were assigned.

The chunk encodings plus the chunk–table assignments are then transformed by move-to-front. Move-to-front works very well (especially following Burrows–Wheeler), but it presents another problem for later Huffman coding, because after move-to-front coding there tend to be many 0s, and a symbol having frequency greater than 50% will not be efficiently encoded by Huffman coding—even the shortest 1-bit code has redundency. To address this problem, the 0 symbol is replaced by two symbols (call them 0_0 and 0_1). These two symbols are used to code the run-length of 0s in binary.

What really fascinates me is how Wheeler does this in 780 lines of code—including the Burrows–Wheeler transform. Amazing.

Xdelta3 has a secondary compressor based on DJW enabled by -S djw (2000 lines w/ no Burrows–Wheeler transform). For comparison (*), there's another secondary compression (enabled by -S fgk), based on D. E. Knuth's Dynamic Huffman coding. A dynamic huffman code updates code-table frequencies after each symbol is encoded or decoded. The routines are efficient (it's Knuth, after all), but still slower than DJW.

Since there are no standards written for secondary compression, secondary compression is turned off by default, but I recommend giving -S djw a try. Sample results:

source7,064,064
target7,031,808
xdelta -e -1613,032
xdelta -e -9610,560
xdelta -e -1 -S djw461,298
xdelta -e -9 -S djw458,859
xdelta -e -1 -S fgk476,742
xdelta -e -9 -S fgk474,244

(*) only because I once implemented FGK in school

Sunday, August 27, 2006

Thanks to Google code hosting, there's a new Subversion repository for xdelta3 for us to keep in sync. Recently replaced some code, still having trouble crossing the 32bit/64bit boundary. Stay tuned...

Sunday, July 02, 2006

Thanks for your continuing reports. Release 3.0f fixes a bug in xd3_iopt_flush_instructions:

/* If forcing, pick instructions until the list is empty, otherwise this empties 50% of
* the queue. */
for (flushed = 0; ! xd3_rlist_empty (& stream->iopt.used); )
{
- if ((ret = xd3_iopt_add_encoding (stream,
- xd3_rlist_pop_front (& stream->iopt.used)))) { return ret; }
- // TODO: what about this fraction??
- if (! force && ++flushed > stream->iopt_size / 2) { break; }
+ xd3_rinst *renc = xd3_rlist_pop_front (& stream->iopt.used);
+ if ((ret = xd3_iopt_add_encoding (stream, renc)))
+ {
+ return ret;
+ }
+
+ if (! force)
+ {
+ if (++flushed > stream->iopt_size / 2)
+ {
+ break;
+ }
+
+ /* If there are only two instructions remaining, break, because they were
+ * not optimized. This means there were more than 50% eliminated by the
+ * loop above. */
+ r1 = xd3_rlist_front (& stream->iopt.used);
+ if (xd3_rlist_end(& stream->iopt.used, r1) ||
+ xd3_rlist_end(& stream->iopt.used, r2 = xd3_rlist_next (r1)) ||
+ xd3_rlist_end(& stream->iopt.used, r3 = xd3_rlist_next (r2)))
+ {
+ break;
+ }
+ }
}

Saturday, May 27, 2006

And we're back... The site was down for most of this month. It's a terribly uninteresting story. I've been on vacation, but before leaving I put together a new release, 3.0e, which fixes major bugs. Approaching a stable release? Possibly.

I'd like to thank the users for sending detailed reports, especially test cases.

Saturday, February 18, 2006

Latest release: xdelta30d.tar.gz fixes a bug sent in by at least three users. Thanks for great reporting. The bug was an assertion failure during encode, so it's one you would have noticed, had it happened to you.

Several of you have sent patches for building on Windows, which I appreciate. This version should at least be easier to build (e.g., rename a field from near to near_array, add "b" to the fopen() mode, eliminate unnecessary sys/ includes, ...).

The author of kdiff3 mailed, asking for help visualizing binary diffs. We'll get that working, but in the meanwhile there is a way to print the contents of a VCDIFF input, xdelta3 printdelta INPUT. It prints the VCDIFF header, followed by the instruction list. For example:

Offset Code Type1 Size1 @Addr1 + Type2 Size2 @Addr2
000000 019 CPY_0 59357 @0
059357 013 ADD 12
059369 037 CPY_1 5 @120794
059374 019 CPY_0 2066 @59370

Friday, February 17, 2006

A number of people have written me recently regarding xdelta3 on windows. Fact is, I would love it. One writes:

> Hello,
>
> I have compiled xdelta with cygwin. I have noticed several versions of
> executables produced, but my question is quite simple.
> I want to make diff-like delta from one file to another in the way I could
> in xdelta 1.x, the syntax was
>
> xdelta delta original_file new_file delta_file
>
> and to create new file from delta
>
> xdelta patch delta_file original_file new_file
>
> What combination of arguments and what commands shall I issue with xdelta3
> to gain the same effect?

The syntax is similar to gzip, with the addition of a "-s original_file" argument. By themselves,

xdelta3 -df
xdelta3 -f

operate the similarly to gzip, even without the -s flag.


> I can't figured out what is wrong with the module. But I solve the
> problem calling the C binary from os.system('/usr/bin/xdelta3...').
>
> I whish to tell you that I'm very impressed by xdelta3 and would like
> to see finished and polished. I just hope you have some time to
> deliver an xdelta3 release.
>
> Again, thank you for xdelta3!


You're welcome. Thanks for writing in. I have a big pile of mail still to go through, and I have at least one xdelta3 bug report. Stay tuned...

Saturday, November 05, 2005

Think what you will of sourceforge.net, but thanks for all the downloads.


Sunday, September 25, 2005

I made the last xdelta3 release on Linux T20 Thinkpad. Today, a Mac G5. After an hour fiddling with minor things (size_t changes, test buffer sizes). The OS X debug binary passes its unittest. The optimized version, not so. Hmmm. Release xdelta30c.tar.gz. (Needs tuning!)

Wednesday, July 21, 2004

I admit, the teasers were mean. Here's what you need to know: the first public xdelta3 release. Phew, glad that's out of the way. Here's some more information. Here's an obligatory link to zlib, the inspiration behind the new API.

I was in Bruxelles 6 years ago today, following a conference talk here. I remember there were lots of bunnys hopping around the university. Happy Independence Day, Belgium.

A word on licensing. After much deliberation over the merits of this license and that license, I have chosen to apply the GNU Public License version 2 to my code.

Re: Licensing

You're still reading, I see. I am not a lawyer, nor are you. As of this point in time, I wrote every single line of xdelta3 code, except a few snippets of graciously copied from the public domain, as noted in the code, such as adler32(). Read my acknowledgements. If the terms of the GPL are not to your liking, please contact me.

Re: Does it work?

Yes, and there are tests to prove it. However, I decided to release the code as-is and see what comes of it. I imagine there are correctness bugs remaining, but it's more likely you'll find a performance bug. There are a lot of knobs in the string-matching implementation, which is relatively untuned.

If you find a bug, I'll fix it. If you're working for an open-source project and you want help, let me know!

Sunday, May 30, 2004

Teaser1.
Teaser2.
The regression test has more bugs than it turns up, all I really need is a license. Hmm. Happy Memorial Day.

Sunday, May 09, 2004

Xdelta release 3 is moving so, so slowly but these days I am working a bit. My old Linux machine died and it took a while to repair my data and re-install. I could not compile PRCS with g++-3.2, so I spent the afternoon fixing it (prcs-1.3.3).

Getting back to Xdelta, I built Python-2.3 because SuSe-8.1's Python-2.2 installation could not compile modules. Prior to losing that hard drive, nine months ago, I had been working on a regression test. Sadly, there's not much more weekend.

Happy Mothers Day. Happy Victory Day, Russia.

Tuesday, March 02, 2004

I promise, I'll post something here soon about past, present, and future releases of xdelta. I have a bunch of brand-new code to implement RFC 3284 sitting on my hard-drive. When will I have time to finish it? Until then, I have a personal blog. BTW: I work at Google, they keep me busy.

This page is powered by Blogger. Isn't yours?