
Monday, January 13, 2014

Re: Release 3.0.8 (source) (win32) (x86-64)

Windows build now includes support for liblzma secondary compression with "-S lzma". (Sorry for the delay.)

(I would like to say no thank you to code.google.com for disabling download support two days from now.)

Monday, January 21, 2013

Re: Release 3.0.6 (source) (win32) (win-x64)

This is a bug fix for a performance regression in 3.0.5, which I made available on November 12, 2012 without announcement. The 3.0.5 encoder would achieve poor compression for inputs larger than the source window, due to an improper fix for issue 149.

Please accept my apologies! This has demonstrated the need for several improvements in my release process, which I'll work on before the next release.

Sunday, August 19, 2012

Re: 3.0.4

Includes bug fixes for Windows: issues 143, 144, and 145 (incorrect UNICODE setting).

Non-Windows build now includes support for liblzma secondary compression with "-S lzma". Windows liblzma support coming soon.

Sunday, July 15, 2012

Release 3.0.3 (source)

This release fixes issues with external compression and potential buffer overflows when using the -v setting (verbose output) with very large input files.

This release includes automake / autoconf / libtool support.

A sample iOS application is included, demonstrating how to use xdelta3 with the iPad / iPhone.

I've updated the 32-bit Windows binary here; I'll post an update when a 64-bit build is available.


Saturday, January 08, 2011

Release 3.0.0 (source)

A minor change in behavior from previous releases. If you run xdelta3 on source files smaller than 64MB, you may notice xdelta3 using more memory than it has in the past. If this is an issue, lower the -B flag (see issue 113).

3.0 is a stable release series.


Monday, August 02, 2010

Release 3.0z (source)

This release includes both 32-bit and 64-bit Windows executables and adds support for reading the source file from a Windows named pipe (issue 101).

External compression updates. 3.0z prints a new warning when it decompresses externally-compressed inputs, since I've received a few reports of users confused by checksum failures. Remember, it is not possible for xdelta3 to decompress and recompress a file and ensure it has the same checksum. 3.0z improves error handling for externally-compressed inputs with "trailing garbage" and also includes a new flag to force the external compression command. Pass '-F' to xdelta3 and it will pass '-f' to the external compression command.
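
To see why in miniature, Python's standard gzip module (standing in here for any external compressor, not for xdelta3 itself) shows that recompressing identical data need not reproduce identical bytes; the gzip header's mtime field alone is enough to change the output:

```python
import gzip

data = b"example payload" * 100

# Two compressions of identical input, differing only in the gzip
# header's mtime field, produce different compressed bytes...
a = gzip.compress(data, mtime=0)
b = gzip.compress(data, mtime=1)
assert a != b

# ...even though both decompress to the same payload.
assert gzip.decompress(a) == gzip.decompress(b) == data
```

Compression level and implementation differences cause the same effect, which is why a checksum taken over a recompressed file cannot be guaranteed to match the original.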

Monday, February 15, 2010

Re: 3.0y (source)
Re: 3.0x (source)

Versions 3.0x and 3.0y fix several regressions introduced in 3.0w related to the new support for streaming the source file from a FIFO. This was a long-requested feature, and I'm pleased to report that, with the fixes in 3.0x and 3.0y, it now appears to be working quite well. The upshot of this feature is that you can encode or decode against a compressed source file without decompressing it to an intermediate file. In fact, you can expect the same compression with or without a streaming source file.

There were also reports of the encoder becoming I/O bound in previous releases, caused by the encoder improperly seeking backwards farther than the settings (namely, the -B flag) allowed. This is also fixed, and there's a new test to ensure it won't happen again.

Windows users: I need to investigate issue 101 before building a release of 3.0x/3.0y. Until I can confirm that streaming-source-file support works on Windows, please continue using the 3.0u release.

Update: The built-in support for automatic decompression of inputs interacts badly with the new source handling logic, resulting in poor compression.

Sunday, October 25, 2009

Re: 3.0w (source)

With such a good state of affairs (i.e., no bug reports), I was able to tackle a top-requested feature (59, 73). Many of you have asked to be able to encode deltas using a FIFO as the source file, because it means you can encode/decode from a compressed-on-disk source file when you don't have enough disk space for a temporary uncompressed copy. This is now supported, with one caveat.

When decoding with a non-seekable source file, the -B flag, which determines how much space is dedicated to its block cache, must be set at least as large as was used for encoding. If the decoder cannot proceed because -B was not set large enough, you will see:

xdelta3: non-seekable source: copy is too far back (try raising -B): XD3_INTERNAL
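
As a rough sketch of the constraint (a hypothetical class of my own, not the real xdelta3 internals, with the cache simplified to the last B bytes seen rather than block-granular caching): once data scrolls out of the fixed-size cache of a forward-only source, a copy instruction reaching back to it cannot be served.

```python
class NonSeekableSource:
    """Serve copy instructions from a pipe-like source through a
    fixed-size cache of the most recently read bytes."""

    def __init__(self, stream, budget):
        self.stream = stream      # file-like, forward-only
        self.budget = budget      # the -B analogue, in bytes
        self.buf = b""            # cached tail of the stream
        self.offset = 0           # absolute offset of buf[0]

    def copy(self, addr, length):
        # Pull the stream forward until the requested range is read.
        end = addr + length
        while self.offset + len(self.buf) < end:
            chunk = self.stream.read(4096)
            if not chunk:
                raise EOFError("source ended early")
            self.buf += chunk
            # Evict from the front to stay within the budget.
            excess = len(self.buf) - self.budget
            if excess > 0:
                self.buf = self.buf[excess:]
                self.offset += excess
        if addr < self.offset:
            # The analogue of: "copy is too far back (try raising -B)"
            raise RuntimeError("copy is too far back (try raising -B)")
        start = addr - self.offset
        return self.buf[start:start + length]
```

Copies within the last `budget` bytes succeed; older ones fail, which is exactly the failure mode the error message above reports.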

The stream->src->size field has been eliminated. Internally, a new stream->src->eof_known state has been introduced. This was a big improvement in code quality because, now, the source and target files are treated the same with respect to external (de)compression and several branches of code are gone for good.

Update Issue 95: command-line decode I/O-bound in 3.0w

Wednesday, May 06, 2009

Re: MS "Linker Version Info" in 3.0u

I can't explain this problem report (see below), indicating that Xdelta on Microsoft platforms is slower than the previous build by a factor of two.


Hello ... !

First of all I want to thank you for creating XDelta tool.
Realy nice and well documented stuff which sometimes helps a lot.
When I changed from v3.0t to v3.0u I've noticed some slowdown
in processing time. Well, it was just visual observation so I decided
to conduct a couple of tests. Here's what I've done.
Test was conducted on virtual memory disk using RAMDiskXP v2.0.0.100
from Cenatek Inc. It was done to exclude any impact from IO system.
Processing time was measured with Timer v8.00 tool by Igor Pavlov.
Test files size is about 275 MB. More exactly speaking 288 769 595
bytes for Test.dat.old and 288 771 262 bytes for Test.dat
Command line for XDelta is:
timer xdelta3 -v -9 -A= -B x -W x -I 0 -P x -s Test.dat.old Test.dat Diff.dat
where x is one of the values given below with the test results.

      x    3.0t    3.0u
-------   -----   -----   --------------
  16384   3.687   6.391    73.3% slower
  65536   3.469   6.125    76.6% slower
1048576   2.578   5.453   111.5% slower
4194304   3.281   6.625   101.9% slower
8388608   3.953   7.718    95.2% slower

As you see v3.0u is averagely 91.7% SLOWER !!! I don't think it's a
some evil coincidence cause I redone every test twice. I have only one
clue. I see Linker Version Info in v3.0t exe-file is 8.0 while is 9.0
for the v3.0u exe-file so I suppose you changed or compilator itself
or its version.
I'll be very glad to hear from you.
Truly yours, ...

Wednesday, March 11, 2009

Re: 3.0v source release

I'm releasing SVN 281, which has an API fix (see issue 79). There's a new group for future announcements.

Thursday, January 29, 2009


The wiki has great comments:

All I want to do is create a delta (diff) of two binary files.

How do I do that? If indeed xdelta can do this?

Make patch:
xdelta3.exe -e -s old_file new_file delta_file

Apply patch:
xdelta3.exe -d -s old_file delta_file decoded_new_file

(From Batch file for xdelta1/xdelta3 compatibility)

I am using xdelta in windows. If a file is opened in microsoft word and while the file is still open I try to use xdelta to produce difference with another file, the following error happens. xdelta3: file open failed: read: : The process cannot access the file because it is being used by another process.

I suspect that this is because xdelta tries to open the file in non-shared mode. I found that if MS Word is using a file and we try to open that file in a C# program in non-shared mode, the same error is thrown as above; but if we open the file in shared mode while it is in use by MS Word, no error occurs.

Yes, I too can confirm the problem that a xdelta file cannot be created if the source file is being shared with another application.

if you are using either Windows 2003 server or Windows 2008 server, you can use 'volume shadow copy' vssadmin create shadow /For=C:

then use dosdev.exe to map a drive to the shadow, you can then access the files without there being an issue with locks, this applies to any file that was located on that drive including sql databases and any other locked files.

if you are running .Net Framework v3.5 I have written a small utility to automate the creation and deletion of the shadows and map the drive, delete the drive. Email me directly at alex at intralan dot co dot uk and I will send a copy.

my friend and i r using this program but he gets better compression than me and is not tellin me how he is getting almost double the compression, i make the file 650mb from 1gb but he makes it 349mb, same source files r used the command m using is delta3.exe -e -9 -S djw -s source target patch what is the command line for maximum compression

How can I get the python version to work?

This is what I get when I run setup. I compiled it with Cygwin then with Visual Studio, both times I get the same msg when running setup. Any suggestions?

c:/xdelta3> python setup.py install --verbose --compile --force running install running build running build_ext error: Python was built with Visual Studio 2003; extensions must be built with a compiler than can generate compatible binaries. Visual Studio 2003 was not found on this system. If you have Cygwin installed, you can try compiling with MingW32?, by passing "-c mingw32" to setup.py.

What are the commands for the new merging/combining features?

Is there a way to get xdelta to take two folder path, recurse through them and do a delta for dir1 vs dir2?

Thank you for xdelta! <3

Answers to the final two questions:

To merge/combine:

xdelta3.exe merge -m file.1.2 -m file.2.3 file.3.4 patch.1.4

There is not a way to take two dirs, recurse through them and do a delta for dir1 vs dir2, but that would be a great feature. (Issue 21)

Other top issues and requests:

1. Source file from a FIFO: currently xdelta3 does not support reading the source file from a non-seekable file, but it's not because of the algorithm; it's a small code limitation and can be fixed. (Issue 59 is a dup of 73, which includes a patch).
2. Smart ISO image compression: If you take a set of files and produce ISO1, then produce a later set of files as ISO2, but the files are in a completely different order, compression suffers. Try setting -B size equal to the ISO size, if possible. (Issue 71)

Saturday, September 13, 2008

Re: Xdelta-3.0u (source)

2008.10.12 update: Windows executable posted

Release Notes: The new merge command allows you to combine a sequence of deltas, to produce a single output that represents the net effect of the sequence. The command is reasonably efficient because it computes the result directly, without constructing any intermediate copies (and without access to the first-of-chain source). I say "reasonably" because there's a soft spot in this algorithm.

A VCDIFF encoding contains two kinds of copy instructions: (1) copies from the source file, which the merge command can translate with straightforward arithmetic, and (2) copies from the target file (itself), which are more difficult. Because a target copy can copy from another target copy earlier in the window, merging a target copy at the end of a window may involve following a long chain of target copies all the way back to the beginning of the window. The merge command implements this today using recursion, which is not an efficient solution. A better solution involves computing a couple of auxiliary data structures so that: (1) finding the origin of a target copy is a constant-time operation, and (2) finding where (if at all) the origin of a target copy is copied by a subsequent delta in the chain is likewise efficient. The second of these structures requires, essentially, a function to compute the inverse mapping of a delta, which is a feature that has its own applications.
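
Here is a sketch of the origin-resolution idea, using invented simplified instruction tuples rather than real VCDIFF encodings: ('ADD', bytes) appends literal data, ('CPY_SRC', off, len) copies from the source, and ('CPY_TGT', off, len) copies from earlier target output. Resolving every target byte to its ultimate origin once avoids the recursive chain-walk:

```python
def build_origin_map(instructions):
    """Map every target offset to its ultimate origin.

    Origins are ('src', offset) for source copies or ('lit', byte)
    for literals. Target copies are chased back here once, so later
    lookups are O(1) instead of following the chain recursively.
    (A simplified model, not the real xdelta3 merge code.)
    """
    origins = []  # origins[i] = origin of target byte i
    for inst in instructions:
        if inst[0] == 'ADD':
            origins.extend(('lit', b) for b in inst[1])
        elif inst[0] == 'CPY_SRC':
            _, off, length = inst
            origins.extend(('src', off + i) for i in range(length))
        else:  # CPY_TGT: copies from already-produced target bytes,
               # whose origins are already resolved -- no recursion.
            _, off, length = inst
            for i in range(length):
                origins.append(origins[off + i])
    return origins
```

Because origins are appended one byte at a time, even overlapping target copies (where a copy reads bytes it is itself producing) resolve correctly.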

Summary: the merge command handles target copies inefficiently, and the code to solve this problem will let us reverse deltas. Together, the merge and reverse commands make a powerful combination, making it possible to store O(N) delta files and construct deltas for updating between O(N^2) pairs using efficient operations. This was the basis of xProxy: A transparent caching and delta transfer system for web objects, which used an older version of Xdelta.

I'm making a source-only release, which I haven't done in a while, because I don't have the necessary build tools for Windows (due to a broken machine) and I don't want to delay this release because of it.

Thursday, September 04, 2008

Re: Google's open-vcdiff

Google released a new open-source library for RFC 3284 (VCDIFF) encoding and decoding, designed to support their proposed Shared Dictionary Compression over HTTP (SDCH, a.k.a. "Sandwich") protocol.

This is great news. The author and I have had numerous discussions over a couple of features that VCDIFF lacks, and now that we have two open-source implementations we're able to make real progress on the standard. Both Google's open-vcdiff and Xdelta-3.x implement extensions to the standard, but I ran a simple interoperability test and things look good. Run xdelta3 with "-n" to disable checksums and "-A" to disable its application header.

A big thanks to Lincoln Smith and Google's contributors. (Disclaimer: I am an employee of Google, but I did not contribute any code to open-vcdiff.)

P.S. I'm still debugging the xdelta3 merge command, stay tuned to issue 36.

Wednesday, July 09, 2008

Re: Regression test framework

3.0t released in December 2007 has turned out to be extremely stable, and I'm busy preparing the first non-beta release candidate. The problem with being stable is the risk of regression, so I've been nervously putting together a new regression testing framework to help exercise obscure bugs.

The most critical bug reported since 3.0t is a non-blocking API-specific problem. Certain bugs only affect the API, not the xdelta3 command-line application, because the command-line application uses blocking I/O even though the API supports non-blocking I/O. Issue 70 reported an infinite loop processing a certain pair of files. The reporter graciously included sample inputs to reproduce the problem, and I fixed it in SVN 239.

But I wasn't happy with the fix until now, thanks to the new regression testing framework. With ~50 lines of code, the test creates an in-memory file descriptor, then creates two slight variations intended to trigger the bug. SVN 256 contains the new test.

So just what is it that makes me feel 3.x is so stable? It's e-mails like the one I received from Alex White at intralan.co.uk. Thanks for all the feedback! I like stable software too.

Monday, April 21, 2008

Update for April 2008:

I started making beta releases for Xdelta-3.x four years ago. There were a lot of bugs then. Now the bug reports are dwindling, so much so that I've had a chance to work on new features, such as one requested by issue 36. Announcing the xdelta3 merge command. Syntax:
xdelta3 merge -m input0 [-m input1 [-m input2 ... ]] inputN output
This command allows you to combine a sequence of deltas, producing a single delta with the net effect of applying two or more deltas in sequence. Since I haven't finished testing this feature, the code is only available in subversion. See here and enjoy.

Thursday, December 06, 2007

Xdelta-3.0t release notes:

Thursday, November 08, 2007

Xdelta-3.0s release notes:
Xdelta-3.0r release notes:
As an example of the new recode command and secondary compression options:
$ ./xdelta3 -S none -s xdelta3.h xdelta3.c OUT  # 53618 bytes
$ ./xdelta3 recode -S djw1 OUT OUT-djw1 # 51023 bytes
$ ./xdelta3 recode -S djw3 OUT OUT-djw3 # 47729 bytes
$ ./xdelta3 recode -S djw6 OUT OUT-djw6 # 45946 bytes
$ ./xdelta3 recode -S djw7 OUT OUT-djw7 # 45676 bytes
$ ./xdelta3 recode -S djw8 OUT OUT-djw8 # 45658 bytes
$ ./xdelta3 recode -S djw9 OUT OUT-djw9 # 45721 bytes
Secondary compression remains off by default. Passing -S djw is equivalent to -S djw6.

Wednesday, July 18, 2007

I'm happy about an e-mail from a manager at Pocket Soft, clarifying what was written in my previous post. Obviously, Pocket Soft deserves recognition here because, commercially speaking, they're the only basis for comparison sent by users. I am posting the entire content here. I've received a steady stream of e-mail regarding Xdelta ever since version 0.13 was released (Oct 12, 1997), and this is one of the nicest messages ever. Thanks.

Re: Patents and the Oracles of the world

The threat model is: I will sell a software license excluding you from the (copyright-related) terms of the GPL, giving you unlimited use for a flat fee, but it's done without representations, warranties, liabilities, indemnities, etc. The argument is, your company could be sued over intellectual property rights if any of the following technologies and programs should fall to a claim (although they have never to my knowledge been in doubt): Zlib-1.x, Xdelta-1.x, and the draft-standard RFC 3284 (VCDIFF).

I will say this: patents aren't the real issue to me, in this post or the last; it's about support and features. This really is more than "just byte-level differencing at play".

Re: Support

In the unlikely event that you find an Xdelta crash or incorrect result, I'm really interested in fixing it. I keep track of issues. I respond to e-mail, like this one about directory-diff support: here's a user-submitted perl script for recursive directory diff. (To which the sender replied, "I don't know barely nothing about perl," making two of us.) If you have your own engineers, and if Xdelta passes your tests, you probably don't need support. If you're McAfee or the US Navy, you need support. Here's someone who needs support: I'd rather work on my car project than self-registering files, thank you very much. =)

Thursday, June 28, 2007

No new versions have been posted since March, so a few updates. One user sent an MSDOS .bat script for xdelta1/xdelta3 command-line compatibility, another sent a perl script for recursive directory diff, one user reports good performance for an in-kernel application (sample code), and there are some new feature requests. Given the lack of bug reports, it's about time to take xdelta3 out of beta.

Several of you have requested a feature supporting in-place updates, allowing you to apply deltas without making a copy of the file, which brings me to another bit of user feedback. And the funny part is, users were saying 5 years ago that Xdelta 1.x beats RTPatch. :)

I have to thank the IETF and previous work in open source (e.g., RFC 1950 – Zlib, RFC 3284 – VCDIFF) for making this possible. Zlib bills itself "A Massively Spiffy Yet Delicately Unobtrusive Compression Library (Also Free, Not to Mention Unencumbered by Patents)", and in fact Zlib inspired Xdelta's API from the start (it's "unobtrusive"). Let's not forget Zlib's other main advantage (it's "unencumbered"). As for the previous request (in-place updates), interest is strong but patents could become an issue.

Multi-threaded encoding/decoding is another frequent request. The idea is that more CPUs can encode/decode faster by running in parallel over interleaved segments of the inputs. That's future work, and probably a lot of it, but I like the idea.

Xdelta 3.0q has 11,480 downloads. It's you, the users, who feed open source. Thanks for the great feedback!

Sunday, March 25, 2007

Xdelta 3.0q features a new MSI installer for Windows.

Thanks to many of you for your feedback on Windows installation (issues 16, 26, 27). Thanks especially to Nikola Dudar for explaining how to do it.

Release 3.0q fixes Windows-specific issues: (1) do not buffer stderr, (2) allow file-read sharing. Thanks for the feedback!

Thanks for the following build tools:

Windows Installer XML (WiX) toolset
Microsoft Visual C++ 2005 Redistributable Package (x86)
Microsoft Visual C++ 2005 Express Edition
Microsoft Platform SDK for Windows Server 2003 R2

Tuesday, February 27, 2007



Plot shows the performance of variable compression-level settings, x (time) and y (compressed size), tested on a specific pair of cached 130MB inputs.

Compression has an inverse relation between time and space performance. The green line is a hyperbola for reference, f(x) = 1.33MB + (30KB*s) / (x - 2.45s). Sample points:

1.49MB in 2.9sec at ~45MB/sec (98.9% compression)
1.34MB in 6.5sec at ~20MB/sec (99.0% compression)

Sunday, February 18, 2007

Re: Xdelta 3.0p (download)

Xdelta-3.x processes a window of input at a time, using a fixed memory budget. This release raises some of the default maximums, for better large-file compression. The default memory settings for large files add up to about 100MB. (*)

A user wrote to confirm good performance on 3.7GB WIM files. Great! At the other extreme, a developer wrote to ask about using xdelta3 in the Xen kernel. Here's an example of xdelta3 configured for a 4KB page, using 32KB.

A user writes about code size. Thanks! Xdelta3 is faster and better than Xdelta1 with compression disabled (the -0 flag), thanks to VCDIFF and other improvements.

I'm looking for your ideas on recursive directory diff. One solution uses tar: this approach is better than computing pairwise differences, since data can be copied across files/directories. Pay attention to file order and source buffer size. Microsoft developers, consider using the WIM disk image format.
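
As a sketch of preparing those tar inputs (the helper below and its normalization choices are my own illustration), Python's tarfile module can build the two archives with a stable, sorted member order and zeroed timestamps, so the archives differ only where the trees actually differ, before handing them to xdelta3:

```python
import os
import tarfile

def make_stable_tar(tree_dir, out_path):
    """Archive tree_dir with sorted member order and zeroed mtimes,
    so two runs over similar trees produce maximally similar tars."""
    with tarfile.open(out_path, "w") as tf:
        for root, dirs, files in os.walk(tree_dir):
            dirs.sort()                      # deterministic descent
            for name in sorted(files):
                path = os.path.join(root, name)
                arcname = os.path.relpath(path, tree_dir)
                info = tf.gettarinfo(path, arcname)
                info.mtime = 0               # drop timestamp noise
                with open(path, "rb") as f:
                    tf.addfile(info, f)

# Then, for example:  xdelta3 -e -s dir1.tar dir2.tar dirs.vcdiff
```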

Thanks for the feedback (file a new issue).

(*) Xdelta-1.x processes the whole input at once, and its data structure takes linear space: (source file size / 2) + O(1). Xdelta-1.x reads the entire source file twice (once to generate checksums, once to match the input), using at least half as much memory as the source file size.

Wednesday, February 07, 2007

Users want speed, especially video gamers. I tested with some of your data: Unreal Tournament and Wesnoth patches. These patches save 50-100MB per download.

Xdelta1 remains popular today because of its speed, and until now xdelta3 hasn't been as fast (debdev has tests). Xdelta-3.0o has improved compression levels (download).

Over my sample data, the new default compression level (the same as the command-line flag -1) is a little faster than Xdelta-1.x, with roughly the same compression. Compression levels -3, -6, and -9 are also improved.

This release also features Swig support.

Re: SVN teaser

SVN 125 has a new XDELTA environment variable for passing flags off the command-line, so you can use xdelta3 in conjunction with tar, creating or extracting a delta-compressed tar file without using intermediate files.

Thursday, February 01, 2007

Re: Performance

From The Old Joel on Software Forum: Thanks! 5 years later.

Another post in the same thread brings up longest common subsequence. That isn't quite the same problem, but it's true that compression performance should be measured in several dimensions: compressed size, memory usage, and speed. Xdelta3 supports compression levels -1 (fast) through -9 (best).

Virtual memory does not ease the space consideration. Reading from disk is terribly slow, so Xdelta3 avoids seeking backwards in the source file during compression. Read more about xdelta3 memory settings.

Re: SVN 100

If you're keeping up to date with the xdelta source code via subversion, version 100 has a few recent changes: (1) compiles on cygwin (1.x and 3.x), (2) responds to bug report 17.

Sunday, January 28, 2007

Re: Xdelta 1.1.4

An especially grateful user wrote me to say thanks for the open-source software. Thanks! Another wrote as well. Thanks again! =)

This is a maintenance release: Xdelta 1.1.4 remains substantially unchanged since 1999. This release fixes one bug: compressed data from 32-bit platforms failed to decompress on 64-bit platforms. The fix is in the decoder (the cause was a badly-designed "hint", now ignored), so you can now read old 32-bit patches on 64-bit platforms. Patches produced by 1.1.4 are still readable by 1.1.3 on the same platform. Still, Xdelta 1.1.x is losing its edge.

Xdelta3 compresses faster and better, uses a standardized data format (VCDIFF), and has no dependence on gzip or bzip2. If using a standardized encoding is not particularly important to you, Xdelta3 supports secondary compression options. Xdelta3 (with the -9 -S djw flags) is comparable to bsdiff in terms of compression, but much faster. Xdelta3 includes a Windows .exe in the official release.

As always, I'm interested in your feedback (file a new issue). Are you compressing gigabyte files with Xdelta3? Have you used dfc-gorilla (by the makers of RTPatch)?

Sunday, January 21, 2007

Xdelta3 has a stream-oriented C/C++ interface. The application program can compress and decompress data streams using methods named xd3_encode_input() and xd3_decode_input(). With a non-blocking API, it's about as easy as programming Zlib.
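
The loop structure is close to zlib's streaming interface; for readers who know Python rather than C, here is the shape of such a consume/produce loop using zlib's stream objects (an analogy for the pattern only, not the xdelta3 bindings themselves):

```python
import zlib

def stream_compress(chunks):
    """Feed input a chunk at a time, yielding output as it appears:
    the same consume-input / produce-output shape as a non-blocking
    delta encoder, with zlib standing in for xd3_encode_input()."""
    comp = zlib.compressobj()
    for chunk in chunks:
        out = comp.compress(chunk)   # may produce nothing yet
        if out:
            yield out
    yield comp.flush()               # end of input: drain the stream

payload = [b"a" * 1000, b"b" * 1000, b"c" * 1000]
compressed = b"".join(stream_compress(payload))
assert zlib.decompress(compressed) == b"".join(payload)
```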

Read about it here.

Thanks for your feedback (file a new issue).

Monday, January 15, 2007

Release 3.0l (download)

This release raises the instruction buffer size and documents the related performance issue. Problems related to setting -W (input window size) especially small or especially large were fixed: the new minimum is 16KB, the new maximum is 16MB. A regression in the unit test was fixed: the compression-level changes in 3.0k had broken several hard-coded test values.

The encoder has compression-level settings that optimize variously for time and space, such as the width of the checksum function, the number of duplicates to check, and what match length is considered good enough. There are 10 parameters (Zlib, by comparison, has 4), but the flag which sets them (-C) is undocumented. I am documenting these and developing experiments to find better defaults.

There's a new page about external compression.

Thanks for your feedback (file a new issue).

Friday, January 12, 2007

Release 3.0k (download)

This is the first release making only performance improvements, not bug fixes. The default source buffer size has increased from 8 to 64 megabytes, and I've written some notes on tuning memory performance for large files. I've been running experiments to find better compression-level defaults. This release has two default compression levels, fast (-1 through -5) and the default slow (-6 through -9), both of which are faster and better than the previous settings. There's more work to do on tuning in both regards, memory and compression level, but this is a starting point.

There is a new wiki on command line syntax. Thanks for your feedback (file a new issue).

Sunday, January 07, 2007

Release 3.0j (download)

The self-test (run by xdelta3 test) now passes. There had been a regression related to external compression, and several tests had to be disabled on Windows. This release also fixes the VCDIFF info commands on Windows (e.g., xdelta3 printdelta input) and memory errors in the Python module.

Thanks for your continued feedback (file a new issue). A user reports that xdelta3.exe should not depend on the C++ 8.0 Runtime. I agree—this is written in C. The source release includes a .vcproj file, in case you'd like to try for yourself.

Saturday, December 16, 2006

Thanks for your feedback. (Submit a new report).

Release 3.0i builds with native Windows I/O routines (enabled by /DXD3_WIN32=1) and has been tested on 64 bit files. (Issue closed).

Windows: download
Source: download

Sunday, December 10, 2006

#include <windows.h>

Ladies and Gents,
Version 3.0h runs on Windows.
Please head straight for the latest download of your choice:

Source
Windows x86-32
OSX PPC

I thought I'd share this first and test it later, let you be the judge.

There are not a lot of platform dependencies. The main() routine has helpful options. A call to gettimeofday() had to be replaced:

static long
get_millisecs_now (void)
{
#ifndef WIN32
    struct timeval tv;
    gettimeofday (& tv, NULL);
    return (tv.tv_sec) * 1000L + (tv.tv_usec) / 1000;
#else
    SYSTEMTIME st;
    FILETIME ft;
    __int64 *pi = (__int64*) &ft;
    GetLocalTime (&st);
    SystemTimeToFileTime (&st, &ft);
    return (long) ((*pi) / 10000);
#endif
}
The remaining changes were minimal, such as the printf format string for 64-bit file offsets. I haven't run a 64-bit test on Windows—I was too busy posting this. :-)

Please file issues here or send mail to <josh.macdonald@gmail.com>.

(Thanks to TortoiseSVN for keeping us in sync.)

To: Microsoft
Re: Windows support

Dear sirs,

Thanks for the free downloads!

Microsoft Visual C++ 2005 Express Edition
Microsoft Platform SDK for Windows Server 2003 R2

Wednesday, September 27, 2006

KDE.org asked how to use xdelta3. It's like gzip, with the additional -s SOURCE argument. Like gzip, -d means decompress, and the default is to compress. For output, the -c and -f flags behave likewise. Unlike gzip, xdelta3 defaults to stdout (instead of appending an automatic extension). Without -s SOURCE, xdelta3 behaves like gzip for stdin/stdout purposes. See also.

Compress examples:

xdelta3 -s SOURCE TARGET > OUT
xdelta3 -s SOURCE TARGET OUT
xdelta3 -s SOURCE < TARGET > OUT

Decompress examples:

xdelta3 -d -s SOURCE OUT > TARGET
xdelta3 -d -s SOURCE OUT TARGET
xdelta3 -d -s SOURCE < OUT > TARGET

Sunday, September 24, 2006

The latest release 3.0g works—finally!—with 64-bit files. xdelta30g.tar.gz (subversion 11)

Xdelta3 has support for secondary compression, the part of VCDIFF (RFC 3284) that allows external compression algorithms for the three sections of a VCDIFF window (instruction, address, data). VCDIFF is entirely based on byte codes, not variable-length bit codes.
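
Even its integers stay byte-aligned: RFC 3284 (section 2) encodes sizes base 128, seven value bits per byte, with the high bit set on every byte except the last. A sketch of that coding (the function names are mine):

```python
def decode_rfc3284_int(data, pos=0):
    """Decode one RFC 3284 variable-sized integer starting at pos.

    Returns (value, next_pos). Big-endian base 128: each byte holds
    7 value bits; the MSB is a continuation flag (1 = more bytes)."""
    value = 0
    while True:
        b = data[pos]
        pos += 1
        value = (value << 7) | (b & 0x7F)
        if not (b & 0x80):       # MSB clear: this was the last byte
            return value, pos

def encode_rfc3284_int(value):
    """Inverse of the above, for round-trip checking."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))

# The worked example from RFC 3284 section 2: 123456789 in 4 bytes.
assert encode_rfc3284_int(123456789) == bytes([58 | 0x80, 111 | 0x80,
                                               26 | 0x80, 21])
assert decode_rfc3284_int(encode_rfc3284_int(123456789)) == (123456789, 4)
```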

Most compression programs are based on Huffman coding, and there's the well-known Burrows–Wheeler transform implemented by bzip2. But there's more to it. I asked Julian Seward and he pointed me at two very interesting articles by D. J. Wheeler.

The description, and the code. Fascinating.

The algorithm works with multiple Huffman-code tables and several iterations. The input is divided into fixed-length chunks, and each chunk is assigned to a Huffman table. In each iteration, chunks are assigned to the table which gives the shortest encoding, then the tables are recomputed according to the chunks they were assigned.
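
A toy model of that iteration (my own simplification: ideal entropy costs stand in for actual Huffman code lengths, and the table count and escape cost are invented parameters):

```python
import math
from collections import Counter

def assign_tables(chunks, num_tables=2, iters=4):
    """Iteratively assign chunks to coding tables: recompute each
    table from its chunks, then move each chunk to whichever table
    encodes it most cheaply, in the style described above."""
    assign = [i % num_tables for i in range(len(chunks))]  # round-robin start
    for _ in range(iters):
        # Recompute per-table symbol costs from assigned chunks.
        tables = []
        for t in range(num_tables):
            freq = Counter()
            for c, a in zip(chunks, assign):
                if a == t:
                    freq.update(c)
            total = sum(freq.values()) or 1
            tables.append({s: -math.log2(n / total) for s, n in freq.items()})
        # Reassign each chunk to its cheapest table.
        def cost(chunk, table):
            return sum(table.get(s, 16.0) for s in chunk)  # 16.0: escape cost
        assign = [min(range(num_tables), key=lambda t: cost(c, tables[t]))
                  for c in chunks]
    return assign
```

On input where two kinds of chunks alternate, the assignment converges so that like chunks share a table.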

The chunk encodings plus the chunk–table assignments are then transformed by move-to-front. Move-to-front works very well (especially following Burrows–Wheeler), but it presents a problem for later Huffman coding: after move-to-front coding there tend to be many 0s, and a symbol with frequency greater than 50% is not efficiently encoded by Huffman coding, because even the shortest 1-bit code has redundancy. To address this problem, the 0 symbol is replaced by two symbols (call them 0_0 and 0_1), which are used to code the run-length of 0s in binary.
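
Here is a sketch of those two pieces (my own simplification, following the bzip2-style scheme rather than the exact DJW code; the run symbols are written as the strings "0_0" and "0_1"):

```python
def mtf_encode(data, alphabet=range(256)):
    """Move-to-front: recently seen symbols get small indices."""
    table = list(alphabet)
    out = []
    for s in data:
        i = table.index(s)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

def encode_zero_runs(indices):
    """Replace runs of 0s with symbols 0_0/0_1 spelling the run
    length in bijective base 2, so no symbol dominates the stream."""
    out, run = [], 0
    for i in indices + [None]:           # None flushes the final run
        if i == 0:
            run += 1
            continue
        while run > 0:                   # emit run length, low digit first
            if run % 2 == 1:
                out.append("0_0")        # digit value 1
                run = (run - 1) // 2
            else:
                out.append("0_1")        # digit value 2
                run = (run - 2) // 2
        if i is not None:
            out.append(i)
    return out
```

For example, mtf_encode(b"aaabbb") yields [97, 0, 0, 98, 0, 0], and the two runs of length 2 each become a single "0_1" symbol.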

What really fascinates me is how Wheeler does this in 780 lines of code—including the Burrows–Wheeler transform. Amazing.

Xdelta3 has a secondary compressor based on DJW, enabled by -S djw (2000 lines, with no Burrows–Wheeler transform). For comparison (*), there's another secondary compressor (enabled by -S fgk), based on D. E. Knuth's dynamic Huffman coding. A dynamic Huffman code updates code-table frequencies after each symbol is encoded or decoded. The routines are efficient (it's Knuth, after all), but still slower than DJW.

Since there are no standards written for secondary compression, secondary compression is turned off by default, but I recommend giving -S djw a try. Sample results:

source                 7,064,064
target                 7,031,808
xdelta -e -1             613,032
xdelta -e -9             610,560
xdelta -e -1 -S djw      461,298
xdelta -e -9 -S djw      458,859
xdelta -e -1 -S fgk      476,742
xdelta -e -9 -S fgk      474,244

(*) only because I once implemented FGK in school

Sunday, August 27, 2006

Thanks to Google code hosting, there's a new Subversion repository for xdelta3, to keep us in sync. I recently replaced some code and am still having trouble crossing the 32-bit/64-bit boundary. Stay tuned...

Sunday, July 02, 2006

Thanks for your continuing reports. Release 3.0f fixes a bug in xd3_iopt_flush_instructions:

/* If forcing, pick instructions until the list is empty, otherwise this empties 50% of
* the queue. */
for (flushed = 0; ! xd3_rlist_empty (& stream->iopt.used); )
{
- if ((ret = xd3_iopt_add_encoding (stream,
- xd3_rlist_pop_front (& stream->iopt.used)))) { return ret; }
- // TODO: what about this fraction??
- if (! force && ++flushed > stream->iopt_size / 2) { break; }
+ xd3_rinst *renc = xd3_rlist_pop_front (& stream->iopt.used);
+ if ((ret = xd3_iopt_add_encoding (stream, renc)))
+ {
+ return ret;
+ }
+
+ if (! force)
+ {
+ if (++flushed > stream->iopt_size / 2)
+ {
+ break;
+ }
+
+ /* If there are only two instructions remaining, break, because they were
+ * not optimized. This means there were more than 50% eliminated by the
+ * loop above. */
+ r1 = xd3_rlist_front (& stream->iopt.used);
+ if (xd3_rlist_end(& stream->iopt.used, r1) ||
+ xd3_rlist_end(& stream->iopt.used, r2 = xd3_rlist_next (r1)) ||
+ xd3_rlist_end(& stream->iopt.used, r3 = xd3_rlist_next (r2)))
+ {
+ break;
+ }
+ }
}

Saturday, May 27, 2006

And we're back... The site was down for most of this month. It's a terribly uninteresting story. I've been on vacation, but before leaving I put together a new release, 3.0e, which fixes major bugs. Approaching a stable release? Possibly.

I'd like to thank the users for sending detailed reports, especially test cases.

This page is powered by Blogger. Isn't yours?