Too clever by half

Systems for achieving very high levels of data compression use amazingly inventive techniques. But when they are deployed, you should remain aware that what you’re getting is not identical to the original.

With ‘lossy’ compression techniques such as MP3 and JPEG, ‘loss’ refers to loss of data and loss of accuracy, and that does not necessarily translate to loss of identifiable specifics in the signal. And amongst the many noise artefacts they generate, it is unlikely that there would be any details that could be mistaken for genuine ones.

But somehow Xerox has managed to pull off this trick. It turns out that a compression system used in some of Xerox’s enterprise-level scanners and photocopiers can change certain digits into other digits, totally invalidating the copied documents. In some modes these machines use the JBIG2 image compression standard, which achieves its high compression partly by identifying recurring patterns, such as repeated character shapes, and replacing each occurrence with a reference to a pattern held in a saved set.

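Unfortunately, at certain sizes this can result in ‘6’s being replaced by ‘8’s, and ‘4’s by ‘7’s: the scanned shapes are similar enough that the compressor treats them as the same pattern and prints the stored one in their place.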
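To see how pattern substitution can silently swap one digit for another, here is a toy sketch in Python. It is not the JBIG2 codec or Xerox’s implementation: the 3×5 bitmaps, the `mismatch` pixel count and the `threshold` parameter are all hypothetical, chosen only to illustrate the failure mode. Each glyph on the page is compared with the symbols already stored; if it differs by no more than `threshold` pixels, the encoder records a reference to the stored symbol instead of the glyph itself.

```python
from typing import List, Tuple

# A glyph is a tiny binary bitmap: a tuple of rows, '#' = black pixel, '.' = white.
Glyph = Tuple[str, ...]

# Hypothetical low-resolution (3x5) renderings of two digits, for illustration only.
EIGHT: Glyph = ("###", "#.#", "###", "#.#", "###")
SIX: Glyph = ("###", "#..", "###", "#.#", "###")


def mismatch(a: Glyph, b: Glyph) -> int:
    """Count the pixels that differ between two same-sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode(glyphs: List[Glyph], threshold: int) -> Tuple[List[Glyph], List[int]]:
    """Toy pattern matching and substitution (not real JBIG2).

    If a stored symbol differs from the glyph by at most `threshold` pixels,
    only that symbol's index is recorded; otherwise the glyph becomes a new symbol.
    """
    dictionary: List[Glyph] = []
    indices: List[int] = []
    for g in glyphs:
        for i, sym in enumerate(dictionary):
            if len(sym) == len(g) and mismatch(sym, g) <= threshold:
                indices.append(i)  # reuse the stored symbol in place of this glyph
                break
        else:
            dictionary.append(g)  # no close match: store this glyph as a new symbol
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decode(dictionary: List[Glyph], indices: List[int]) -> List[Glyph]:
    """Rebuild the page: every occurrence is drawn from its stored symbol."""
    return [dictionary[i] for i in indices]


if __name__ == "__main__":
    page = [EIGHT, SIX, EIGHT]
    for t in (0, 1):
        d, idx = encode(page, threshold=t)
        rebuilt = decode(d, idx)
        middle = "6" if rebuilt[1] == SIX else "8"
        print(f"threshold={t}: dictionary size {len(d)}, middle digit decodes as {middle}")
```

With `threshold=0` only identical glyphs share a symbol and the page is rebuilt exactly. Allowing a single mismatched pixel halves the dictionary, but the ‘6’ is then stored as a reference to the ‘8’ and decoded as one, with nothing in the output to show that a substitution took place.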

Which might result in anything from court cases over contract breaches all the way to bridges collapsing.

[A bit of my own accidental substitution there: this post was originally entitled ‘Two clever by half’. I could pretend I was being clever, playing ‘two’ against its inverse (1/2, half), but in fact I just typed the wrong word.]

This entry was posted in Compression, Misc.