I’ve run BDInfo on the forthcoming Australian release of Slumdog Millionaire on Blu-ray from Icon Film Distribution. This turns out to be a very different encode to the US version.
The main title details of the US version are here. The main title details for the Australian version, posted by me, are here. In brief, the US version gets MPEG-4 AVC, we get VC-1 (both have healthy bitrates in the high 20s of megabits per second). Australia gets a PIP video commentary which the US doesn’t get.
Both versions get DTS-HD Master Audio 5.1 channel sound. But the US version gets 24 bits at 48kHz, whereas we get 16 bits at 48kHz. The US version has an average bitrate of 3,962kbps; the Australian version averages 2,032kbps. Note: throughout this blog post, ‘k’ equals 1,000, not 1,024.
I find those bitrates interesting. Are they comparable? What do they tell us about lossless codecs?
I’m inclined to think that, aside from the 24 vs 16 bit difference, the two audio tracks are very similar, but not identical. The running time of the US release is around 30 seconds longer than that of the Australian release. If you look at the chapter breakdowns at those links, you will see that chapters 2 through 27 are the same length, and that chapter 28, the last chapter, is actually about a second longer on the Australian version. The major timing difference is accounted for by the first chapter, and this is most likely due to different company logos at the very start of the movie.
Aside from the different sound of the logos, I’d be extremely surprised if anything was different in the source sound between the two versions. It is a very recent movie, so I expect the original multichannel PCM recording was used for both versions.
As I understand it, the only way to do DTS-HD Master Audio compression is to use equipment and software supplied by DTS, so there should be no differences in encoding methodology. If all of that is the case, then the difference in bitrates between the two audio tracks must be overwhelmingly due to the 16 vs 24 bit question.
Now 3,962 is nearly twice 2,032 (1.95x anyway). An uncompressed LPCM 24 bit, 48kHz 5.1 channel audio track runs at 6,912kbps. The 16 bit version runs at 4,608kbps. Unsurprisingly, the former is 1.5 times the size of the latter.
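If you want to check that arithmetic for yourself, here it is as a scrap of Python. Everything in it comes straight from the figures above:

```python
# Uncompressed LPCM bitrate = sample rate x bit depth x channel count.
# As above, 'k' equals 1,000, not 1,024.

def lpcm_kbps(sample_rate_hz, bit_depth, channels):
    """Uncompressed LPCM bitrate in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000

kbps_24 = lpcm_kbps(48_000, 24, 6)  # 6,912 kbps
kbps_16 = lpcm_kbps(48_000, 16, 6)  # 4,608 kbps
print(kbps_24 / kbps_16)            # 1.5

# Compare the actual DTS-HD MA averages reported by BDInfo.
print(3962 / 2032)                  # about 1.95
```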
So why does the efficiency of DTS-HD MA seemingly fall off so much with 24 bits? I’m not sure. But let’s look at how DTS-HD MA works.
The lossless compression I know best is Dolby TrueHD. This is very similar indeed to the Meridian Lossless Packing (MLP) used on DVD Audio, which Dolby licensed from Meridian and championed for that purpose. Audio typically doesn’t compress very well using ‘traditional’ computer compression processes (e.g. WinZip), which largely rely on finding and eliminating redundancy. I have just dragged a 16 bit, 44.1kHz music file into a Zip folder and managed to reduce its size by only 7%. Even our relatively inefficient compression of the 24 bit sound on this movie got it down by 43%. The 16 bit sound was reduced by 56%!
A 7% reduction is probably not worth the trouble. But 43% and 56% certainly are.
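If you’d like to repeat that experiment without the drag-and-drop, here is a minimal sketch using Python’s zlib module, which implements Deflate, the same algorithm Zip uses. The file name is just a placeholder for whatever music file you have to hand, and your percentage will vary with the recording:

```python
import wave
import zlib

# 'some_music.wav' is a placeholder; any 16 bit PCM WAV file will do.
with wave.open("some_music.wav", "rb") as w:
    pcm = w.readframes(w.getnframes())  # the raw PCM sample data

packed = zlib.compress(pcm, 9)          # Deflate, maximum effort
reduction = 1 - len(packed) / len(pcm)
print(f"Deflate reduced the PCM data by {reduction:.0%}")
```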
So how to get big compression factors? The trick used by MLP and Dolby TrueHD is to build in an algorithm that, based on the sound so far, predicts what the sound will be next. If a waveform is rising, then it is highly likely that at the next sample it will still be rising. The algorithm assumes as much, and sets its guess for the next sample at a reasonable estimate of how far it will have risen, given the preceding samples. Barring some sudden transient, which the system treats as an exception to be dealt with by other means, this gives an adequate approximation of the sound.
By ‘adequate’, I do not mean adequate for listening purposes, but adequate for the next stage of the process. That stage tweaks the sample to make it accurate. Consider what happens if, instead of using, say, 16 bit samples to describe a sound wave, you use 8 bit samples: the amount of data you would be handling would be halved. Normal 16 bit sound describes a series of sample values. But another way of thinking about this is that it describes a series of offsets: how far each sample diverges from a particular reference value. With uncompressed PCM, that reference value is zero. But with MLP and TrueHD, the reference value is different for each sample, and is derived from that approximation algorithm.
So MLP and TrueHD basically work by using a formula to guess what the sound will be, based on what it has been, and then a series of offsets to correct it. Because the guess is typically very good, the offsets are small and you can use 8 or 6 or 4 bits to communicate these, much of the time. Exceptions are provided for, of course, but most sound is efficiently compressed. And since the encoder and decoder both use the same algorithm for the ‘guessing’ part of the process, the reconstruction of the sound (guess + offset) is conducted perfectly.
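Here is a toy illustration of the idea in Python. To be clear, this is not the actual MLP or TrueHD predictor, which is far more sophisticated; it is just a second-order ‘continue the slope’ guess, which is enough to show why the offsets end up small and the round trip ends up perfect:

```python
# Toy predict-then-correct scheme: guess each sample by continuing the
# slope of the two before it, and store only the (small) correction.

def encode(samples):
    residuals = []
    for i, s in enumerate(samples):
        # Continue the current slope; the first two samples get no guess.
        guess = 2 * samples[i - 1] - samples[i - 2] if i >= 2 else 0
        residuals.append(s - guess)   # the small offset that gets stored
    return residuals

def decode(residuals):
    samples = []
    for i, r in enumerate(residuals):
        guess = 2 * samples[i - 1] - samples[i - 2] if i >= 2 else 0
        samples.append(guess + r)     # guess + offset = original sample
    return samples

wave_in = [0, 100, 210, 330, 455, 580, 700]  # a smoothly rising waveform
res = encode(wave_in)
print(res)                       # [0, 100, 10, 10, 5, 0, -5]: tiny offsets
assert decode(res) == wave_in    # lossless round trip
```

Note that the big samples have shrunk to offsets a couple of bits wide, which is exactly where the compression comes from.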
That’s Dolby. How about DTS?
On Blu-ray, both Dolby TrueHD (though not MLP) and DTS-HD Master Audio carry a ‘core’ within themselves to cater for equipment that doesn’t support the new audio formats. Dolby TrueHD carries a Dolby Digital core (typically at 640kbps, but sometimes at 448kbps). DTS-HD Master Audio carries a DTS core. In every case I have looked at so far, the DTS core is a high bitrate core at 1,536kbps (sometimes reported as 1,509kbps). Most normal DTS tracks on DVD, and many on Blu-ray, use a half bitrate of 768kbps.
If your system will decode Dolby TrueHD itself, then the Dolby Digital core is totally ignored. The TrueHD component of the bitstream stands alone. DTS works a little differently. Presumably the DTS engineers thought to themselves: ‘If our audio tracks are going to be carrying a 1.5Mbps data load anyway, we might as well make use of it.’ Note, also, that DTS has always claimed that DTS is nearly lossless. That is, DTS thinks that much of the time the entire 5.1 channels of PCM can be completely and perfectly reconstructed from the 1,536kbps. I am certainly not competent to dispute this claim, although I would note that it is rather more likely to be the case with 16 bit than with 24 bit sound.
So the standard DTS core forms an integral part of DTS-HD Master Audio sound. The decoder uses both the core and the rest of the bitstream to losslessly reconstruct the original sound.
Now we get to some educated guesswork on my part. It seems likely to me that DTS-HD Master Audio works in the same way as Dolby TrueHD: it uses relatively compact offsets to tweak an approximate representation of the signal into perfection. But whereas Dolby TrueHD uses a predictive algorithm, DTS uses the regular DTS core as its approximation (I suspect that regular DTS uses a much cruder predictive algorithm anyway).
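If that guess is right, the scheme would look something like the sketch below: a coarse ‘core’ approximation of each sample, plus a stored correction that restores it exactly. This is my reading of how it might work, not DTS’s published design, and the crude rounding used as the ‘core’ here is purely illustrative (a real DTS core is a lossy perceptual codec, not simple rounding):

```python
# Hypothetical core-plus-corrections scheme: the 'core' is a coarse,
# lossy approximation of each sample, and a small correction makes it
# exact again. The rounding step stands in for the real DTS core.

def encode(samples):
    cores = [round(s / 256) * 256 for s in samples]   # coarse "core"
    tweaks = [s - c for s, c in zip(samples, cores)]  # what the core misses
    return cores, tweaks

def decode(cores, tweaks):
    return [c + t for c, t in zip(cores, tweaks)]     # core + tweak = sample

samples = [1000, 2500, 40000, 65535]
cores, tweaks = encode(samples)
print(tweaks)                                  # small values, cheap to store
assert decode(cores, tweaks) == samples        # lossless reconstruction
```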
Now let us consider the sizes of the tweaks involved. Both of our Slumdog Millionaire audio tracks have DTS cores of 1,536kbps. The 16 bit version requires tweaks of a modest 496kbps (2,032-1,536). The 24 bit version needs tweaks of 2,426kbps (3,962-1,536), even though the standard DTS core approximation is itself claimed to be 24 bits in resolution!
Is 24 bits worth it? That’s a question for another day.