Direct Stream Digital and the Power of Imagination

(First published in Australian HI-FI, March/Apr 2016, v.47#02, pp.26-28, as ‘DSD vs. PCM – Which is the Best’)

Around fifteen years have elapsed since the Super Audio CD was introduced. Sony and Philips had hoped that this new 12cm disc format might replace the CD. As we all know, neither SACD nor DVD Audio, the competing high resolution audio disc format, ever made any headway outside the audiophile community.

But with increasingly fast Internet speeds and increasingly cheap digital storage, the high resolution digital audio first provided on SACD and DVD Audio is now available for download. The SACD-equivalent downloads are in the form of DSD (Direct Stream Digital), the underlying digital format used for SACD. The DVD Audio-equivalent downloads are typically in FLAC (Free Lossless Audio Codec), which simply contains the granddaddy of digital audio formats: PCM (Pulse Code Modulation).

But while the ground has shifted, the battle continues. Since the start there have been those convinced that SACD is in some fundamental way ‘better sounding’ than PCM-based audio. Today it is DSD vs PCM.

Many of the DSD claims are, I think, a product of Sony and Philips’ marketing hype at the time of the launch of SACD. Those sound like harsh words, and they are, but let us consider what these companies achieved.

They managed to convince the audiophile community that this digital format was somehow more ‘analogue’ than Pulse Code Modulation. But of course it isn’t. Digital is digital. These are alternative ways of digitally capturing a wave form, but in the end, each maxes out at a particular level of accuracy determined in large part by the bitrate of the digital data.

SACD Insert

Sony and Philips also made more specific claims. In an insert in many SACDs, especially in the early days, they said: ‘Where CD frequency response extends to 20,000 Hz, DSD technology can theoretically reach 100,000 Hz. Where CD has dynamic range of 96 dB, DSD recording can achieve 120 dB across the entire audible range.’

Cool, huh?

But as we shall see, these claims are highly misleading. In reality, it is a rare DSD recording that has an effective frequency response much beyond 30,000 hertz. That isn’t because of the recording. That’s because of the design of the Direct Stream Digital process itself that was touted as an improvement on PCM.

Indeed, it’s extremely hard to reconcile these two statements:

– ‘DSD technology can theoretically reach 100,000 Hz’, and
– ‘On-Chip 50 kHz Filter to Meet Scarlet Book SACD Recommendations’

DSD Filter

The second of those is from the datasheet of a popular PCM/DSD DAC chip, the Cirrus Logic CS4398. Yes, even while Sony and Philips were promoting that 100kHz bandwidth for SACD, its formal specification ‘recommended’ that players filter the output at 50kHz.

WHAT IS DSD? HOW DOES IT DIFFER FROM PCM?

PCM (for Pulse Code Modulation) is simple to understand. At regular intervals the instantaneous level of an analogue signal is measured. A record is kept of those measurements. That record can be used to reconstruct the original signal. Because a discrete numbering system is used to take the measurements, multiple generations of digital copying ought not degrade the quality of the record at all. The resolution of the system is determined by two factors: how frequently the measurements are taken, and the size of the number space upon which the measurements are mapped. A CD measures samples at 44,100 hertz (samples per second) and uses a sixteen bit number space (allowing 65,536 levels to be measured). DVD Audio and Blu-ray permit up to 192,000 samples per second and 24 bits (16.7 million levels).
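The arithmetic behind those level counts can be sketched in a few lines of Python. This is only an illustration: the function names are mine, and the dynamic range formula is the usual idealised one (full scale compared with a single quantisation step), which ignores dither and real converter behaviour.

```python
import math

def pcm_levels(bits: int) -> int:
    """Number of distinct levels available in a number space of this bit depth."""
    return 2 ** bits

def ideal_dynamic_range_db(bits: int) -> float:
    """Idealised ratio of full scale to one quantisation step, in decibels."""
    return 20 * math.log10(pcm_levels(bits))

def quantise(sample: float, bits: int) -> int:
    """Map a sample in the range -1.0..1.0 onto a signed integer code."""
    full_scale = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, sample))
    return round(clipped * full_scale)

for bits in (16, 24):
    print(bits, "bits:", f"{pcm_levels(bits):,}", "levels,",
          round(ideal_dynamic_range_db(bits), 1), "dB")
```

For sixteen bits this comes out at a little over 96dB, which is where the CD figure on the SACD insert comes from.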

DSD (for Direct Stream Digital) does things differently. Instead of mapping the instantaneous level of an analogue signal across a defined number space, it represents the momentary level of an analogue wave by the density of digital ‘1’s in a stream of single bit digits. A series of ‘0’s represents a low point in the wave form, a series of ‘1’s is a high point, while alternating ‘0’s and ‘1’s is the half way point (that is, a zero voltage). There’s more to DSD than this, but let’s pause briefly to note that this form of digital coding is called ‘Pulse Density Modulation’, one of several alternatives to PCM.
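To make the ‘density of 1s’ idea concrete, here is a minimal sketch of a first-order delta-sigma modulator, the simplest textbook way of producing such a bitstream. It is an assumption for illustration only: real DSD encoders use considerably more elaborate, higher-order modulators, and the function names here are hypothetical.

```python
def pdm_encode(samples):
    """First-order delta-sigma: turn samples in -1.0..1.0 into a list of 0/1 bits."""
    bits = []
    integrator = 0.0   # running total of the error between input and output
    feedback = 0.0     # the previous output bit, expressed as -1.0 or +1.0
    for x in samples:
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits

def density(bits):
    """Fraction of the bits that are 1."""
    return sum(bits) / len(bits)

print(pdm_encode([0.0] * 8))              # zero level: alternating bits
print(density(pdm_encode([1.0] * 100)))   # top of the wave: every bit is 1
print(density(pdm_encode([-1.0] * 100)))  # bottom of the wave: every bit is 0
print(density(pdm_encode([0.5] * 1000)))  # half way up: roughly three 1s in four
```

Note that no individual bit means anything on its own. The level is only recovered by averaging over many bits, which is why the stream has to run so fast.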

To properly represent the analogue signal by means of how tightly the ‘1’s are packed, a lot of numbers need to be flying. This is indeed the case with DSD. The form of it used on SACD, and the great majority of extant DSD material, has its bitstream proceeding at 2,822,400 bits per second per channel. By design, this is exactly four times the bit rate for a channel of the audio CD. The use of a multiple of the CD bitrate allows a relatively clean downsampling to CD standards for the release of recordings in that format. It is also a multiple of the sampling frequency used for CD: 64 times CD’s 44.1kHz. This form of DSD has come to be known as DSD64, or 2.8MHz DSD.

In recent years a form of DSD with double the sampling rate — 5,644,800 bits per second — has appeared. This is known as DSD128 or 5.6MHz DSD. (And, yes, there’s now a Quad DSD, aka DSD256 starting to become available.)
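Those rates are easy to check. A small sketch, assuming nothing beyond the CD figures already given; the DSD256 line simply extends the same pattern and is not a number quoted above.

```python
CD_SAMPLE_RATE = 44_100   # samples per second
CD_BIT_DEPTH = 16         # bits per sample

def dsd_bitrate(multiple: int) -> int:
    """Bits per second per channel for DSD at the given multiple of 44.1kHz."""
    return multiple * CD_SAMPLE_RATE

cd_channel_bitrate = CD_SAMPLE_RATE * CD_BIT_DEPTH   # one channel of CD audio

for multiple in (64, 128, 256):
    rate = dsd_bitrate(multiple)
    ratio = rate / cd_channel_bitrate
    print(f"DSD{multiple}: {rate:,} bits/s per channel, {ratio:g}x one CD channel")
```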

With efficient digital encoding, and in the absence of lossless compression, the bitrate is proportional to the resolution of the digital signal (that is, the accuracy with which it is encoded). So it seems pretty obvious that DSD64 is significantly better than the PCM system used on the CD.

That a newer digital system should be better than a particular form of PCM specified in the late 1970s and early 1980s shouldn’t be surprising. But is it better than PCM in general? That’s the real question.

Let’s use those 2,822,400 bits per second per channel, and see what they’d yield if we used PCM at that rate. At 2,822,400bps a PCM system with 24 bits of resolution could have a sampling frequency of 117,600 hertz, yielding a useful top end of 55,000 hertz. With 20 bits of resolution the Fs would be 141,120 hertz, giving close to a 70,000 hertz bandwidth.
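Here is that sum as a short Python sketch. The only assumption is the standard sampling rule that the highest frequency a PCM system can carry is half its sampling frequency; the ‘useful’ figures quoted above sit a little below that limit to leave room for filtering.

```python
DSD64_BITRATE = 2_822_400   # bits per second per channel

def pcm_sample_rate(bitrate: int, bit_depth: int) -> float:
    """Sampling frequency a PCM system could afford at this bitrate and bit depth."""
    return bitrate / bit_depth

def nyquist_limit(sample_rate: float) -> float:
    """Theoretical upper frequency limit: half the sampling frequency."""
    return sample_rate / 2

for bit_depth in (24, 20):
    fs = pcm_sample_rate(DSD64_BITRATE, bit_depth)
    print(f"{bit_depth} bits: Fs = {fs:,.0f} Hz, upper limit = {nyquist_limit(fs):,.0f} Hz")
```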

As you can see, there is no reason in principle why a similarly specified PCM system shouldn’t give startlingly good results. Especially in light of studies which suggest that in double blind trials the sound of DSD and CD-standard PCM is indistinguishable.

BUT WHAT ABOUT 100KHZ?

You can read the Sony/Philips claim about SACD as suggesting that DSD can carry up to 100kHz in bandwidth, and at the same time offer a noise floor of -120dB, which would indeed be an impressive feat since PCM couldn’t do that in the same bandwidth (unless compressed, like DVD Audio).

But there are tricks that you can pull. A raw implementation of most digital encoding systems results in a low level of quantisation noise that is evenly spread across the frequency spectrum. Such a thing is rarely permitted these days. Quantisation distortion that might otherwise be audible is eliminated by including an extremely low level of random noise (called ‘dither’). But this is random with a purpose: it is shaped so that most of the noise is in the top octave of the bandwidth, where the human ear is relatively insensitive, leaving the critical mid-bands with much lower levels of noise. This is the heart of Sony’s Super Bit Map digital processing, but there are plenty of other variations.

Including that in the Sony/Philips DSD process. To make sure it gets that -120dB noise ‘across the entire audible range’, as it puts it, it shapes the noise so that the great majority is up at the top end of the frequency spectrum.
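The principle of shaping can be sketched with the simplest possible scheme, first-order error feedback, in which the rounding error made on one sample is subtracted from the next before it is rounded. To be clear about the assumptions: this is a generic illustration with hypothetical function names, it leaves out the dither, and it is not Sony’s Super Bit Map algorithm nor the shaping used inside any actual DSD encoder.

```python
def requantise_plain(samples):
    """Round each sample to the nearest whole step and forget the error made."""
    return [round(x) for x in samples]

def requantise_shaped(samples):
    """First-order error feedback: carry each rounding error into the next sample."""
    out = []
    error = 0.0
    for x in samples:
        wanted = x - error      # compensate for the previous rounding error
        y = round(wanted)
        error = y - wanted      # the error made this time around
        out.append(y)
    return out

def mean(values):
    return sum(values) / len(values)

signal = [0.3] * 1000           # a steady level of 0.3 of one quantisation step
print(mean(requantise_plain(signal)))    # plain rounding loses the level entirely
print(mean(requantise_shaped(signal)))   # the shaped version averages close to 0.3
```

In the shaped version each output error is the difference between two successive rounding errors. Slow, low frequency errors largely cancel, and what is left is concentrated in rapid sample-to-sample changes, which is to say at the top of the frequency range.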

Indeed, there is so much noise up there that in the real world, any claims of a 100kHz bandwidth, in any usable sense, are ludicrous. The reason is straightforward: at some frequency which is usually between 23,000 and 30,000 hertz, the level of the signal falls below the level of the noise. As frequency increases, the level of the signal continues to fall, but the level of the noise continues to rise.

This is best seen in pictures. Let’s start with something ideal. I created a clean non-dithered 1kHz sine wave in 192kHz, 24 bit PCM format at a level of -12dBFS (peak). Then I converted it to DSD64 and DSD128 using Korg AudioGate (v.3.0.4) software. And then, for analysis, back again to 192kHz, 24 bit PCM. Why this last step? There are very few analysis and editing tools that work on native DSD. Some facilities which use DSD convert to DXD (24 bit PCM at 352.8kHz) for mixing. Do be aware that there are some parameter choices available for DSD encoding and decoding, so different encoders may yield different results. But only slightly different. In the main, these results are representative.

So here’s a spectrum showing the resulting signal:

1kHz sine wave noise spectrum comparison

Blue is the original PCM, Mustard is DSD64, Green is DSD128. There’s the noise inherent in DSD, shifted off into the ultrasonic band. In the case of DSD64 it doesn’t kick in until above 20kHz. DSD128 pushes this out to 40kHz. The rapid drop off in the DSD64 noise above about 40kHz is the output ultrasonic filtering (the software does this for PCM conversion, but usually it’s built into the DAC).

Let’s look at these test signals a different way. Let us zoom in at the top of our 1kHz sine waves:

1kHz waveform comparison

Same colours. All were at the same level. I have just shifted them down by 0.25 or 0.5dB for visual clarity. The slightly larger dots are samples. Notice how the PCM at the top is smooth, the DSD64 is somewhat bumpy, and the DSD128 is wobbling up and down from sample to sample. Let’s not be confused. The bumps and the wobbles aren’t audible. They are simply the ultrasonic noise inherent in DSD. I present this here to demonstrate one thing only: when it comes to clinical accuracy, PCM is superior to DSD.

But, you might object, all that’s test signals. How about real music? Good point. I played a snippet of DSD64 music from an audiophile DSD label — recorded in 2015, direct to DSD — and captured the output from the DAC. I chose a section with acoustic guitar recorded quite loudly to ensure plenty of high frequency content. And here’s the result:

2015 Audiophile DSD64 Acoustic Recording

As you can see, at 28.5kHz the actual signal sinks below the level of the DSD noise, which continues to rise to around 40kHz, where it starts to be defeated by the DAC’s low pass filter.

Now let’s pause and consider all this for a moment. It is not in the least surprising that the inherent resolution of 192kHz at 24 bits is better than DSD64. Remember DSD64’s bitrate is around 2.8 megabits per second. The per-channel bitrate for 24/192 is around 4.6Mbps. But clearly 24/192 PCM is also more inherently accurate than DSD128, even though the latter has an even higher bitrate: 5.6Mbps.

Yes, it is claimed that DSD can capture transients more effectively than similar bitrate PCM. Often this is represented as a picture of a square wave, with startlingly fast rise times for DSD, slower ones for PCM. Mathematically, DSD may indeed be capable of superior resolution on this front, except for one thing …

ADA CONVERSION

So DSD’s virtues, perhaps, lie not in the intrinsic accuracy of the format, but in the accuracy of conversion from analogue to digital and back to analogue. This would have been an extremely strong argument back in the early 1980s, when PCM was just finding its place. The first Philips CD player was well reviewed because its 14 bit DAC was considered a safer choice than the 16 bit DAC of the first Sony.

But engineers are clever. As an analogy, we should note that today’s computer technology is in just about every countable respect three orders of magnitude bigger, better and faster than it was then. Literally a thousand times.

So while 16 bit digital to analogue and analogue to digital PCM conversion may have been hard then, 24 bits is easy now. Especially since a straight conversion is rarely used. These days there’s oversampling, multibit Sigma Delta conversion and other techniques. Anti-aliasing filtering is performed digitally, eliminating phase shift, and with 96kHz and 192kHz sampling frequencies, sharp filters are largely avoided anyway.

At least, that’s the case for PCM.

DSD has a significant weakness, though. Especially DSD64. One ought to be reluctant to pour high levels of ultrasonic noise into one’s tweeters. Who knows what effect it may have on the actual treble signal, and indeed on the longevity of the tweeter.

So DSD has to be filtered. And unless you are going to convert it to PCM before decoding, which rather defeats the point of it, that filtering must be performed in the analogue domain. How that is implemented is up to the DAC maker. As we’ve seen, SACD players are supposed to have a filter that is -3dB at 50kHz.

Perhaps the DSD encoder can capture that transient beyond the capability of high bitrate PCM. Perhaps DSD can hold it. But it’s unlikely that it will survive intact a -3dB filter at 50kHz.
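A rough sense of scale helps here. The sketch below assumes the gentlest possible case, a simple first-order low-pass filter, and uses the standard result that its 10% to 90% rise time is ln(9)/(2πf), or about 0.35 divided by the -3dB frequency. Real player filters are generally steeper and will behave differently, so treat the numbers as indicative only. The function name is mine.

```python
import math

def rise_time_seconds(cutoff_hz: float) -> float:
    """10%-90% step rise time of a first-order low-pass with this -3dB frequency."""
    return math.log(9) / (2 * math.pi * cutoff_hz)

filter_rise = rise_time_seconds(50_000)   # the recommended SACD output filter
dsd64_bit_period = 1 / 2_822_400          # time occupied by one DSD64 bit
pcm192_sample_period = 1 / 192_000        # time occupied by one 192kHz sample

print(f"Filter rise time:     {filter_rise * 1e6:.2f} microseconds")
print(f"DSD64 bit period:     {dsd64_bit_period * 1e6:.2f} microseconds")
print(f"192kHz sample period: {pcm192_sample_period * 1e6:.2f} microseconds")
```

On these assumptions the filter takes roughly seven microseconds to respond to a step. That is many DSD64 bit periods, and longer than a single sample at 192kHz, so however fast an edge the bitstream may describe, the filter sets the speed of what comes out.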

Here are the frequency responses of three different DSD-capable DACs, using a test signal designed for 192kHz sampling (converted to DSD using the AudioGate software):

Astell & Kern AK380 comparison

Chord Hugo TT comparison

Korg DS-DAC-100m comparison

In all three cases the PCM version (white trace) is more extended than the DSD128 version (blue trace) which in turn is more extended than the DSD64 version (green trace). The differences between the three depend upon the DAC designers’ choices. The first two units retail at $5K+, the last is the Korg unit which is under $400. All three are -3dB at well under 50kHz for DSD64.

CONCLUSION

Now one can claim that these various measurements and illustrations fail to capture an essence of DSD that is somehow superior to PCM and is able to be detected by the human ear. That indeed may well be the case. But what we can see is that in every respect in which these things can be measured, in every respect in which they can be visually illustrated, DSD is less accurate than high resolution PCM. That should make one at least cautious about claims which are significantly less easy to demonstrate.

Indeed, the argument I am making is not that PCM is superior to DSD, or that DSD doesn’t sound as good as PCM.

I am proclaiming the wondrous fact that we now live in an age where the delivery formats — whether DSD or high resolution PCM — so far exceed the resolution of the human ear that they are, for our purposes, practically perfect!