Take any audio forum or Facebook group, or even a conversation at a pub among audio-interested people, and sooner or later someone will come up with the idea that they need better converters.
Or at least they wonder if they should.
Are the ones in your computer or inexpensive interface good enough? Do they make a difference to the sound? Do you need better converters than you have to achieve that mythical "professional" quality that everybody seems so keen on? Is it OK to plug your $2000 preamp into your interface, or do you need another $2000's worth of converter box to do it justice? Is it worth spending your hard-earned dosh on upgrading them, or is a regular audio interface (or even your computer's mic input) good enough?
Well... as usual, it turns out the answer is (annoyingly) “it depends”.
(this post is long: if you just want the conclusion, feel free to jump to the end!)
First of all, if you've read What Makes A Great Recording, you know that converters are way down the list of critical things to think about when recording or mixing. At least if you're using an audio interface produced in the last decade. That's because, unless they are catastrophically bad, A/D converters contribute even less than preamps to the overall quality of your recording. This post aims to help you understand why.
The same bottom line as for preamps is valid here: when it comes to the quality and greatness of your recording, unless you've already got everything that comes before covered (performance, room, mics, etc.), it's not really worth thinking about the A/D conversion.
Just let it go, and buy some acoustic treatment instead, or a better mic. Or even better, some singing lessons if you are the vocalist.
It's gonna be far better value for money and make much more of an improvement.
What? No? Okay, let's assume that:
if you are the performer, you are as good as Elvis
your room is as good as Ocean Way's
you've chosen a vintage U47
you've spent a good half an hour carefully positioning it in the room
the mic is plugged into that spanking new Millennia STT-1
which you have set up with perfect gain staging
you're ready to record the vocal line of the century.
It's time to push "record".
You look down, follow the XLR cable out of the preamp to the line-in of your audio interface... and you realize that said interface is not-particularly-high-end.
Suddenly you get cold feet.
Will it be good enough?
Do you need to go buy a better converter?
Let's find out.
What are A/D converters supposed to do?
Some background first.
A/D converters do a conceptually simple job: they take an analog signal (in most audio equipment, an electrical AC voltage), measure it over a very short time (i.e. they "sample" it), and translate the measurement into a number that can be represented with 24 bits (or whatever word length the converter produces). That sampled number is called (guess!) a "sample".
This sample is then made available at the converter outputs (in digital audio formats such as AES/EBU or ADAT or whatever). A computer audio interface, which is able to understand the digital format, can then pass on the samples to a computer application (a DAW, for example), for further processing. Sound!
D/A converters do the opposite job - taking a stream of samples and re-generating the analog signal so that it can drive some speakers.
In summary, A/D and D/A converters perform a sort of Star Trek transporter trick: they transform an analog, physical thing (voltage) into information, and then rematerialize it as a voltage someplace else. Neat!
So far, so easy.
The following figure (which, be warned, is quite misleading) should give an idea.
At times t1, t2, etc., the A/D converter finds the values s1, s2, etc., converts each of them based on a given scale, and there's your "sample": simply a number.
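To make the "measure, then turn it into a number" idea concrete, here is a minimal sketch of an ideal A/D converter, assuming a pure sine wave as the analog input, a 16-bit word length and an arbitrary amplitude of half the full scale. The function name and parameters are illustrative only; real converters don't work sample by sample like this internally.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, num_samples, bits=16, amplitude=0.5):
    """Ideal A/D sketch: read a sine wave at t_n = n / sample_rate and
    turn each reading into a signed integer of the given word length."""
    max_code = 2 ** (bits - 1) - 1  # 32767 for 16 bits
    samples = []
    for n in range(num_samples):
        t = n / sample_rate_hz                                    # sampling instant t_n
        value = amplitude * math.sin(2 * math.pi * freq_hz * t)   # "analog" value in [-1, 1]
        samples.append(round(value * max_code))                   # the "sample": just a number
    return samples

# A 1 kHz tone sampled at CD rate: each entry is one sample, s1, s2, ...
print(sample_and_quantize(1000, 44100, 10))
```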
Do we lose information with digital audio?
No. I did say the figure above is misleading. And it is misleading because, looking at it, it seems that way: the blocky sequence doesn't look remotely as smooth and precise as the continuous line of the waveform.
Which is right, because it isn't!
It's obvious that connecting the dots in the picture above does not produce a good approximation of the "real" analogue waveform.
But that's only because we are sampling at a pitifully slow sample rate, which indeed does lose information.
Between, say, t1 and t2, the waveform changes a lot, and since so much time passes between these two instants, we miss a lot of these changes. We lose information.
But what if we sampled much more often (i.e. more times per second, aka with a higher frequency, or "rate")? The samples would approximate the original signal much better:
Now, what if we sampled so fast that the waveform changed very little between two consecutive sampling instants?
What if the signal change between two instants was so small that it exceeded our ability to perceive changes? (The "ability to perceive change" is simply the maximum frequency we can hear: for people, about 20 kHz on average.)
So in other words, is there a sample rate fast enough to capture enough data, so that any information loss is outside our ability to perceive it - and therefore irrelevant?
Turns out there is.
Quite a few years ago, a really smart fellow named Claude Shannon (building on an idea by another smart fellow called Nyquist) proved that, in order not to lose any information about a signal limited to a certain frequency band, we've got to use a sample rate of (at least) double the max frequency we want to sample, aka the maximum ability of a device to detect change (for audio, that device is the human ear). This result is called the "sampling theorem".
On average, our ears perceive frequencies from about 20 Hz up to 20 kHz (and this top frequency decreases significantly with age).
That means that, to ensure we capture everything in that 20 Hz to 20 kHz interval ("frequency band"), we need to sample it at (at least) double 20 kHz, that is to say 40 kHz.
"CD quality", that is 44.1 kHz, is well above 40 kHz, so we're jolly good: no information audible to a human ear is lost (but a bat, of course, will think the recording quality is horrible!).
A couple of things are worth noticing: a few very young people can perceive frequencies well in excess of 20 kHz (up to 23 kHz), and for them a 44.1 kHz sample rate is not good enough (48 kHz is better, for example). But due to the mechanical way ears work, that ability unfortunately doesn't last long; and at that age they're probably not that interested in audio quality.
Also, a 44.1 kHz sampling rate is all good for humans, but sucks for dogs, which when young can hear frequencies up to 40 kHz. Every time you put on a CD, think about the poor dog nearby!
(photo by Jacknunn - own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=91729752)
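The "double the top frequency" rule is simple enough to check in a few lines. The sketch below only encodes the rule as stated above (it treats exactly double as enough, which glosses over the theorem's strict edge case), using the approximate hearing limits quoted in this section for adults, very young people and young dogs.

```python
def min_sample_rate(max_freq_hz):
    """Sampling theorem as used in this post: sample at (at least) double the top frequency."""
    return 2 * max_freq_hz

def rate_is_enough(sample_rate_hz, max_freq_hz):
    return sample_rate_hz >= min_sample_rate(max_freq_hz)

# Approximate top hearing frequencies quoted in the text.
listeners = [("adult human", 20_000), ("very young human", 23_000), ("young dog", 40_000)]
for name, top_hz in listeners:
    print(name, min_sample_rate(top_hz),
          "44.1 kHz ok:", rate_is_enough(44_100, top_hz),
          "48 kHz ok:", rate_is_enough(48_000, top_hz))
```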
Real world filters and aliasing
One important detail of the result above is that, for sampling to work well, the signal must be band-limited: above the top of our 20 Hz to 20 kHz band there must be nothing.
Otherwise, information outside the audio band will be captured and mistakenly used to rebuild the audio signal (i.e. end up inside the audio band) by the D/A stage, producing sound that wasn't there at all in the original signal... a sound which will have nothing to do with what we originally sampled, so most likely will be crap!
The production of these unwanted sounds is called aliasing, and it's critically important to avoid it: we certainly don't want our nice guitar solo to sound like we can't play!
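Here is a small sketch of aliasing in action, assuming an ideal sampler at 44.1 kHz and an unfiltered 25 kHz ultrasonic tone. The folding formula is the standard one for a pure tone; cosines are used so the two sample streams match exactly (with sines the alias comes out phase-inverted, but at the same frequency).

```python
import math

def alias_frequency(freq_hz, sample_rate_hz):
    """Where a pure tone ends up after sampling: it folds around multiples of the sample rate."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

fs = 44_100
ultrasonic = 25_000                       # inaudible, but not filtered out
alias = alias_frequency(ultrasonic, fs)   # lands inside the audio band

# The samples of the 25 kHz tone are indistinguishable from those of its alias,
# so the D/A stage can only rebuild the (audible) alias.
a = [math.cos(2 * math.pi * ultrasonic * n / fs) for n in range(50)]
b = [math.cos(2 * math.pi * alias * n / fs) for n in range(50)]
print(alias, max(abs(x - y) for x, y in zip(a, b)))
```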
So before sampling, we need to filter the signal using a low-pass filter that removes everything above exactly 20 kHz.
The difficulty is that, as usual, in the real world it is impossible to filter frequencies abruptly. In the universe it takes time to do anything!
Therefore, as you can see from the figure below, real-world filters can be very steep, but are never perfectly vertical.
Since we don't want to lose audible information, a real-world filter will need to start "cutting" from 20 kHz, leaving a little space for some ultrasonic frequencies to be present (see figure below).
The consequence is that we need to sample a little higher than 40 kHz (to ensure that no aliasing occurs in the 20 Hz to 20 kHz range), and then discard any sample data for frequencies outside that range, so that we are left with audio-band samples only.
CD quality specifies 44.1 kHz, which is comfortably greater than 40 kHz and therefore works fine, leaving a whopping 2.05 kHz of "space" between the top of the audio band (20 kHz) and half the sample rate (22.05 kHz).
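The 2.05 kHz figure is just half the sample rate minus the top of the audio band. A quick sketch, assuming 20 kHz as the top of the band as in the rest of this post, shows how much room the anti-aliasing filter gets at a few common sample rates.

```python
AUDIO_TOP_HZ = 20_000  # top of the audio band, as assumed throughout this post

def filter_room_hz(sample_rate_hz):
    """Space between the top of the audio band and half the sample rate,
    i.e. where a real-world anti-aliasing filter has to do its cutting."""
    nyquist_hz = sample_rate_hz / 2
    return nyquist_hz - AUDIO_TOP_HZ

for fs in (40_000, 44_100, 48_000, 96_000):
    print(fs, filter_room_hz(fs))
```

At exactly 40 kHz the filter would have to be perfectly vertical, which is why real systems sample a bit higher.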
A brief history of 44.1 KHz...
And now, just for fun... why 44.1 kHz?
Well, it's a bit of trivia, but when CDs were in the process of being invented, people had weird hairdos, pastel colors were all the rage, and hard drives were small.
When you have a stream of samples ("Pulse Code Modulation data stream", or "PCM stream" for friends), they are essentially numbers. Numbers can be stored as sequences of bits ("0"s and "1"s), but PCM streams are big: their size easily runs to several megabytes.
Nowadays a phone easily has thousands of megabytes of storage... but back then, a megabyte was a serious amount of data, very expensive to store and heavy to process.
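To put "several megabytes" in perspective, here is a back-of-the-envelope sketch of uncompressed PCM size, assuming CD format (44.1 kHz, 16 bits, stereo) and decimal megabytes. The function name is just for illustration.

```python
def pcm_bytes(seconds, sample_rate_hz=44_100, bits=16, channels=2):
    """Size in bytes of an uncompressed PCM stream (CD format by default)."""
    return seconds * sample_rate_hz * bits // 8 * channels

print(pcm_bytes(1))                 # bytes per second of CD audio
print(pcm_bytes(60) / 1_000_000)    # megabytes per minute
print(pcm_bytes(74 * 60) / 1_000_000)  # a 74-minute album
```

Roughly 10.6 MB per minute: a lot of storage for the hardware of the time.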
Around the time the CD was invented, there was one type of gear with enough capacity to record large amounts of digital data: video recording equipment.
Turns out that back in the days when moustaches were hot, digital video recording kit already used a sampling rate a bit higher than 40 kHz... specifically, guess what, 44.1 kHz!
For the video equipment, that specific number was chosen not for audio reasons, but because it fit neatly into both the PAL and the NTSC video line rates, needing only 3 samples for each video line.
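The arithmetic behind that coincidence can be sketched like this. The line counts below are the figures commonly cited for those PCM-on-video systems (245 usable lines per field at 60 fields per second for monochrome NTSC, 294 usable lines per field at 50 fields per second for PAL); treat them as assumptions taken from that standard account rather than something derived here.

```python
SAMPLES_PER_LINE = 3  # samples stored on each usable video line

# Assumed: usable lines per field x fields per second for each video standard.
ntsc_rate = SAMPLES_PER_LINE * 245 * 60   # monochrome NTSC taken as exactly 60 fields/s
pal_rate = SAMPLES_PER_LINE * 294 * 50    # PAL: 50 fields/s

print(ntsc_rate, pal_rate)  # both standards land on the same sample rate
```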
So that gear got used for audio, and it suited it well, since it allowed the filter to have 2.05 kHz of "space" for the cutting curve (meaning that the filter could be relatively rough and thus not that expensive to produce).
And then inertia did the rest. So 44.1 kHz is still with us.
As for 24 bits as word length: having 8 more bits than 16 gives better results when the PCM stream goes through calculations, such as plugin effects and the mixing engine in DAWs. Computers aren't that good with numbers (they round off every result to a finite precision), so they need all the help they can get.
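Here is a toy illustration of why extra bits help during processing. It assumes a deliberately naive signal chain that stores the result as a fixed-point integer after every step: turn a sample down by 40 dB (a gain of 0.01), store it, turn it back up, store it again. Real DAW mixing engines use floating point internally, so take this as a sketch of the rounding problem, not of any actual DAW.

```python
def quantize(value, bits):
    """Round a value in [-1, 1) to the nearest integer code of the given word length."""
    return round(value * 2 ** (bits - 1))

def gain_round_trip_error(value, bits, gain=0.01):
    """Attenuate by `gain`, store, boost back by 1/gain, store again.
    Returns the error versus the original sample, as a fraction of full scale."""
    scale = 2 ** (bits - 1)
    code = quantize(value, bits)
    quieter = round(code * gain)       # stored at the same word length: detail is lost here
    restored = round(quieter / gain)   # boosting can't bring the lost detail back
    return abs(restored - code) / scale

e16 = gain_round_trip_error(0.3, 16)
e24 = gain_round_trip_error(0.3, 24)
print(e16, e24)  # the 24-bit chain ends up much closer to the original
```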
Another dose of reality
So we're all good. All of the above tells us we can sample any audio signal and then reconstruct it identical to how it was, just like Star Trek transporters do with people. Why, then, are there so many different converters?
There is only a little problem: real converters aren't perfect.
Just like it was for preamps (and for band-limiting filters), most of what we've gone thru so far is the working of an ideal, perfect converter. If it were possible to build perfect real-world converters, it would be all that you'd ever need.
But it's not possible.
Converters are made with physical components that can only be manufactured with so much precision (and often, at a price point), and physics itself has the bad habit of completely ignoring the beauty and elegance of mathematical models.
In other words, when sampling, A/D converters make errors. And so do D/As, of course.
And that's not all: different converters, based on different classes of technology (or different designs in the same class), will make slightly different errors.
That means that (given the same analogue signal) when using different hardware, the sample stream produced by an A/D converter and the output produced by a D/A converter will be a little different. As in different sound.
Conversion errors for dummies
What are these errors like?
Well, there are lots of ways real-world converters can be imprecise, and it really depends on the technology in use.
Without going too much in detail, here's a rough, superficial list:
timing issues (the frequency of sampling must be governed by a very precise clock, and if the sampling interval, also determined by the clock, is not exactly identical every time, some samples may capture, and translate, a slightly different amount of signal; on reconstruction, meanwhile, the D/A converter will happily assume that all samples come from the same timing, or, of course, have its own, different timing errors);
how the input level is translated into a digital value (ideally it should be a straight line, but physical components may not behave that way; see picture below);
the fact that the analog circuitry which brings the signal to the A/D converter may present different electrical impedance to different frequencies, with certain frequencies being a little bit "reduced" (impedance is a kind of "resistance", but for alternating current; check out the excellent primer here by Hugh Robjohns);
Also "boundary conditions" like the circuit's operating temperature (which depends both on the room and on the voltage) can affect the signal detection. Different designs will attempt to compensate for these physical effects in different ways (usually the better they are, the more costly, with the inevitable law of diminishing returns creeping in).
There are many others: phase shifts, the quality of the analogue filter, etc., plus more depending on the specific sampling technology. And, of course, any noise or distortion in the analog part (more on that later).
All of these may result in errors - meaning the A/D converter does not behave exactly as its theoretical, ideal counterpart.
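As one example of how such an error turns into "sound", here is a sketch of the non-straight-line case. It assumes a hypothetical converter whose transfer curve is y = x + k·x³ with a made-up k = 0.001; for a sine input, the cubic term creates a third harmonic of amplitude k·A³/4 that wasn't in the original signal. The harmonic is measured with a plain single-bin DFT over a whole number of cycles.

```python
import math

def converter_transfer(x, k=0.001):
    """Hypothetical, slightly non-linear converter: ideally y = x, here y = x + k*x**3."""
    return x + k * x ** 3

def harmonic_amplitude(signal, harmonic, cycles):
    """Amplitude of one harmonic, via a single-bin DFT over `cycles` whole cycles."""
    n = len(signal)
    bin_index = harmonic * cycles
    re = sum(s * math.cos(2 * math.pi * bin_index * i / n) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * bin_index * i / n) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n

N, CYCLES, A = 1000, 10, 0.9
clean = [A * math.sin(2 * math.pi * CYCLES * i / N) for i in range(N)]
converted = [converter_transfer(x) for x in clean]

third = harmonic_amplitude(converted, 3, CYCLES)
print(third, 0.001 * A ** 3 / 4)  # measured vs. theory (k * A^3 / 4): tiny, but not zero
```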
It's important to keep in mind however that these effects are really minimal.
For example, the time scale on which music is understandable to us (say, milliseconds) is far bigger than the time scale on which a converter operates (even a humble 44.1 kHz means 44,100 samples per second, about 44 in every millisecond: more than an order of magnitude finer).
Nevertheless, errors exist, and such errors result in slightly different (but musically equivalent) sample streams and reconstructed analog waves: both "different from the ideal" and "different among different hardware"... which means that various A/D (and D/A) converters may indeed not "sound" the same.
Back in the studio
Let's now go back to your Elvis-like performance: are you going to get a good result plugging your microphone into your inexpensive (but modern) interface, instead of using an external preamp with an A/D box that costs ten times more?
Does it matter?
Sonically, the short answer should by now be clear: yes, but in general not by much.
With modern 24-bit units, conversion errors are generally minuscule compared with everything that comes before. Therefore the differences between converters (making slightly different errors) will be extremely small, and in most cases irrelevant for real music... as in "not distinguishable in an A/B test". Errors are so small that an audio signal must go through many conversion cycles before any degradation can be heard (how many depends on the model: converters with lower noise, distortion and jitter will allow literally thousands of cycles without perceptible degradation).
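Why can a good converter survive so many round trips? A simple model helps: assume each A/D/A pass adds the same amount of independent (uncorrelated) noise, so noise powers add up and the total rises by 10·log10(N) dB after N passes. The -110 dBFS per pass below is a hypothetical figure chosen for illustration, not a spec of any particular unit.

```python
import math

def noise_after_cycles_db(noise_per_cycle_db, cycles):
    """Total noise level (dBFS) after repeated A/D/A passes, assuming each pass adds
    the same independent noise, so noise powers add: +10*log10(cycles) dB."""
    return noise_per_cycle_db + 10 * math.log10(cycles)

for cycles in (1, 10, 100, 1000):
    print(cycles, round(noise_after_cycles_db(-110, cycles), 1))
```

Under these assumptions, even after a thousand passes the noise only climbs from -110 to -80 dBFS.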
However, it also depends a little bit on your ears, and the amount of high frequency detail provided by better converters (producing more "informative" bits in every 24-bit word) can be felt during mixing: there's just more information for that EQ to work on (and for your ears to detect).
By the way, can I use the converters built into my PC?
After reading this far, you may have realized that, since your computer is capable of receiving audio (normally via a mic or line input) and producing sound, it must have converters onboard. And so it does.
Can you use them?
Generally, it's not a good idea.
While there may be exceptions, onboard converters are usually not that great, especially on PCs. It's simply a matter of cost. Onboard converters will often have cheaper analogue front ends, making them noisier and more distorting; they may produce only 16 bits, their clocks will have more jitter, etc. Simply put, audio is not a primary function of a PC motherboard, so what's provided does the job, but just barely. You most likely need an external audio interface with its own converters, as opposed to plugging your preamp directly into the computer's line-input mini-jack.
A few more considerations
We've found out that different converters sound a little different. So how do audio interfaces and high-end converter boxes compare, after all?
Errors are really minuscule
Modern, not-super-high-end audio interfaces are generally designed to produce 19-20 meaningful bits in every 24-bit sample (that's called the resolution; the remaining bits are either "silence" or very low-level noise), their clocking is quite accurate, their analog input front end is stable (at least with signals not pushed too hot), they are relatively insensitive to temperature swings, and so forth.
High-end converters reach up to 20-21 bits and are usually a little more accurate, but it's like having 100 million or 110 million. Nicer with 110, but not critical! And considering that 16 bits are already more than enough to represent any audio signal that people can perceive, a casual listener won't notice errors at all. If you can survive a little math, take timing errors as an example:
at 44.1 kHz, the ideal sampling interval is 1 / 44100 = 0.00002267573... seconds.
16 bits can represent 65,536 distinct values, so the minimal significant error, i.e. the minimal difference between two values, is one step at the rightmost of the 16 bits: 1/65,536 ≈ 0.0000153 of the full scale.
if we sample after 0.00002267572 seconds instead, the 0.00000000001 seconds (10 picoseconds) of difference will be insignificant: even for a full-scale 20 kHz tone, where the signal changes fastest, it moves the sample value by only about 4% of one 16-bit step.
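The "about 4% of one step" figure comes from a simple worst-case estimate, sketched below under these assumptions: a full-scale sine spanning -1 to 1 (so one 16-bit step is 2/65,536), and a timing error small enough that the sample value changes by roughly the sine's steepest slope (2π·f) times the error.

```python
import math

def worst_case_timing_error_in_steps(freq_hz, timing_error_s, bits=16):
    """Largest change in a full-scale sine's sample value caused by sampling
    `timing_error_s` late, expressed in quantization steps.
    Approximation: change ~ steepest slope (2*pi*f) * timing error."""
    step = 2 / 2 ** bits                                  # full scale spans -1..1
    max_change = 2 * math.pi * freq_hz * timing_error_s
    return max_change / step

ideal_interval_s = 1 / 44_100
print(ideal_interval_s)
print(worst_case_timing_error_in_steps(20_000, 1e-11))   # a small fraction of one step
```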
In order to be significant, errors would have to creep up towards the 17-bit range, near the bits that begin to represent "audible" differences. That may sometimes happen, but generally only as a result of a particularly unfortunate calculation algorithm or sequence of plugins.
A little more detail
Plugins (and certainly DAW mixing engines) are built with "stable" algorithms, that is to say calculation methods that do not amplify approximation errors (if that's not the case, A/D conversion is the last of your problems!), so errors in the 19th bit or even the 18th will stay there and be insignificant (especially when internal dithering, floating point calculations and other techniques are used).
However, the fewer errors there are, the better. As I wrote above, higher-end converters will sometimes manage to produce 20-21 "informational" bits (and counting) in the 24-bit samples, meaning more bits that carry meaningful information rather than noise.
That translates into more available detail when processing the audio, aka mixing. Since these bits are the least significant, we're talking about very small differences at high frequencies. A skilled mix engineer with good playback equipment may well perceive them.
Errors (and resulting "sound") may affect some bands more than others
It's also worth noting that with some conversion technologies, timing errors may impact certain sounds more than others, specifically high frequencies: high frequencies by definition change much faster (a 15 kHz sound is air going up and down 15,000 times per second, meaning each cycle lasts only about 0.000067 seconds), so a timing error of, say, 0.00001 seconds in determining the sampling instant would affect the sampling of these frequencies quite catastrophically! Luckily, modern converters have sampling clocks accurate down to the femtosecond scale (1/1,000,000,000,000,000 of a second).
Low-end sounds, of course, have much slower oscillations (an 80 Hz sound makes air go up and down only 80 times per second, with a rather longish cycle time of 0.0125 seconds), so they aren't affected as much by sampling timing errors. In their case, what's important is the A/D resolution: since they carry high energy, they need big numbers to be represented, so the more bits, the better (remember, 16 bits allow for 65,000+ distinct values and 24 bits for over 16 million, so starting from zero you can get to much higher numbers with 24 bits, or you can split the same amplitude scale into much finer pieces).
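The high-versus-low frequency argument boils down to comparing a timing error with the length of one cycle. A quick sketch, using the 15 kHz and 80 Hz examples and the 0.00001 second error from the two paragraphs above (plus a femtosecond-scale error for comparison):

```python
def timing_error_fraction_of_cycle(freq_hz, timing_error_s):
    """What fraction of one full cycle a given timing error represents."""
    period_s = 1 / freq_hz
    return timing_error_s / period_s

for f in (15_000, 80):
    print(f, "Hz: cycle", 1 / f, "s,",
          "0.00001 s error =", timing_error_fraction_of_cycle(f, 1e-5), "of a cycle,",
          "1 fs error =", timing_error_fraction_of_cycle(f, 1e-15), "of a cycle")
```

The same 0.00001 seconds is 15% of a 15 kHz cycle but less than 0.1% of an 80 Hz one.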
So, the 18-19 bits of real converter resolution still allow you to capture very high energy levels (such as, you guessed it, that booming deep bass!) with good headroom, as long as gain staging is right (i.e. you don't record in the yellow).
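A common way to relate "meaningful bits" to headroom is the ratio between full scale and one quantization step, about 6 dB per bit. The sketch below uses that simple ratio, 20·log10(2^bits); note that the often-quoted 6.02·N + 1.76 dB figure uses a slightly different convention (a full-scale sine against quantization noise), so the numbers differ by a couple of dB.

```python
import math

def dynamic_range_db(bits):
    """Ratio between full scale and one quantization step, in dB: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 18, 19, 20, 24):
    print(bits, "bits:", 2 ** bits, "values,", round(dynamic_range_db(bits), 1), "dB")
```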
It is really astonishing how not-so-expensive equipment has progressed!
The analogue front-end
What can be a bit surprising is that conversion nowadays is generally built on a single chip, and both regular interfaces and high-end converters tend to use more or less the same bricks.
A big difference, however, is in how the signal gets to the chip, and how the chip is made aware of the passing of time.
That's because A/D converters are digital devices, but the signal that comes in is still very analogue.
That means that it must be "massaged" a little to make sure it hits the sweet spot of the converter, in terms of the usual analogue properties: voltage level, distortion, noise, band-limiting, change in input topology (single-ended to differential, especially for music-oriented converters), etc.
This job is done by a section of the converter box called the "analogue front end", which kinda "conditions" the signal in order to present it as well as possible to the actual sampling stage. Here's where high-end converters have the upper hand, for the same reasons as with any analogue equipment: higher end means better (and more expensive) components, with smaller tolerances, more time and budget to design the optimal circuitry, better circuit board layout, more testing cycles and so on.
That often results in less noise and distortion, less unwanted frequency-dependent behavior, etc. Obviously each converter box needs to be tested, to see if its designers have actually done the job well (as opposed to just putting on a well-known badge and jacking up the price)... but if there's a section where the difference between high-end and "normal" can be seen, it's this one.
The same goes for the "helping" circuitry: in high end converters the clocking section may be better, with less jitter ("jitter" is how much something that's supposed to be precisely periodic, like a clock tick-tock, really isn't). That also applies to the "digital" clocking - i.e. the clock controlling the digital stream as sent by the A/D converter output.
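How much does clock jitter matter? A widely used textbook approximation gives the best signal-to-noise ratio achievable when sampling a full-scale sine of frequency f with random clock jitter of RMS value t_j: SNR ≈ -20·log10(2π·f·t_j). The sketch below applies it to a 20 kHz tone; the jitter values are illustrative, not specs of any particular product.

```python
import math

def jitter_limited_snr_db(freq_hz, rms_jitter_s):
    """Textbook upper bound on SNR (dB) for a full-scale sine sampled with random clock jitter."""
    return -20 * math.log10(2 * math.pi * freq_hz * rms_jitter_s)

for jitter_s in (1e-9, 100e-12, 10e-12):   # 1 ns, 100 ps, 10 ps RMS
    print(jitter_s, round(jitter_limited_snr_db(20_000, jitter_s), 1))
```

Each tenfold improvement in jitter buys about 20 dB, and, as with timing errors above, high frequencies are the ones that suffer first.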
That's what you pay for.
Ok then, should I get better converters?
How important these differences are is really up to you: how much you value the utmost perceivable audio quality (and detail when mixing), and at what cost. And, of course, bragging rights!
Here are some considerations:
The converters in your (modern) interface will not be an obstacle (or a great help) in making a hit record. The differences on mixed music simply aren't remotely big enough to be noticeable or relevant for commercial success. Converters are (emphatically) not what makes a good recording.
Good modern interfaces use in general good converter chips, with reasonable analog front ends, reasonable input dynamic range and reasonable real resolution. If your gain staging is appropriate, they will do the work just fine.
External high-end boxes will most likely have better analogue front-end and clocking, resulting in a better signal into the conversion stage, more significant bits in the 24 bit output and less degradation per A/D/A cycle. This can be audible in some circumstances, or more often felt during mixing, where you have more information at your disposal for these small EQ moves on the cymbals.
Funnily enough, sometimes a converter box will change the sound a little, in a way that is perceived as good regardless of theoretical perfection (we seem to like a little bit of distortion and noise, as enthusiasts of both vinyl records and old studio consoles seem to prove). Most often this is due to what the specific analogue front end does to the signal passing thru it, and it's always very subtle. In this case, what sounds good to you is good!
When it comes to physical controls, form factor and durability, unlike with preamps, there's not much to choose between interfaces and external boxes, because the converters' job is pretty fixed, so you don't depend on cheaper potentiometers etc. At most you have a sample rate selection switch.
One practical thing that external A/D conversion boxes may have, but most interfaces don't, is meters, which show you the signal level in the analog front end (for every conversion channel), so that you're sure you're not going overboard with levels ("clipping the converters"). This is not so critical, however, as these meters give pretty much the same information shown by your DAW's meters. And as we went over in "how DAW destroy recordings", you will always record with levels in the middle of the scale anyway... so your A/D converter will be happy no matter what.
External boxes obviously have additional hardware outputs (and the related digital circuitry) to pass out the AES/EBU stream (or ADAT, or whatever digital streaming format is supported), while many interfaces don't. But that makes no difference to the sound at all. The most obvious use of this is to extend your channel count, for example to use outboard.
An interesting consequence of all of the above is that, while external boxes tend to be a little better than your run-of-the-mill interface, differences between different high-end converters will be even smaller, because they will all be nearer to the ideal, totally transparent conversion.
In conclusion, here's a little checklist for you to decide:
Are you using your computer line-in? Go grab an interface!
Do you need more channels for your interface, and does your interface support digital connectivity? Go grab that external converter box!
You're high on talent and short on cash? Plug your microphone (or external pre) into your regular interface and you'll do just fine.
You're low on talent and high on cash? Waste of money. You need that high-end converter box like a penguin needs a house in the Sahara. On the other hand, it's your money and you have a lot of it.
You've got both talent and cash? Check the rest of your chain - material, musical skills, recording room, microphone, recording skills, preamp, mixing room if you do your own mixing... If some of it is not perfect, spend your money there. Otherwise, go for it!
Do you feel a slight lack of detail when mixing? Beware of self-delusion and bias, of course, but it could be worth mixing something recorded both on your interface and on a really good A/D converter (and mixing it thru a really good D/A) to see if it feels different (ideally without initially knowing which is which). So go rent an A/D/A box!
And that's it.
The original title of this post was "Zen and the art of A/D conversion".
I changed it because I decided not to go into any detail on the various conversion technologies, how the actual samples are produced in each of them, oversampling and so on: this post is already crazy long as it is, and I'll be surprised if you've read this far. There's also a whole bunch of information on sample rates, and when and how you want more than 44.1 kHz, which deserves its own entry... so no Zen. Yet. On the plus side, this title's catchier. Happy recordings!