–and How to Get It

by Robert E. Greene

Address to the San Diego Music and Audio Guild: July 27, 2005

What I want to talk about today is one of the fundamental divisions in the audio world. Of course, divisions are common: tubes versus transistors, analogue versus digital, box speakers versus planars. But the division I want to talk about is more basic, more truly at the heart of things, and yet it is so seldom discussed explicitly, fundamental though it is, that the issue does not even really have a name.

The basic question is: What would you have your system do if you could make it any way you liked? And there are two distinctly different answers that one could come up with reasonably.

The first one is: In theoretical stereo, perfect stereo is the arrival at the listening position of the acoustic replica of the recorded signals of each channel. It is what you would hear in an anechoic chamber from perfect speakers. Let us call this “direct arrival”.

The alternative has no real theoretical basis at all, but de facto it is likely to be what actually happens, and people are used to it. The sound you are hearing is the “direct arrival” sound plus a lot of sound from the room around you. How much and what kind of indirect sound you hear is a bit up for grabs here, but it has to satisfy the criterion of sounding at least reasonably “uncolored” and somehow all right in terms of stereo.

These two things sound very different from each other. To make the difference conspicuous and explicit, let us look at a simple question:
What will a close-miked mono vocal, a singer singing into a mono microphone, sound like when played back?

Most of us would answer this question with what we usually get: the singer will sound like a point source midway between the speakers at the height of the speakers (actually usually the speakers’ tweeters) and in the vertical plane through the speakers. In particular, how far away the vocalist sounds depends on how far away the speakers are.
We are all used to this and take it for granted. But of course if you think about it in terms of theoretical sound reproduction, literal replication of recorded sound, it makes no sense at all. What has the position of the speakers got to do with anything? That position is not part of the recorded information at all. It is interesting that Gunther Theile, who became one of the great authorities on sound perception, wrote his dissertation on this phenomenon more than twenty years ago (in part: the rest was about the Gestalt psychology of stereo perception).

So how does such a mono vocal sound anechoically? It sounds centered but at a somewhat indefinite, though usually close position. Depending on the tonal balance and volume it can sound right in front of your nose.

Of course, the first thing that might come to mind is why would anyone want a stereo that did such an odd thing to a common type of recording?
This is not to mention that this type of playback reveals rather abruptly the incoherencies of a great many of the stereo microphone techniques in common use, which rely on the blurring effect of room reflections to sound integrated.

The answer might be something like: it is the truth, and if the recording is like that, it should not be. But actually, the kind of theoretical perfection I am discussing sounds rather good (to me) on most recordings, once you get used to it. And it has some great advantages, which I shall now describe.

The most obvious one is the expansion of possibilities. If one is using the recordings, rather than the listening room itself, to provide reflections and perception of acoustics, then one can generate any acoustic desired. If one uses the listening room, one is stuck with it. A properly made, say one point stereo, recording in a natural acoustic environment for the kind of music involved can sound amazingly good and equally amazingly realistic when played back this “direct arrival only” way.

The strange thing is that often people are really quite aware of this issue without coming to terms with it. Almost everyone talks about “they are here” versus “you are there” recordings. And one often reads that a good way to test loudspeakers is to make a recording of a person in one’s household outdoors and then play it back on the speakers, checking for resemblance to the person’s actual voice speaking in the room next to the speakers. That the recording should be made outdoors is often strongly emphasized, too. But it seldom seems to occur to the people making the suggestion that if one makes the recording out of doors it should sound out of doors when you play it back! And yet, what else makes any sense? If an anechoic (outdoors) recording sounds like a voice in a room, what will a voice recorded in an auditorium sound like? Auditorium plus your room combined is the only possible answer, and a musically wrong answer it is, not to mention being an unpredictable result from what ought to be a predictable process. A recording after all is a communication channel. The person making the recording ought to be able to know what it will sound like to the listener in a predictable way.

The second point has to do with coloration. Reflected sound changes the tonal character (for lack of a better phrase) of what one hears. And it changes it unpredictably. Later reflections are largely ignored (cf., the Archimedes/Eureka project results). But the early ones do shift things (A/E again). The (first) floor reflection is especially bad here. These room induced colorations are again unpredictable.

Then there is the correction question. If one wishes to correct speaker/room sound, then it is clear what to do if direct arrival predominates. But once a lot of room sound is mixed in, then the whole question of what is right begins to get all mushy and indefinite.

This last point is made especially complicated by the fact that the diffuse field response of the ear is far different from the direct arrival response, in the higher frequencies at least. (This is not surprising since the diffuse field in effect injects a good deal of energy straight into the ear canal, which gives maximum high frequency response.) So the perceived high frequency content is quite different if the soundfield is dominated by room sound from what it has if one has direct arrival only. (A truly omni speaker with a flat on axis and hence flat power response sounds very bright compared to a directional speaker. The diffuse field versus frontal arrival difference is over 10 dB at 8 kHz).

So the only way to be sure of what balance will be heard is to control the relative amount of diffuse field and direct arrival. And the easiest way to standardize that is to make the direct arrival into nearly all that one hears. (The alternative would be to make indirect diffuse sound totally predominant, but this would preclude stereo imaging, and surround images as well).

It is hardly any wonder under these circumstances that impressions of recordings in the press as well as informally are so variable and so dependent on which speakers and listening rooms are used.

Solving the Problem in Principle

It is actually possible to make the direct sound truly predominate. But to know what it takes in measurement terms to make the direct sound aurally dominant, one needs to review in outline how the ear/brain sense the character of sound. People often talk about the precedence or Haas effect in this regard, but the true Haas effect is about localization, not perception of tonal character. The rough ideas are similar but the details are different.

Note first that just on a mathematical basis one needs a time window of a cycle or two to sense the content of a sound in a given frequency band at all. For example, to know how much 1 kHz (or thereabouts) energy there is, one has to have a sample of the sound which is at least one and preferably two or more milliseconds long. One cycle is one millisecond at 1 kHz, so one needs on the order of milliseconds. A time window of say 0.1 milliseconds would tell you nothing. Similarly, to know how much 20 Hz energy there is, one would need a minimum of 50 milliseconds of time or more.
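The arithmetic behind these figures is just the period of the tone, one cycle lasting 1/f seconds. A minimal sketch in Python follows; the function name `min_window_ms` and the list of frequencies are chosen here purely for illustration.

```python
def min_window_ms(freq_hz, cycles=1.0):
    """Time in milliseconds spanned by `cycles` periods of a tone at freq_hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 * cycles / freq_hz


# Illustrative frequencies only; one and two cycles are the rough minimum
# and the preferred window mentioned in the text.
for f in (20, 100, 1000, 10000):
    one = min_window_ms(f)
    two = min_window_ms(f, cycles=2)
    print(f"{f:>6} Hz: one cycle = {one:.2f} ms, two cycles = {two:.2f} ms")
```

At 1 kHz this gives one millisecond per cycle and at 20 Hz fifty milliseconds, matching the numbers above.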

It might be natural to suppose that the ear/brain senses things as fast as they could be sensed. This turns out not quite to be true: the actual “integration time” that the ear averages over is something like 25 milliseconds at a minimum, although the exact way this happens is a complex matter. But in any case one will certainly be all right for timbre accuracy if the following thing happens: Imagine a sequence of increasing time windows. If the frequency response computed from the speaker’s impulse response over each successive window is flat in those frequencies where the window is long enough to give meaningful results in the mathematical sense, then the ear/brain is bound to sense flat response.
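To make the idea of successive windows concrete, here is a rough sketch in Python. It is not the measurement software used for the graphs in this talk. It builds a made-up impulse response (a direct arrival plus one reflection 10 ms later at half amplitude, both values assumed) and evaluates the response over a short and a longer rectangular window. The sample rate and all function names are likewise assumptions for illustration.

```python
import cmath
import math

SAMPLE_RATE = 48000  # Hz, assumed for this illustration


def make_impulse_response(reflection_delay_ms, reflection_gain, length_ms=30.0):
    """Ideal direct arrival at t = 0 plus a single delayed reflection."""
    n = int(SAMPLE_RATE * length_ms / 1000.0)
    h = [0.0] * n
    h[0] = 1.0
    h[int(round(SAMPLE_RATE * reflection_delay_ms / 1000.0))] += reflection_gain
    return h


def lowest_meaningful_hz(window_ms):
    """Frequency whose single cycle just fits inside the window."""
    return 1000.0 / window_ms


def windowed_magnitude_db(h, window_ms, freq_hz):
    """Magnitude (dB) at freq_hz using only the first window_ms of h."""
    n = min(len(h), int(SAMPLE_RATE * window_ms / 1000.0))
    acc = 0j
    for k in range(n):
        if h[k] != 0.0:
            acc += h[k] * cmath.exp(-2j * math.pi * freq_hz * k / SAMPLE_RATE)
    return 20.0 * math.log10(max(abs(acc), 1e-12))


h = make_impulse_response(reflection_delay_ms=10.0, reflection_gain=0.5)
for window_ms in (7.0, 15.0):
    levels = [windowed_magnitude_db(h, window_ms, f) for f in (1000.0, 1050.0)]
    print(f"{window_ms:4.0f} ms window (meaningful above about "
          f"{lowest_meaningful_hz(window_ms):.0f} Hz): "
          f"1000 Hz {levels[0]:+.2f} dB, 1050 Hz {levels[1]:+.2f} dB")
```

With the 7 ms window the reflection has not yet arrived and the response is flat. Once the window is long enough to include it, the response at neighboring frequencies swings up and down, which is exactly what a speaker that "ignores the room" avoids.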

The graphs illustrate this in practice. They show successive windows with the response gradually filling in at the bottom as the window gets longer, but always remaining (essentially) flat in the whole meaningful frequency range, whatever the window. This speaker (an EQed McIntosh XRT 28) sounds almost totally neutral. This astonishing looking and equally astonishing sounding result is obtained by using a large directional array. This is a large and necessarily expensive speaker (because of the large number of drivers used), but it does deliver the goods in terms of ignoring the room around it, at least above the deep bass.

EQ'ed McIntosh: 7, 15, and 25 ms windows

There are other possible approaches to this problem of room reflections by dealing with the room itself. The most successful is the so-called RFZ (reflection free zone) room, in which the walls, floor and ceiling are configured so that all the early reflections are directed away from the listening position by the geometry of the room itself. This requires a purpose-built and rather strange-looking room. This approach is seldom used in domestic installations, although the results can be truly spectacular. (I once described the Focus Recording Studio in Copenhagen, designed on this principle by Ole Christensen and Paul Ladegaard, as “the world’s best audio system”, and I still think it was/is!)

Solving the Problem on the Practical Level: the Bass

Most audiophiles, no matter how dedicated, are not going to go for RFZ room construction. Many are not willing to go for the large array speakers, either.

So the question arises, what to do in the real world, so to speak. And here digital signal processing as well as acoustic treatment of rooms can come to the rescue. But of course the first thing you have to do is decide whether you even want this kind of sound. For that, I would suggest taking your speakers outdoors on a sunny day. Of course you will still have floor/ground bounce, but you will get some idea still. (Do not worry that there is less bass: that will not be an issue when you do what needs to be done indoors.)

Once you have decided you want to try to hear the direct sound only, then you can start work. But the work is divided into pieces, because different frequency ranges require different methods (unless you are going for the whole RFZ thing in a very large room).

Let us start with the bass, because actually the bass is the same for everyone, whether or not they want to go for the reflection-free sound approach.

In the bass, the whole idea of discrete reflections arriving after the first arrival does not actually apply. The whole room is involved in the bass from the start, unless you live in the Taj Mahal or Hagia Sophia or some other huge room.

Now the essential point about bass in rooms is that, for any one fixed listening position, it is what is known as a minimum phase phenomenon. The exact mathematical definition of this is not relevant (if you are curious, there will be an article covering this concept on my website, www.regonaudio.com). What is relevant is that if one EQs it to be flat (via usual EQ methods), then it will be phase linear, that is, time correct, too.

People seem to have trouble believing this, even some people who ought to know! How, they say, can EQ kill off some nasty resonance? But of course EQ of the usual analogue sort does just that! An anti-resonance of the same Q and size (a response dip, in other words, where there was a response peak) will cancel the resonance not only in amplitude but in phase (time).
(You can try this with a pair of EQ devices: set one of them up, the other down the same amount at the same frequency, string them together, and the combined effect is no effect!)
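The same experiment can be run numerically. The sketch below uses the peaking-filter formulas from the widely circulated Audio EQ Cookbook as a digital stand-in for an analogue bell EQ (an assumption made here for illustration, not a model of any particular product). The center frequency, gain, Q, and sample rate are also illustrative. It checks that a boost followed by an equal cut at the same frequency and Q has a combined response of exactly one, in phase as well as in amplitude.

```python
import cmath
import math

SAMPLE_RATE = 48000.0  # Hz, assumed


def peaking_biquad(center_hz, gain_db, q):
    """Return (b, a) coefficients of a bell filter (Audio EQ Cookbook form)."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / SAMPLE_RATE
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp)
    a = (1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp)
    return b, a


def response(coeffs, freq_hz):
    """Complex frequency response of a biquad at freq_hz."""
    b, a = coeffs
    z1 = cmath.exp(-2j * math.pi * freq_hz / SAMPLE_RATE)  # z^-1
    z2 = z1 * z1                                           # z^-2
    return (b[0] + b[1] * z1 + b[2] * z2) / (a[0] + a[1] * z1 + a[2] * z2)


# A 6 dB "resonance" at 50 Hz and the matching 6 dB "anti-resonance".
resonance = peaking_biquad(50.0, +6.0, 4.0)
anti_resonance = peaking_biquad(50.0, -6.0, 4.0)

for f in (30.0, 45.0, 50.0, 55.0, 80.0):
    peak = response(resonance, f)
    combined = peak * response(anti_resonance, f)
    print(f"{f:5.1f} Hz: resonance {20 * math.log10(abs(peak)):+.2f} dB, "
          f"{math.degrees(cmath.phase(peak)):+.1f} deg; "
          f"combined {20 * math.log10(abs(combined)):+.4f} dB, "
          f"{math.degrees(cmath.phase(combined)):+.4f} deg")
```

The resonance alone shows both a level peak and a phase shift around 50 Hz. Strung together with its mirror image, both vanish to within rounding error.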

So all you have to do to get perfect bass where you sit is EQ. Period. Nothing else, provided of course that you have full range bass already. EQ will not make a mini-monitor into a powerful subwoofer! This is a really important point that is often misunderstood. It is not really a theoretical necessity (some reflection phenomena are not minimum phase), but it seems to be almost always true in practice.

In actual practice the bass in most cases can be made fairly flat (and hence quite well timed, too) with only a few “coefficients”: bell-shaped resonances or anti-resonances with adjustable center frequency and “Q”.
This is the idea behind the Rives PARC. It is an analogue parametric EQ device with just such adjustments (four coefficients per channel, attenuation only). It does quite a nice job, although it is expensive for what it is. I would be tempted to work some with an inexpensive “graphic” EQ first just to get the hang of things.

In a somewhat higher price range (though not that much higher actually) is the Tact unit. This does a LOT more. It is still minimum phase EQ only, with no literal time manipulation as such, but it is automatic and its digital program has many, many coefficients, far more than any analog EQ device could offer in a practical way. Of course, it works on digital signals, or signals converted to digital, only. But it has a lot of resolution in the bass and gets the bass really right.

In applying these devices it is important to keep in mind that cutting peaks down is far better than trying to push dips up. The latter takes power and requires large woofer excursions that lead to distortion (in the case of “infinite null cancellations”, pushing up may not even be possible). So, one should choose an initial setup with peaks but no substantial dips in response. This can usually be arranged by putting the (nondipole) speakers near corners. (The Rives offers attenuation only, anticipating such a set-up).
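The cost of pushing a dip up follows from the decibel definitions: a boost of G dB multiplies the amplifier power by 10^(G/10) and the drive voltage by 10^(G/20). Woofer excursion at a given frequency scales with the drive voltage as long as the driver stays in its linear range. A small sketch follows, with function names chosen here for illustration.

```python
def power_ratio(boost_db):
    """Factor by which amplifier power rises for a boost of boost_db."""
    return 10.0 ** (boost_db / 10.0)


def voltage_ratio(boost_db):
    """Factor by which drive voltage rises for a boost of boost_db.

    Cone excursion at a fixed frequency scales by the same factor,
    assuming the driver is operating linearly.
    """
    return 10.0 ** (boost_db / 20.0)


for boost in (3.0, 6.0, 10.0, 20.0):
    print(f"+{boost:4.1f} dB: power x{power_ratio(boost):6.1f}, "
          f"voltage/excursion x{voltage_ratio(boost):5.2f}")
```

Filling in a 10 dB dip therefore asks for ten times the power in that band, while cutting a 10 dB peak asks for a tenth of it.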

Actually, on a more unusual but extremely promising note, the Tact allows the use of corner-placed woofers: not subwoofers as such but just woofers, with crossover at 200 Hz. (Potentially, one might be able to use even higher crossover points.) Coherence with the out-in-the-room main speakers is accomplished by time delays so that the sound all arrives at the listening position simultaneously. This gives remarkable results. The corner loading makes the woofers coupled to the room more effectively (so the woofers and bass amp do not need to work hard). And the corner position puts all the bass “in phase” so that, as it turns out, the bass made perfect at one point remains very good over a broad area, broader than usually happens with out-in-the-room woofer placement. (This is a good idea!)
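The size of the delay involved is simple geometry: the nearer source is held back by the difference in path length divided by the speed of sound. The sketch below shows only that arithmetic, not how any particular unit sets its delays. The 343 m/s figure (air at about 20 °C), the distances, and the function names are assumptions for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate, air at about 20 degrees C


def arrival_time_ms(distance_m):
    """Travel time from a source to the listener, in milliseconds."""
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_S


def alignment_delays_ms(main_distance_m, woofer_distance_m):
    """Delay to add to each source so that both arrive together.

    The farther source gets zero delay; the nearer one is delayed by
    the difference in travel time.
    """
    t_main = arrival_time_ms(main_distance_m)
    t_woofer = arrival_time_ms(woofer_distance_m)
    latest = max(t_main, t_woofer)
    return {"main": latest - t_main, "woofer": latest - t_woofer}


# Hypothetical layout: mains 3.0 m from the listener, corner woofers 4.0 m.
delays = alignment_delays_ms(main_distance_m=3.0, woofer_distance_m=4.0)
print(f"delay mains by {delays['main']:.2f} ms, "
      f"woofers by {delays['woofer']:.2f} ms")
```

Since corner woofers usually sit farther away than speakers placed out in the room, it is normally the main speakers that receive the delay.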

I have listened to the DEQX speaker correction but have not worked with their room correction system, which seems to be something that is evolving at present. As I understand it, it too will do minimum phase EQ only. This means that except for details of digital signal processing methods it will be identical in effect to the Tact unit if the frequency response target curve is set the same way and measured the same way. (Note that any minimum phase device can be set to give results identical to any other via the right target curve except for limitations of resolution in the frequency domain. In effect, the only differences aside from that resolution question are matters of how the measured response is interpreted.)

DEQX is pursuing an aspect of speaker design that is most promising. Namely, they are systematically exploring use of the fact that in digital filters, phase and amplitude are easily manipulated independently. This opens up new vistas in crossover design, e.g., steep-slope crossovers without the extreme phase anomalies that would usually arise in the analogue domain.

Let us pause to shed a tear for Sigtech. This was the first effective DSP room correction system, but it vanished without a trace, except for some impact on the pro market. Sometimes it is a misfortune to be too far ahead of one’s time, in the ultra-conservative world of audio.

(Note: Be careful. The Sigtech is wonderful, but if you buy one on the used market, you have to be careful to get a unit that includes the adaptive programming. A preset unit is useless except in the context for which it was programmed. Many Sigtechs were sold with the idea that they would be dealer programmed, but now there are no dealers. Unless you know someone with the programming setup, you would not be able to use such a unit effectively.)

In my experience, these analogue parametric and especially multi-coefficient digital EQ units are far more effective than acoustic bass treatment. Of course you can do a lot by careful speaker placement plus treatment, but you will never get bass as good by physical means as you can with electronic adjustment. Just look at the graph below and believe!
And remember that these steady state measurements tell the whole story, on account of minimum phase!!

Harbeth Monitor 40, corrected with TACT

The High Frequencies

As it happens, high frequencies are quite easy to absorb. It is also easy to figure out what points should be treated to do the absorbing. A piece of foam will do the trick, and to figure out where to put such pieces, all you need to do is think of the walls, floor, and ceiling of your room as mirrors and find the “equal angle” reflection points.
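The mirror construction can also be done on paper or in a few lines of code. Reflect the listener in the surface, draw a straight line from the speaker to that mirror image, and the point where the line crosses the surface is the equal-angle point. A minimal sketch follows, treating one surface at a time in two dimensions. The room dimensions, the 343 m/s speed of sound, and the function names are assumptions for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate, air at about 20 degrees C


def reflection_point(source_offset_m, listener_offset_m, separation_m):
    """Distance along the surface, measured from the point nearest the
    source, at which the equal-angle reflection occurs.

    source_offset_m and listener_offset_m are the perpendicular distances
    of speaker and listener from the surface; separation_m is their
    spacing measured parallel to the surface.
    """
    total = source_offset_m + listener_offset_m
    if total <= 0:
        raise ValueError("source and listener must be off the surface")
    return separation_m * source_offset_m / total


def extra_delay_ms(source_offset_m, listener_offset_m, separation_m):
    """How much later the reflection arrives than the direct sound."""
    direct = math.hypot(separation_m, source_offset_m - listener_offset_m)
    reflected = math.hypot(separation_m, source_offset_m + listener_offset_m)
    return 1000.0 * (reflected - direct) / SPEED_OF_SOUND_M_S


# Hypothetical ceiling bounce: 2.4 m ceiling, tweeter 1.0 m above the floor,
# ears at 1.1 m, listener 3.0 m from the speaker.
ceiling = 2.4
spot = reflection_point(ceiling - 1.0, ceiling - 1.1, 3.0)
late = extra_delay_ms(ceiling - 1.0, ceiling - 1.1, 3.0)
print(f"foam goes {spot:.2f} m out from the speaker, on the ceiling")
print(f"the reflection arrives {late:.2f} ms after the direct sound")
```

The same two functions apply to a side wall or the floor by measuring the offsets from that surface instead.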

This really works. Look at the before and after graphs below. They show the erasing of a high frequency reflection, in this case off the ceiling, from a point source speaker (Harbeth Monitor 40). Foam up, reflection gone.

Harbeth Monitor 40, corrected with foam

Harbeth Monitor 40, Waterfall

Harbeth Monitor 40, “Waterfall” graph

Another example is the effect of soaking up a reflection off the back wall of an 18′ by 20′ room, in this case involving the McIntosh speaker. As noted, it is very directional so there is little coming off the side walls or ceiling, and it is a forward radiator so there is little off the wall behind the speaker. But the wall behind the listener is a different story. Look at the energy after the direct arrival, with no foam and then with foam installed. A nasty spike of 6 kHz energy is just erased.

McIntosh Speaker, foam and no foam

This is all quite easy, if you do not mind a few pieces of foam stuck around (you can make them removable for when appearance is at a premium).

And the results are quite spectacularly good. Note, however, that EQ is not effective here, except for the correction of the speaker itself to have an ideal direct arrival. To try to EQ out a high frequency reflection would give a good result over only a very tiny area. Move your head a tad and the correction would be compounding the error! Absorbing things is the way to go here.

People often talk about over-damping the highs. But in my experience, with the kind of (classical concert) music I mostly listen to, this simply never happens. Concert halls start to roll off from air absorption around 4 kHz and by 8 kHz their reverberant field is almost devoid of energy, to the point that people do not even bother to measure concert hall response above 8 kHz. As long as you have flat direct arrival, there is going to be plenty of high frequency energy. (You have to add into this picture the fact that the ear has a big peak in diffuse field response around 10 kHz, which you most definitely do not want to zap with anything.)

The Midrange

If the bass is correctable by EQ and the highs by absorbing, then it seems as though we are almost home. There is nothing left but the middle frequencies. But there’s the rub. Too long in wavelength to be easily absorbed, too short in wavelength to be easily and stably (with respect to listener position) EQed: what to do?

In principle, if you do not mind sitting reasonably still, the type of EQ used in the bass can also be applied in the midrange. The good listening area decreases with increasing frequency (higher frequencies give shorter wavelengths so the geometry of things scales reciprocally). This has to be done carefully. It is possible to end up with a situation that sounds like a colored speaker with a floor bounce, instead of the intended uncolored speaker without floor bounce!
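The reciprocal scaling is easy to put numbers on, since wavelength is the speed of sound divided by frequency. The sketch below tabulates wavelengths across the range. Half a wavelength is included because a change of that much in the path-length difference between direct and reflected sound turns a cancellation into a reinforcement. It is offered only as a rough yardstick for how the usable area shrinks, not as a measured listening-area figure. The 343 m/s value and the function name are assumptions.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate, air at about 20 degrees C


def wavelength_m(freq_hz):
    """Wavelength in meters of a tone at freq_hz in air."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_SOUND_M_S / freq_hz


for f in (50, 200, 500, 1000, 5000, 10000):
    lam = wavelength_m(f)
    print(f"{f:>6} Hz: wavelength {lam * 100:7.1f} cm, "
          f"half wavelength {lam * 50:6.1f} cm")
```

In the bass the wavelengths are measured in meters, in the midrange in tens of centimeters, and in the treble in centimeters, which is why a correction that holds over a whole sofa at 50 Hz holds over little more than a head position at several kilohertz.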

The best chance of success is when there is the least to correct. Sometimes good behavior can be arranged with just a little foam on the floor.

Spendor SP1/2: foam, no foam

Unfortunately, it more often happens that midrange frequencies bouncing off the floor are an awkward thing to deal with in terms of acoustic treatment. (The Spendor SP1/2 just shown was actually designed to be loaded by the floor for a specific speaker height.) So one might better try to get speakers that have some vertical directivity.

The ultimate form of vertical directionality is the line source: there is no bounce off the floor at all (or more precisely the bounce off the floor is part of the direct sound itself). And indeed such speakers correct very well. The graph shows the in-room, listening position impulse response of the imposing DALI Megaline: initial arrival plus... well, nothing, in effect.

DALI Megaline, impulse response

But such heroic measures are not strictly necessary. An MTM array already helps a lot. (Dunlavys used to correct beautifully with the Sigtech). And speakers like the Gradient Revolution and 1.3, with directional midrange drivers tilted to be aimed away from the floor can do an excellent job of diminishing the floor reflection.

Gradient 1.3, impulse response

Once this happens, the final correction of the small amount of floor bounce actually produces the goal: the sound of a flat response speaker without a floor bounce at all.

How Well This All Works Out In Practice

Audio is full of things that make some audible difference: people are very sensitive to how things CHANGE when they make before and after comparisons. But if one turns from “can I hear the change” (answer usually yes) to “is the difference important in terms of replication of the original sound”, then priorities change. Many of the audiophile tweaks become almost irrelevant. Of course, this sounds like a paradox. If the change is audible, then one of the two possibilities must, it would seem, be better than the other. But this is not really so.

Most audio systems are making fundamental and large errors. In particular, they are making large frequency response errors induced by the room if not by the speaker itself. These errors are not easy to notice because they are not easily changed. There is nothing to compare with! But when the errors are eliminated by DSP, then it becomes obvious that they were there and were so large as simply to dwarf the tiny response changes that arise, say, from changing cables.

Speakers themselves seem to make large errors, independently of room effects. Many are not quite flat enough, and all of them have much more distortion than electronics. So it is natural to ask, just how good can it get if I get the room out of the system and hear the direct speaker sound, or the direct speaker sound improved a bit (or even a lot) by DSP?

There is a way to tell: One can do a by-pass test. This is the one kind of test in audio that involves no external assumptions at all. If one cannot hear the insertion of a device into the chain, the device is effectively audibly perfect.

For speakers, the way to do this is to take a highly accurate microphone, play a signal through a speaker (anechoically) and record the signal the microphone picks up. Then one can compare, on any playback system one likes, the recorded microphone signal and the original signal.

I suppose that almost everyone would expect there to be a lot of difference between the original signal and the signal sent through the speaker and recorded. And if you try this with most speakers in an average room set-up, indeed the differences will be large. But with flat response speakers in an anechoic environment, it turns out rather surprisingly that there is almost no audible difference. I asked well-known designer Jim Thiel how close the acoustic signal from his speakers was to their input signal. He replied “Out of doors up in a tree, the similarity is SCARY.” Similar experiments have been done by Gradient using for example the B and W 801 of some years ago. Probably most of us do not think of that as an ideal speaker. And yet without the room effects, the result was startlingly close to facsimile reproduction.

What happened to all the supposed problems, the micro-dynamics issues, the grain, the noise, the distortion, and all the other ills that audio is supposed to be heir to? In fact, in the actual trial, they do not seem to count for much. And one is driven inevitably to the conclusion that room effects dominate what is wrong with what we are hearing.

This is not a new idea. Peter Walker said many many years ago, in response to someone asking about whether he thought he could improve the Quad ESL-63, that he thought he really could not do much better, “unless you could somehow get the room out of the system. That would be a new era in high fidelity.”

Perhaps the new era is not quite here, but it is getting closer. With the right speakers, the right room treatment, and some judiciously applied digital signal processing, facsimile reproduction of what is on the recordings is almost here. Now if people would make the recordings right... but that has to wait for another time.