Preprint No. 2998
Presented at the 89th AES Convention
1990 September 21-25
SOUND FUSION AND THE
ACOUSTIC PRESENCE EFFECT
Arthur M. Noxon
Acoustic Sciences Corp.
Eugene, OR 97402 U.S.A.
In the perception of sound, early reflections are correlated with
the direct signal by the listener. Comb coloration effects arise
when there are too few specular, coherent reflections. Masking develops
with random phase, incoherent reflections. An early arriving, statistically
diffuse group composed of coherent reflections with random time
offsets produces excellent sound fusion. This is essentially an acoustic
presence effect. Applications include digital sampling, instrument
and vocal recording, and speech therapy for the hearing impaired.
A clean, direct signal is the most common "signal of choice"
in the recording world. The rationale is that any desired
effect can always be added later with processing. Even the most
primitive, one-man jingle shop has a tiny closet, its interior covered
with sound absorptive foam or fiberglass. Inside the "box"
is a basic vocal booth, a mic, windscreen and eventually, the talent.
An acoustic system has been developed to saturate the sound fusion
(Haas effect) time period with a group of statistically diffuse
coherent reflections. Three years ago, the design strategy, mechanical
configurations and the acoustic signatures for this technique were
introduced at the AES as a digital sampling booth. This acoustic
condition has since been named QSF, which stands for "Quick
Sound Field". This paper is a follow-up report covering
some of the applications for this acoustic technique which have
developed since its introduction.
An anechoic recording space may seem simple in concept but it is
difficult in practice. Early reflections usually do exist - off
of the script stand, paper, window, light fixtures, the floor and
other patches of sound reflecting surface. A real-world vocal booth
has any number of discrete reflections and resonance problems that
add to and color the direct signal. A highly absorptive space that
is somewhat acoustically dirty is most difficult for the engineer
to mic and for talent to work in.
Mic placement is very sensitive to the coloration effects of discrete
early reflections and resonance. The sound of the talent is colored
by the effects of the mic position. Often, setting up means no more
than choosing the best coloration effects. Since consistent sound
of an audio track is very important to the engineer, dubs take an
inordinate amount of time as the engineer fishes for mic and talent
positions in the room, trying to recapture the coloration of the
prior day's work.
A dead vocal booth provides little to no acoustic feedback for
the talent. Talent suffers sensory deprivation while in the box.
A monitor system is essential for talent to be able to adjust intonation
in real time. Electronics and earphones are resorted to in the absence
of a natural acoustic return. This then further contributes to isolation
of the talent in that the direct sound path of their voice is also
cut off. By the time traditional recording techniques have been
applied, the only natural acoustic feedback left for talent is conduction
through the jawbone.
Sensory deprivation and coloration effects found in a typical vocal
booth limit its effectiveness. Time is a scarce commodity in the
studio. Wasted time in any business, especially the recording studio,
is to be avoided. The typical vocal booth wastes studio time. Setting
up a mic is a delicate, time-consuming balancing act - talent and
mic position vs. room color. Retakes due to a lack of real time
acoustic monitoring for the talent take up additional studio time.
A dub is very difficult to set up in order to recapture the original
sound. And then, there is the post processing time spent in the
effects rack trying to convert the track into a lifelike, naturally
bright and open sound.
It is to be expected that the traditional vocal booth will eventually
be redefined, with steps taken to bolster its positive features and reduce
the negative effects. One form of this is accomplished by putting
to work the Haas effect, in which early reflections are correlated
with the direct signal. By arranging for a diffuse group of coherent
early reflections, the room coloration effects that appear when
there are too few reflections are averaged out. Any low level discrete
reflections that might remain are overwhelmed by the diffuse reflections.
The diffusion must also be rapidly attenuated so that it does not stretch
into the echo effect time period, beyond 50 ms. Therefore, in
addition to a strong diffusing function, this new class of vocal booth
must retain a very fast decay rate.
2 ETC - VO BOOTH
The generally recommended ETC for control rooms is a direct signal
followed by an early time gap (ETG) due to a reflection-free zone.
Outside of this is found a diffuse room ambience with an RT60 of
about 1/5 to 1/2 second. The purpose of the ETG is to allow the
engineer to hear local colorations of the signal at the mic. It
is therefore 50 to 40 ms long, the time of the Haas or sound fusion
The ETC for a voice over (VO) booth has to fit inside of the ETG
of the control room. The VO Booth has to be at least 50 dB within
the 55 ms ETG. The VO Booth RT60 ought to be on the order of 70
The only remaining detail is to establish the content of the decay
envelope of the VO Booth. There are two phases to the very early
reflections. Echolocation cues occur within the first 5 ms. Ambience
and coloration effects occupy the balance of the time period.
The direct signal needs to have a 5 ms very early time gap (VETG).
This allows time delay phase pan techniques to be used by the engineer.
Beyond the echolocation time gap lies the rapidly decaying ambience
If there are just a few discrete reflections, mic ambience is colored
due to phase add and cancel effects. If there are no reflections,
we have the dead room sound and no ambience. We could have many
reflections at the mic. If they are orderly, as with a flutter echo,
they would produce coloration. If disorderly, they would create
colorless ambience. However, the quality of these reflections needs
to be carefully specified.
Fig. 1 - ETC Control Room
Fig. 2 - ETC VO Booth
3 COHERENT OR INCOHERENT REFLECTIONS
The ear/brain system is a sound processor. But, so is a mic/spectrum
analyzer. While they both recognize the spectral character of sound,
there are important differences. The ear/brain acts as a correlation
type signal detector. The very early reflections are correlated
with the direct signal. By this process the early reflections are
additive to and enhance the definition of the perceived signal.
This is not news - it is the well known Haas, precedence, or sound
On the other hand, a correlation signal processor differentiates
between two types of echo. The coherent reflection has a simple
time delay offset but otherwise is a phase aligned representation
of the direct signal. An incoherent reflection can also be time
delayed but is a phase scrambled representation of the direct signal.
A coherent reflection can have the same spectral content as an
incoherent reflection. They would look identical to a spectrum analyzer.
However, the isolated coherent reflection would produce comb filter,
phase add and cancel effects when added to the direct signal. The
single incoherent reflection would simply add sound power to the
direct signal. In correlation signal enhancement only coherent signals
are processed into a spectral display. Incoherent signals such as
noise, reverberation and random phase reflections mask
the spectral detail of the direct signal. (This is easily audited
by listening to harmonic detail of a plucked guitar string with
and without random phase reflections in the rearfield.)
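The contrast between a single coherent and a single incoherent reflection can be put in numbers. The following is a minimal sketch, not the author's method: it assumes one reflection at an illustrative 2 ms delay and half the direct amplitude (both values are assumptions chosen for the example), treats the coherent case as a phasor sum, and treats the incoherent case as a simple power sum.

```python
import math

def coherent_sum_db(freq_hz, delay_s, gain):
    """Level in dB (re the direct signal alone) of the direct signal plus
    one phase-aligned copy delayed by delay_s and scaled by gain."""
    mag_sq = 1.0 + gain ** 2 + 2.0 * gain * math.cos(2.0 * math.pi * freq_hz * delay_s)
    return 10.0 * math.log10(mag_sq)

def incoherent_sum_db(gain):
    """Level in dB when the reflection only adds power (random phase)."""
    return 10.0 * math.log10(1.0 + gain ** 2)

def notch_frequencies(delay_s, count):
    """First `count` comb-filter notches: odd multiples of 1 / (2 * delay)."""
    return [(2 * k + 1) / (2.0 * delay_s) for k in range(count)]

# Illustrative values only (assumptions, not taken from the paper).
delay = 0.002   # 2 ms reflection delay
gain = 0.5      # reflection at half the direct amplitude

notches = notch_frequencies(delay, 3)                 # about 250, 750, 1250 Hz
peak_db = coherent_sum_db(1.0 / delay, delay, gain)   # about +3.5 dB
notch_db = coherent_sum_db(notches[0], delay, gain)   # about -6.0 dB
flat_db = incoherent_sum_db(gain)                     # about +1.0 dB at all frequencies

print(notches, round(peak_db, 2), round(notch_db, 2), round(flat_db, 2))
```

Under these assumptions the coherent reflection swings the response by roughly 9.5 dB between peak and notch, while the incoherent reflection raises the level by about 1 dB regardless of frequency.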
An envelope of statistically diffuse but coherent early reflections
that lies within the 50 ms time window of the Haas effect comprises
a near field ambience effect that adds to the quality of the direct
signal. The composite signal has more top end, is brighter and more
natural. It is a more open sound, with air. Statistically diffuse,
Haas effect ambience is an acoustic enhancement technique that puts
the signal that engineers prefer onto tape.
4 THE HAAS BOX
This class of vocal booth must retain a very fast decay rate and
in addition develop a strong diffusion function. It typically has
an RT-60 decay time of 80 to 100 ms and a diffusion rate of over
1000 reflections per second. The booth has absorbers and reflectors
distributed over its entire interior surface. The component of direct
sound that hits a reflector is backscattered, partially back towards
the mic, partially into an absorptive strip and partially onto other
reflectors. This process uses only specular and diffractive diffusion
to maintain the coherent quality in its early reflections.
The mean free path in these small rooms is about 4 feet. The broadband
absorption coefficient is about 50%. That means the expanding wave
front loses about 3 dB every 4 ms. This pencils out to a 60 dB decay
in 80 ms. The wall of such a vocal
booth would likely have reflectors alternating with absorption on
about 9 inch centers. A 5 foot wide wall would splinter a flat wave
front into maybe 7 separately expanding reflections. This sound
scattering process continues throughout the decay. The result is
easily counted in the ETC: the diffusion rate is one to two separate
reflections per millisecond. For all practical purposes, the
mic receives a direct signal followed by 4 to 5 ms of no sound;
then, as the first arrivals hit, so begins the controlled decay/diffusion
process in the room.
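The decay arithmetic in this section can be checked with a short calculation. This is a rough sketch under stated assumptions: the speed of sound is taken as 1130 ft/s (a value not given in the paper), every reflection is assumed to lose the same fraction of energy, and a 50% absorption coefficient is taken as about 3 dB of loss per reflection. The function names are illustrative.

```python
import math

SPEED_OF_SOUND_FT_PER_S = 1130.0  # assumed; not stated in the paper

def loss_per_reflection_db(absorption):
    """Energy lost at each reflection, in dB, for a given absorption coefficient."""
    return -10.0 * math.log10(1.0 - absorption)

def time_between_reflections_ms(mean_free_path_ft):
    """Average travel time between successive reflections."""
    return 1000.0 * mean_free_path_ft / SPEED_OF_SOUND_FT_PER_S

def decay_time_ms(drop_db, mean_free_path_ft, absorption):
    """Time for the level to fall by drop_db at a steady loss per reflection."""
    reflections_needed = drop_db / loss_per_reflection_db(absorption)
    return reflections_needed * time_between_reflections_ms(mean_free_path_ft)

loss_db = loss_per_reflection_db(0.5)      # about 3 dB per reflection
step_ms = time_between_reflections_ms(4.0) # about 3.5 ms per 4 ft path
rt60_ms = decay_time_ms(60.0, 4.0, 0.5)    # about 71 ms

# Reflector strips across a 5 ft (60 in) wall on 9 in centers.
strips_across_wall = 60.0 / 9.0            # a little under 7

print(round(loss_db, 2), round(step_ms, 2), round(rt60_ms, 1), round(strips_across_wall, 1))
```

With the travel time rounded up to 4 ms per reflection, as in the text, the same calculation gives 80 ms for a 60 dB decay.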
A typical vocal booth has a window. In designer studios it would
be tilted to not reflect signal into the mic. In a highly diffuse/absorptive
room there should not be a large area of untreated reflection regardless
of the angle. Current practice in these rooms sees tall, absorptive/reflective
wall mounted acoustic units with narrow strips of wall space between.
The free wall space between the acoustic control units can easily
be glass or plexiglass strips, which provide a more open feeling
in an otherwise small room. Visual openness contributes to more
comfort for the talent in long recording sessions.
The statistically populated envelope of very early, coherent reflections
is essential to the stability of the acoustic space inside the booth.
Engineers report a wide and smooth acoustic space. They even lose
track of which mic is open and have to mark the faders. Usually,
in more traditional rooms, an engineer simply hears which mic is
where. In a statistically diffuse space, the mic position can be
changed without changing the envelope. It is the envelope that is
distinguishable and not its internal detail. Moving the mic only
changes the fine structure as to which reflection arrives when and
how strongly. This does not change the statistical envelope or the
quality of sound. In a room 4 foot by 6 foot, there would be a 2
x 4 foot central area in which the sound remains uniform, regardless
of mic or sound source location.
The floor plane is a large reflecting surface. It is left untreated,
to be an acoustic mirror effectively doubling the height of the
room. Ceiling treatment must be accordingly more severe to keep
the vertical decay and diffusion rate up with that of the walls.
Fig. 3 - QSF Vocal Booth
5 VOICE OVER GOBO
The Haas ambience effect can be approximated out in the open room
or field - of course, not to the degree available in an iso booth
format, but this QSF gobo setup boosts the signal to noise ratio
at the mic by 5 to 7 dBA. This is accomplished by increasing the
"direct" signal strength 1 to 2 dB while reducing the
room noise by 4 to 5 dB.
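The signal to noise figure quoted here is the sum of the two contributions when both are expressed in dB. A minimal sketch of that bookkeeping follows; the function name is illustrative and the weighting difference between dB and dBA is ignored as a simplifying assumption.

```python
def snr_improvement_db(direct_gain_db, noise_reduction_db):
    """A rise in signal level and a drop in noise level both widen the
    signal to noise ratio, so in dB the two figures simply add."""
    return direct_gain_db + noise_reduction_db

low_end = snr_improvement_db(1.0, 4.0)    # 1 dB signal gain, 4 dB less noise
high_end = snr_improvement_db(2.0, 5.0)   # 2 dB signal gain, 5 dB less noise

# Equivalent power ratios, for readers who prefer linear terms.
low_ratio = 10.0 ** (low_end / 10.0)
high_ratio = 10.0 ** (high_end / 10.0)

print(low_end, high_end, round(low_ratio, 2), round(high_ratio, 2))
```

The two ends of the range reproduce the 5 to 7 dB improvement stated in the text, which corresponds to roughly a threefold to fivefold improvement in power ratio.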
This "gobo" is not the large, flat rug-covered plywood
gobo of years past. The present method is to use a set of 7 to 9
sound control units, typically placed on 18" centers in a horseshoe
pattern. The mic is located in the middle and the talent occupies
the open heel end of the pattern. These Traps have two sides. The
broadband absorptive side faces outward to intercept inbound room
noise and reflections. The membrane reflective side, effective 400
Hz and above, faces inward to produce the statistical group of early
coherent reflections. In this system, absorption is replaced by
transmission. Sound is not absorbed between the reflectors. It is
leaked out of the space. In either case controlled decay and diffusion
Fig. 4 - QSF Vocal Gobo
Gain of the "direct" signal is accomplished by adding
very early multiple reflections of the direct signal to the direct
signal. This is completed within the first 50 ms of the sound fusion
time period. Although sound fusion generally lasts 50 to 60 ms,
a "smearing" accompanies the presence of strong, late
high frequency reflections. This is undesirable for the recording
engineer. The end of the sound fusion period marks the onset of
echo detection. For lower frequencies the echo onset time is later
and for highs, sooner than 50 ms.
In the QSF method of developing the statistical ambience, the comb
filter effect associated with any individual reflection does not
occur due to the large number of random time offset reflections.
With 20 to 50 reflections occupying a time span of 20 to 25 ms,
the comb filter effect that would arise with any one reflection
is obscured by the averaging effect of the other reflections.
Fig. 5 - QSF ETC, 0-20 ms
A good signal at the mic can be time delayed for stereo phase
pan positioning. The echolocation process occurs within the first
5 ms following the direct signal. Because of the distance between
the mic and the reflecting side of the gobo, no reflections arrive
within the first 5 ms. The direct signal is well isolated for control
in the mix.
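The 5 ms gap can be related to geometry with a simple path-length estimate. The sketch below is an illustration only: it assumes a speed of sound of 343 m/s and the simplest possible layout, in which the reflecting surface sits directly behind the mic on the line from the source, so the reflected sound travels the mic-to-surface distance twice. Real gobo layouts involve oblique paths, so these numbers are an order-of-magnitude guide.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed; not stated in the paper

def extra_path_m(gap_ms):
    """Extra distance a reflection must travel to arrive gap_ms after the direct sound."""
    return SPEED_OF_SOUND_M_PER_S * gap_ms / 1000.0

def on_axis_reflector_distance_m(gap_ms):
    """Mic-to-surface distance for a surface directly behind the mic on the
    source-to-mic axis (assumed geometry): the extra path is covered twice."""
    return extra_path_m(gap_ms) / 2.0

gap_path = extra_path_m(5.0)                       # about 1.7 m of extra travel
gap_distance = on_axis_reflector_distance_m(5.0)   # about 0.86 m behind the mic

print(round(gap_path, 3), round(gap_distance, 4))
```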
Not only is the direct signal enhanced but the ambient noise floor
is reduced at the mic by this technique. The backside of each Trap
is broadband absorptive and facing outwards towards the room. Sound
in the room is absorbed before it gets to the mic.
Sound that does penetrate the perimeter is weakened because the
wavelet expands due to diffractive edge effects. A noise level
reduction of 5 dBA is easily noted inside the gobo. There may be times
when a stronger signal-to-room-noise ratio is required. The closer the
Traps are to each other, the less outside noise they let in,
so the direct signal becomes stronger relative to the noise.
Fig. 6 - QSF Gobo Isolation
Noise in a room also originates with the talent. Sound does
leak out between the traps. Some of this is attenuated by the absorptive
half of the trap and the remainder expands rapidly due to edge diffraction
effects. The sound leaked to the room is rapidly diffusing. The
important feature is that a sound from such a gobo produces no flutter
effect. Sound that does bounce off a wall is absorbed by the backside
of the gobo traps. The system can also be used near walls with minimal
Fig. 7 - A Diffusive 'Source'
Incidentally, another application of such a gobo system takes
advantage of its reversibility. If all the Traps are rotated then
the full bandwidth absorptive side faces the mic. This creates the
traditional dead sounding vocal booth. By adjusting a pair of reflectors
slightly inward, the interior diffusive top end can be brought up.
This is best done in pairs to take advantage of diffusive multiple
scattering available from facing reflecting surfaces.
Fig. 8 - Dead Configuration
© 2009 Acoustic Sciences
Corporation. All Rights Reserved.