Sweep: Audio Editing, Scrubbing and Latency Visualisation
Conrad Parker
2003
CSIRO Australia
Introduction
Sweep
is an open source editor for digital audio. The project started in 2000, initially based on the waveform
editing widget from Soundtracker. The original motivation was to make the task of
editing sound more accessible to average users by mimicking the familiar desktop image
editing interface of the GIMP.
In early 2002 the support of an animation
studio prompted an overhaul of the user interface and the tackling of harder problems related to
application latency. This paper discusses that work, covering the motivation for major usability
improvements and new directions such as Sweep's use for live performance.
Motivation
Usability is a critical factor in the design of an audio editor on any personal computer, because
the user interface of such devices is strongly geared towards visual manipulation of data objects.
Digital audio editors in general exhibit a number of usability problems.
Firstly, the visualisation of audio data is often poor, providing little indication
of the audio content. Secondly, although the data being edited is audio, existing software provides
few opportunities to actually hear it beyond a fixed speed playback that cannot be invoked during
editing or even navigation. Lastly, the lag between audio and its visualisation introduced by
buffering in the audio device is often poorly managed by editing software, introducing a disconcerting
delay between user interaction and audible response.
The user interface for Sweep was designed to avoid such shortcomings.
Computer editors for text and visual media such as video allow the user to visually scan and
navigate through a work, and to immediately see the outcome of any editing operation.
Audio editors, on the other hand, generally provide only a rough
outline of the waveform in the form of a graph depicting peak values. This
is sufficient to discern major differences, such as that between silence
and loud sounds, and to notice the effect of large edits, such as cutting.
Such a graph can be used to perform simple editing operations such
as topping and tailing, the
removal of silence at the start and end of a recording. This
representation, however, is completely inadequate for depicting the operation of
more subtle operations like noise reduction or reverberation,
which can greatly change the sonic texture of a sound with little effect on loudness.
Whereas navigation through a text document in a visual text editor implicitly
provides an indication of the content at the cursor position, in an audio file it commonly
takes a number of seconds of listening simply to find the context of one's place. Precise placement of
the audio cursor often requires tedious juggling with fixed-speed playback and transport
controls such as fast-forward and rewind.
However, in the world of analogue audio editing, such as with tape reels, the
user experience is far more tactile. The tape can be moved at an arbitrary
speed back and forth past the playback head, allowing the user a detailed scan of the
material being edited, and a precise search for suitable edit points such as the
onset of musical pauses or the completion of syllables in speech.
The latency introduced by a non-realtime multi-tasking operating system is another
crucial factor in the design of interactive audio software. Due
to the requirement of fair scheduling, it is not possible for such a system
to guarantee that data written to the audio device will be heard at exactly
the right moment; if scheduling delays cause an audio application to be
starved of access to the audio device up to the time when sound is due to
be played, an audible glitch will be heard. Although very brief, this sound
is often extremely jarring, may cause damage to speakers and, if not detected in
software, can cause a loss of synchronisation between the audio and video or other
applications. In order to compensate for unpredictable scheduling, applications can
increase the size and number of the audio driver's buffers. Larger buffers
can go a long way towards ensuring that no glitches are heard; however, this
degrades interactivity. The size of the buffers is directly proportional to
the time delay between the application writing to the audio device and the
sound being heard, and for sounds triggered by interactive events this
introduces a delay between user input and the expected sound. For audio editors this
delay manifests itself during playback as a discrepancy between the cursor position on
screen and the sound heard by the user. A delay in responsiveness of more than about 10 ms
is easily noticed by the human ear, and can be quite off-putting in musical applications
as it interferes with the rhythm of a musician's performance.
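The proportionality between buffer size and delay can be made concrete with a short calculation. The sketch below is illustrative only: the function name `buffer_latency_ms`, the buffer counts and the 44.1 kHz sample rate are assumptions chosen for the example, not values taken from Sweep or from any particular audio driver.

```python
def buffer_latency_ms(frames_per_buffer: int, num_buffers: int,
                      sample_rate_hz: int) -> float:
    """Worst-case delay, in milliseconds, before a newly written frame is heard.

    Assumes every buffer is full when the frame is written, so the frame
    must wait for all queued frames to be played first.
    """
    queued_frames = frames_per_buffer * num_buffers
    return 1000.0 * queued_frames / sample_rate_hz


if __name__ == "__main__":
    # Hypothetical configurations: two buffers of increasing size at 44.1 kHz.
    for frames in (64, 256, 1024, 4096):
        delay = buffer_latency_ms(frames, 2, 44100)
        print(f"{frames:5d} frames x 2 buffers: {delay:6.1f} ms")
```

Under these assumptions, staying near the 10 ms threshold leaves room for only a few hundred queued frames in total, which is why larger, glitch-resistant buffers come at the cost of a noticeable lag.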
These shortcomings are not present when editing or navigating audio in the analogue domain,
as is done with recording on analogue tape reels or cueing songs on vinyl records; in fact
the responsiveness of vinyl is so precise that it is regularly used as a performance artform
in its own right.
Thus the motivation in improving Sweep's usability was to make it comparable to editing in the
analogue domain, and in turn to extend its usefulness as a tool for live performance.
Implementation
The most important new features that have been implemented in the recent version of Sweep
are an advanced form of scrubbing which models the physics of a turntable for playing
vinyl records, and improvements in the visual synchronisation to depict application latency.
This section introduces the implementation of Sweep's waveform visualisation and the recent
improvements.
Visualisation
Sweep 0.1 improved on the visual representation of audio data by combining a display
of the waveform peak with an overlay of the average value. Together these provide the
user with a notion of both the overall loudness and the dynamic shape of the sound.
Additionally, a 3D bevel effect was applied to the waveform rendering, which, by
emphasising the differences in peak values, accentuates pitch differences at various
zoom levels, providing a rough indication of sonic texture. Although not strictly
providing any complex analysis, this often provides just enough
extra visual texture to distinguish between simple instrumental and vocal
portions of a recording. An example of Sweep's waveform rendering is shown
in the screenshot below.
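The combination of a peak display with an average overlay can be sketched as a per-column reduction of the sample data. This is a minimal illustration, not Sweep's rendering code: the function name `summarise_columns` is hypothetical, and the "average value" is assumed here to be the mean absolute amplitude, which may differ from what Sweep actually computes.

```python
def summarise_columns(samples, columns):
    """Return one (peak, mean_abs) pair per display column.

    `samples` is a sequence of floats in [-1.0, 1.0]; `columns` is the
    width of the waveform view in pixels. Each column covers an equal
    share of the samples.
    """
    n = len(samples)
    summary = []
    for col in range(columns):
        start = col * n // columns
        end = max(start + 1, (col + 1) * n // columns)
        block = [abs(s) for s in samples[start:end]]
        if not block:
            # No data for this column (empty input).
            summary.append((0.0, 0.0))
            continue
        peak = max(block)                    # outline of the waveform
        mean_abs = sum(block) / len(block)   # overlay showing average level
        summary.append((peak, mean_abs))
    return summary
```

Drawing the mean inside the peak outline shows both values at once: a column with a high peak but a low mean indicates a brief transient, while similar values indicate sustained loudness.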
Screenshot of waveform view in Sweep
Scrubbing
The major addition to Sweep's usability was the implementation of
interactive scrubbing. Scrubbing in a digital media
editor allows the user to locate specific items of interest or jump directly to specific points
in time by interacting directly with a timeline.
Sweep features a number of innovative, complementary scrubbing methods:
A scrub tool allows the user to jump directly to specific portions of the waveform on screen, and gives immediate audio feedback. By slowly dragging the mouse cursor over the waveform, the user can interactively listen to the waveform to sample accuracy.
There is immediate audible feedback when selecting a region or moving the edges of a region. This makes the editing task more intuitive because the user can hear the region edges while they are being selected.
The timeline above the waveform is available as a scrubbing mechanism during playback, and otherwise allows direct placement of the cursor.
Simply dragging the horizontal scrollbar during playback allows the user to very quickly move through the file with audio feedback.
Sweep's scrubbing was modelled on the quality of interaction available
when working with tape reels and vinyl records.
Vinyl records are such a directly responsive format that a skilled user such
as a professional disc jockey is able to use them to quickly cue and mix
together songs, and for some musical genres such as hip-hop, the skilled
practitioner incorporates the audible scanning of the record under finger-tip
control into the music, in an artform known as
turntablism. This advanced level of interactivity
was used as a benchmark -- if a digital audio editor could be created with
such direct responsiveness that it could be used artistically, it would
surely provide a much needed usability boost to the more mundane task of
editing. In turn, this introduces the possibility of easily editing the sounds
that are used in performance, which is of course impractical with vinyl.
The audible characteristics of vinyl, especially when played on the turntable
of a professional disc jockey, are subtly different from
and inherently more pleasing than the simple fast playback of a tape
reel. Three contributing factors are wear on the record groove, non-linear filtering
introduced by forced motion of the stylus, and controlled momentum of the turntable
under the action of a slipmat.
Firstly, the "smoother" sound of vinyl is partly due to physical wear introduced
by contact of the stylus each time a record is played, such that over time the groove
is widened and high-frequency details are smoothed over. This is a general trait
of vinyl records and introduces a constant distortion of the sound, so it is not desirable
to explicitly model it in a digital audio editor as this would misrepresent the audio
data during editing.
Secondly, the physics of moving a stylus quickly through the groove of a
vinyl record introduces complex filtering. The microscopic shape of a record groove is
depicted in the cutaway diagram below, with stereo channels encoded as horizontal
and vertical variations. Upon forced motion the stylus' increased momentum causes it to
skip over the high-frequency details encoded in the groove.
This filtering removes many of the annoying high-frequency components that are
introduced by the increase in playback speed. Although the actual filtering
introduced by a stylus on vinyl is non-linear and would be costly to
implement in software, it is usefully approximated by the application of a
simple lowpass filter.
Cutaway diagram of vinyl groove.
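The lowpass approximation described above can be sketched as variable-speed playback followed by a one-pole filter. This is a sketch under stated assumptions rather than Sweep's implementation: the function name `scrub_playback`, the use of linear interpolation for resampling, the one-pole filter design and the default 4 kHz cutoff are all choices made for illustration.

```python
import math


def scrub_playback(samples, speed, cutoff_hz=4000.0, sample_rate_hz=44100):
    """Play `samples` at `speed` times normal rate through a one-pole lowpass.

    A negative `speed` plays backwards from the end. Positions between
    samples are read with linear interpolation.
    """
    if speed == 0:
        raise ValueError("speed must be non-zero")
    # Smoothing coefficient of the filter y += alpha * (x - y).
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    last = len(samples) - 1
    pos = 0.0 if speed > 0 else float(last)
    y = 0.0
    out = []
    while 0.0 <= pos <= last:
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        x = samples[i] * (1.0 - frac) + nxt * frac  # interpolated input
        y += alpha * (x - y)                        # lowpass smoothing
        out.append(y)
        pos += speed
    return out
```

A fixed cutoff keeps the example short; lowering the cutoff as the speed increases would mimic the stylus more closely, at the price of another tunable parameter.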
Lastly, the weight of a turntable provides a fair amount of momentum, such that when
a record is sped up by the disc jockey's finger, it takes some time to slow down to
the drive speed of the turntable. This momentum also provides a more subtle smoothing
of the record's motion, such that any sudden changes invoked by the disc jockey produce
a somewhat less marked change in the record's playback. A similar amount of momentum
was modelled in Sweep's scrub tool, such that if desired the cursor can be thrown back
and forth along the waveform display, and such that sudden changes in direction and
speed are smoothed over to provide non-jerky responsiveness.
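One simple way to model this kind of momentum is to let the playback speed relax exponentially towards a target speed instead of following it instantly. The sketch below assumes that model; the function name `step_platter` and the 0.15 second time constant are hypothetical and not taken from Sweep.

```python
import math


def step_platter(position, velocity, target_velocity, dt, time_constant=0.15):
    """Advance the cursor by one time step of `dt` seconds.

    `velocity` and `target_velocity` are playback speeds (1.0 is normal
    speed, negative is reverse); `position` is in seconds of audio.
    The target is the user's drag speed while the tool is held, or the
    normal drive speed once it is released.
    """
    # Fraction of the remaining speed difference removed in this step.
    blend = 1.0 - math.exp(-dt / time_constant)
    velocity += blend * (target_velocity - velocity)
    position += velocity * dt
    return position, velocity


if __name__ == "__main__":
    # The cursor is "thrown" at four times normal speed, then released.
    pos, vel = 0.0, 4.0
    for _ in range(50):
        pos, vel = step_platter(pos, vel, target_velocity=1.0, dt=0.01)
    print(f"after 0.5 s: position {pos:.3f} s, speed {vel:.3f}x")
```

With this model a sudden reversal of the target speed changes the actual speed only gradually, which gives the non-jerky response described above.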
Monitoring playback latency
Recent efforts have vastly improved the ability of the Linux kernel
to schedule interactive events, including low latency work by
Andrew Morton and
Ingo Molnar,
and Montavista's work on kernel preemption maintained by
Robert Love.
This work has been so effective that with a properly tuned kernel
the latency introduced by audio buffering can be reduced to the vicinity of
1 ms. However, this currently requires some configuration on the user's part,
and is specific to Linux. It is also important to realise that the latency
perceived by a user is not only introduced by the kernel, but also by the
application, and it is the application's responsibility to take the total
latency into account when synchronising audio with visuals. The basic
configuration window for selecting the amount of device buffering requested
by Sweep is shown in the figure below.
Sweep's device buffering configuration.
For the sake of portability and acceptable behaviour when running stock
kernels, it was necessary in Sweep to introduce some visual feedback of the
delay caused by device buffering. During playback, Sweep displays two
cursors simultaneously, as shown in the figure below: the white
cursor to the right is under the user's control, and can be moved by the transport
controls and the scrub tool; the green cursor to the left always displays the position
of the audio that can currently be heard. Hence if the user scans or scrubs through
the file, the white cursor is moved immediately but the green cursor may
lag slightly due to buffering in the audio device, and due to motion smoothing introduced
by the modelling of momentum. Thus the user has a true representation of their influence
over the playback position, and is not misled by contradictory audio and visuals. This
also provides an obvious visual representation of the application latency, which is
otherwise a fairly abstract concept.
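The relationship between the two cursors can be sketched by remembering the file offset of each block as it is handed to the audio device, and reporting an offset as heard only once the blocks queued ahead of it have been played. This is an illustrative model, not Sweep's code: the class name `CursorTracker` and its fields are hypothetical, the number of blocks held by the device is assumed to be constant, and the additional lag from momentum smoothing is not modelled.

```python
from collections import deque


class CursorTracker:
    """Track the user cursor and the audible cursor during playback."""

    def __init__(self, buffered_blocks):
        self._buffered_blocks = buffered_blocks  # blocks queued in the device
        self._queue = deque()                    # offsets written, not yet heard
        self.user_pos = 0                        # white cursor: follows the user
        self.heard_pos = 0                       # green cursor: what is audible

    def write_block(self, file_offset):
        """Call whenever a block starting at `file_offset` is written."""
        self.user_pos = file_offset
        self._queue.append(file_offset)
        if len(self._queue) > self._buffered_blocks:
            # The oldest queued block has now reached the speakers.
            self.heard_pos = self._queue.popleft()


if __name__ == "__main__":
    tracker = CursorTracker(buffered_blocks=2)
    # Normal playback, then the user jumps to frame 50000.
    for offset in (0, 1024, 2048, 50000, 51024, 52048):
        tracker.write_block(offset)
        print(f"user {tracker.user_pos:6d}   heard {tracker.heard_pos:6d}")
```

In this model the user cursor jumps at once when the user scrubs, while the audible cursor catches up only after the previously queued blocks have drained.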
Sweep's cursors: playback (left) and user (right)
Conclusion
The usability of Sweep has been vastly improved with a goal of
recreating the style of interaction possible in the analogue domain. Along with Sweep's
waveform view of peak and average data values, the implementation of vinyl-like
scrubbing and accurate monitoring of playback latency has greatly improved the overall
usability of the program.
Additionally, modelling the characteristics of an analogue turntable made it possible
to use Sweep in a completely new way, as a tool for live performance.
A review in the March 2003 issue of Linux Format magazine presented Sweep
as "a capable performance application that edits as well", emphasising the usefulness of the
software as "a DJ's best friend".