《计算机音乐教程》/Computer Music Tutorial
Published in 2011
Translated book
The Computer Music Tutorial is divided into seven parts, each comprising several chapters. Part One, on fundamental concepts, introduces digital audio and computer technology. Part Two concentrates on digital sound synthesis: chapters 3 through 8 present the major synthesis methods, covering both experimental and commercially deployed techniques. Part Three, on mixing and signal processing, comprises four chapters that take the mystery out of these sometimes elusive subjects, covering mixing, filtering, delay effects, reverberation, and spatial manipulation. Part Four takes up sound analysis, a subject of central importance: it is the key to many musical applications such as sound transformation, interactive performance, and music recording, and it covers computer analysis of pitch, rhythm, and spectrum. Part Five presents the main topics concerning the musician's interface to computer music systems: chapter 14 deals with physical devices that performers can manipulate, chapter 15 examines software that interprets performers' gestures, chapter 16 gives an overview of music editing systems, and music languages are the subject of chapter 17. The final two chapters of Part Five (chapters 18 and 19) introduce algorithmic composition. Part Six demystifies computer music systems: chapter 20 examines the internal construction of digital signal processors, chapter 21 discusses the popular MIDI interface protocol, and chapter 22 describes the interconnection of computers, input devices, and digital signal processing hardware. Part Seven introduces psychoacoustics, the study of hearing, that is, of human auditory perception; a knowledge of basic psychoacoustic concepts aids computer music work in several areas, including sound design, mixing, and interpreting the output of signal analysis programs. The final part of the book is a technical appendix that introduces the history, mathematics, and overall design of Fourier analysis, in particular the fast Fourier transform, a tool in general use in computer music systems.
This book was written not only for music students but also for engineers and scientists whose research concerns computer music. Many parts of the book open technology's "black boxes," revealing the inner workings of software and hardware. Why should such technical information matter to musicians? Our goal is to help musicians master and use music technology better, not to turn them into engineers. Technically naive musicians sometimes hold overly narrow views of the potential of these rapidly evolving tools, and they may still be constrained by outdated notions from an earlier era. Lacking fundamental knowledge, they waste time in blind experimentation, not knowing how to turn ideas into practical results. One purpose of this book, then, is to give the many musicians who may eventually want to set up and run a computer music studio, private or institutional, the knowledge to exercise independent judgment in this field.
Every music device and software package adopts a different set of conventions: specialized terminology, notation systems, command syntax, interface layout, and so on. All of these conventions rest on the fundamental concepts explained in this book. In the face of a vast, mutually incompatible, and constantly changing technical environment, it seems more fitting for a textbook to teach fundamental concepts than to explain in detail the features of a particular language, software application, or synthesizer. This book therefore does not try to teach readers how to operate a specific device or program; that is the goal of the documentation supplied with each system.
In a book covering such a rich variety of topics, pointers for further study are absolutely necessary. The back of the book contains extensive citations and a reference list of more than thirteen hundred entries. As a further service to readers, we have invested considerable time in making the name and subject indexes broad and thorough.
Sampling Synthesis
The term "sampling" derives from the established notions of digital samples and sampling rate. Sampling instruments, with or without musical keyboards, have become very common. All sampler designs revolve around one basic idea: playing back prerecorded sounds shifted to the desired pitch. Instead of reading one cycle of a waveform from a small fixed wavetable, a sampling system reads a large wavetable containing thousands of individual cycles, that is, several seconds of prerecorded sound. Because the sampled waveform changes across the attack, sustain, and decay portions of the sound event, the result is a rich, time-varying sound. The length of a sample wavetable can be set arbitrarily; the only limit is the sampler's memory capacity. Most samplers provide an interface to an optical disc or magnetic disk drive, so that many samples can be loaded into the sampler quickly.
Musique Concrète and Sampling: Background
After experiments with variable-speed phonographs in the late 1940s, Pierre Schaeffer founded a studio for musique concrète in Paris in 1950. With Pierre Henry he began to use tape recorders to record and manipulate concrete sounds. Musique concrète refers to the use of microphone-recorded sounds, as opposed to the synthetically generated tones of pure electronic music. But the term also refers to a way of composing with sound: composers of musique concrète work directly with sound objects. Their compositions demand new forms of graphic notation, outside the boundaries of traditional orchestral scores. The Fairlight Computer Music Instrument (CMI) was the first commercial keyboard sampler (1979, Australia).
Looping
Looping extends the duration of a sampled sound played from a keyboard: if the musician holds down a key, the sampler reads "seamlessly" through the sound until the key is released. This is achieved by specifying beginning and ending loop points in the sample. After the attack of the note has finished, the sampler reads repeatedly through the looped part of the wavetable until the key is released, and then plays the final portion of the note's wavetable. Making a seamless yet natural loop from a traditional instrument tone requires special care: the loop should begin after the note's attack and end before its decay. The beginning and ending points of a loop can be spliced directly at a common sample point or joined by a crossfade. A splice is a direct cut from one sound to the next; splicing waveforms produces a click at the splice point unless the loop's beginning and ending points match well. Crossfading means that the end of the loop gradually fades out while its beginning slowly fades back in; the crossfade procedure repeats over and over as long as the key is held down.
Pitch-shifting
An inexpensive sampler may not be able to store every pitch played by the original instrument. Such samplers store one sound every three or four semitones and derive the intermediate pitches by shifting a nearby stored pitch. If you record a sound into the sampler's memory yourself and play it back by pressing different keys, the sampler applies the same pitch-shifting technique. A side effect of simple pitch shifting is that the duration of the sound lengthens or shortens depending on which key is pressed.
Two methods of simple pitch shifting exist. Both are called time-domain techniques, because they operate directly on the time-domain waveform; this distinguishes them from frequency-domain pitch-shifting techniques. Pitch-shifting by sample-rate conversion uses a constant playback sampling frequency. (Top) If every other sample is skipped on playback, the signal is decimated and the pitch shifts up an octave. (Bottom) If interpolation doubles the number of samples on playback, the signal shifts down an octave.
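To put numbers on the pitch shifting just described: in equal temperament, shifting a stored sample by n semitones amounts to resampling it by a ratio of 2^(n/12), and the duration changes by the inverse of that factor. A minimal Python sketch (the function name is ours, for illustration):

```python
def playback_ratio(semitones):
    """Equal-tempered resampling ratio for shifting a stored sample by `semitones`."""
    return 2.0 ** (semitones / 12.0)

# A sampler that stores every third semitone shifts a stored neighbor by at most
# one or two semitones; two semitones up means reading about 12% faster:
print(playback_ratio(2))   # ~1.122 -> pitch rises, and duration shrinks by the same factor
```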
Sampling
The rate at which samples are taken is the sampling frequency, expressed in samples per second. This is an important specification of digital audio systems. The sampling frequency is often called the sampling rate and is expressed in Hertz.
Reconstructing the Analog Signal
The digital signal does not show the values between the bars. The duration of a bar is extremely narrow, perhaps lasting only 0.00002 second (two hundred-thousandths of a second). This means that if the original signal changes "between" bars, the change is not reflected in the height of a bar, at least until the next sample is taken. In technical terms, the signal in figure 1.3b is defined at discrete times, each such time represented by one sample (a vertical bar). Part of the magic of digital sound is that if the signal is band-limited, the DAC and its associated hardware can exactly reconstruct the original signal from the samples! This means that, under certain conditions, the signal "lost between the samples" can be restored. This happens when the numbers pass through the DAC and a smoothing filter; the smoothing filter "connects the dots" between the discrete samples. A signal sent to the loudspeaker thus looks and sounds like the original.
Aliasing (Foldover)
Figure 1.6g shows a waveform with eleven cycles per ten samples; the relationship can also be expressed as 11/10 cycles per sample. In figure 1.6i, the resynthesized waveform differs from the original in one crucial respect: its wavelength (the length of one cycle) is not that of the original. This kind of distortion is called aliasing or foldover. The frequencies at which aliasing occurs can be predicted. Suppose, to keep the numbers simple, that we take 1000 samples per second. The signal in figure 1.6a then has a frequency of 125 cycles per second (since there are eight samples per cycle, and 1000/8 = 125). In figure 1.6d, the signal has a frequency of 500 cycles per second (because 1000/2 = 500). The frequency of the original signal in figure 1.6g has been changed by the process of sample rate conversion, an unacceptable change for a musical signal that must be avoided wherever possible.
The Sampling Theorem
As long as there are at least two samples per period of the original waveform, we can assume that the resynthesized waveform will keep the same frequency. When there are fewer than two samples per period, the frequency (and perhaps the timbre) of the original signal is lost. Suppose we feed a 26 kHz analog signal into an analog-to-digital converter operating at 50 kHz; the converter reads it as a tone at 24 kHz, since 50 - 26 = 24 kHz. The sampling theorem describes the relationship between the sampling rate and the bandwidth of the signal being transmitted. Its essential point can be stated as follows: in order to reconstruct a signal, the sampling frequency must be at least twice the frequency of the signal being sampled. The highest frequency (half the sampling rate) is called the Nyquist frequency. In musical applications the Nyquist frequency is usually set in the upper range of human hearing, above 20 kHz; the sampling frequency is then at least twice that, or above 40 kHz.
Ideal Sampling Frequency
Many people hear information (referred to as "air") in the region around the 20 kHz limit of human hearing (Neve 1992). Indeed, Rudolf Koenig observed at age 41 that his own hearing extended to 23 kHz (Koenig 1899). It seems strange that a new digital compact disc should have less bandwidth than a phonograph record made in the 1960s, or that a new digital recorder should have less bandwidth than a twenty-year-old analog tape recorder. Many analog systems can reproduce frequencies beyond 25 kHz, and scientific experiments confirm the effects of sounds above 22 kHz from both physiological and subjective viewpoints. In sound synthesis applications, the lack of "frequency headroom" at the standard sampling rates causes serious problems: synthesis algorithms must generate nothing but sine waves above 11 kHz (at a 44.1 kHz sampling rate) or 12 kHz (at 48 kHz), or foldover will occur, because any high-frequency tone with partials beyond the fundamental contains frequencies exceeding the Nyquist rate. For example, the third harmonic of a 12.5 kHz tone is 37.5 kHz, which in a system running at a 44.1 kHz sampling rate reflects down to an audible 6600 Hz tone. In sampling and pitch-shifting applications, the lack of frequency headroom requires that samples be lowpass filtered before being shifted upward in pitch. The trouble these limits impose is inconvenient.
Antialiasing and Anti-imaging Filters
Two important filters ensure that a digital sound system works properly. One is placed before the ADC to ensure that nothing (or as little as possible) in the input signal occurs at a frequency above half the sampling rate; as long as this filter works properly, no aliasing arises during recording, so it is logically called the antialiasing filter. The other filter is placed after the DAC; its main function is to turn the digitally stored samples into a smooth, continuous signal.
Analog Representations of Sound
The two walls of a phonograph record's groove contain a continuous-time representation of the sound stored in the record. When you copy an analog recording to another analog recording, the copy is never as good as the original. In essence, reproducing digital sound involves converting a string of numbers into one of the time-varying changes we have just discussed.
Analog-to-digital Conversion
Consider first the path from digital recording to playback. Rather than the continuous-time signals of the analog world, digital recording handles discrete-time signals. A microphone transduces air-pressure variations into voltages, and the voltages pass through a wire to an analog-to-digital converter, commonly abbreviated ADC (pronounced "A-D-C"). This device converts the voltages into a string of binary numbers, one at each period of the sample clock. The binary numbers are stored on a digital recording medium, a type of memory.
Binary Numbers
In contrast to decimal (base ten) numbers, which use the ten digits 0 through 9, binary (base two) numbers use only two digits, 0 and 1. The term bit is an abbreviation of binary digit. The physical method of encoding bits on a recording medium depends on the properties of that medium. On a digital tape recorder, for example, a 1 might be represented by a positive magnetic charge and a 0 by the absence of a charge. This differs from analog tape recording, in which the signal is represented as a continuously varying charge. On optical media, binary data may be encoded as changes of reflectivity at specific positions.
Digital-to-analog Conversion
Abbreviated DAC (pronounced "dack"). In short, we can change a sound in the air into a string of binary numbers that can be stored digitally; the central component in this conversion is the ADC. When we want to hear the sound again, a DAC changes those numbers back into sound.
Digital Audio Recording versus MIDI Recording
When a MIDI sequencer records a human performance on a keyboard, only a relatively small amount of control information is actually transmitted from the keyboard to the sequencer; MIDI does not transmit the sampled waveform of the sound. For example, a 48-track MIDI sequencer program running on a small computer costs about $100 and handles 4000 bytes per second. In contrast, a 48-track digital tape recorder costs tens of thousands of dollars and handles some 4.6 Mbytes of audio information per second, over a thousand times the MIDI data rate. The advantage of digital audio recording is that it can capture any sound a microphone can pick up, including the human voice. MIDI sequence recording is limited to control signals indicating the start, end, pitch, and amplitude of a series of notes. If you plug the MIDI cable from the sequencer into a synthesizer different from the one on which the sequence was originally played, the resulting sound may change completely.
Frequency
Sound travels from a source through the air to the listener's ear. The listener hears the sound because of minute changes in air pressure at the ear. If the pressure varies according to a repeating pattern, we say the sound has a periodic waveform; if there is no discernible pattern, we call it noise. Between these two extremes lies a vast domain of quasi-periodic and quasi-noisy sounds.
One repetition of a periodic waveform is called a cycle, and the fundamental frequency of the waveform is the number of cycles that occur per second. We use Hz for "cycles per second" (Hz abbreviates Hertz, after the German acoustician Heinrich Hertz).
Time-domain Representation
A simple way of depicting a sound waveform is to draw it as a graph of air pressure versus time, called a time-domain representation.
Frequency-domain Representation
Besides the fundamental, many other frequencies may be present in a waveform. A frequency-domain or spectrum representation shows the frequency content of a sound. The individual frequency components of a spectrum may be called harmonics or partials. Harmonic frequencies are simple integer multiples of the fundamental frequency. More generally, any frequency component can be called a partial, whether or not it is an integer multiple of a fundamental; in fact, many sounds have no distinct fundamental at all.
Phase
The starting point of a periodic waveform on the y (amplitude) axis is its initial phase. For example, a typical sine wave starts at amplitude 0 and completes its cycle at 0. If we displace the starting point by π/2 (or 90 degrees) on the horizontal axis, the sinusoid starts and ends at 1 on the amplitude axis; by convention this is called a cosine wave. In effect, a cosine is equivalent to a sine wave phase-shifted by 90 degrees. Two signals that start at the same point are said to be in phase or phase-aligned. By contrast, a signal slightly delayed with respect to another is out of phase with it. When signal A is the exact opposite of signal B (shifted by 180 degrees, so that every positive value of A corresponds to a negative value of B), the two signals are 180 degrees out of phase.
Experimental Digital Recording
Sampling, the core concept of digital recording, converts a continuous analog signal (such as one from a microphone) into a discrete time-sampled signal. Its theoretical underpinning is the sampling theorem, which specifies the relation between the sampling rate and the audio bandwidth. The theorem is also called the Nyquist theorem, after the work of H. Nyquist of Bell Telephone Laboratories (Nyquist 1928), although another form of it was first stated in 1841 by the French mathematician A. Cauchy. The British researcher A. Reeves developed and patented the first pulse-code-modulation (PCM) system, which transmitted messages in "amplitude-dichotomized, time-quantized" (digital) form.
Digital Sound for the Public
Digital sound first reached the general public in 1982 through the compact disc (CD), a 12-cm optical disc read by a laser beam.
Digital Sound for Musicians
Although CD players themselves contained inexpensive 16-bit digital-to-analog converters, computers fitted with good-quality converters were not common before 1988. Until then, apart from a few institutions such as computer music centers that built custom analog-to-digital and digital-to-analog converters, users of personal computer systems had to wait.
Digital Multitrack Recording
Unlike stereo recording, in which the left and right channels are recorded at the same time, a multitrack recorder has separate channels or tracks that can be recorded at different times. For example, each track can record a single instrument, leaving room to mix the tracks later.
Additive synthesis is a class of sound synthesis techniques based on the summation of elementary waveforms to create a more complex waveform. Additive synthesis is one of the oldest and most heavily researched synthesis techniques. It has been used since the earliest days of electrical and electronic music (Cahill 1897; Douglas 1968; die Reihe 1955; Stockhausen 1964). The massive Telharmonium synthesizer unveiled in 1906 summed the sound of dozens of electrical tone generators to create additive tone complexes (figure 4.12). Any method that adds several elementary waveforms to create a new one could be classified as a form of additive synthesis.
Fixed-waveform Additive Synthesis
Some software packages and synthesizers let the musician create waveforms by harmonic addition. In order to make a waveform with a given spectrum, the user adjusts the relative strengths of a set of harmonics of a given fundamental. (The term "harmonic" as an integer multiple of a fundamental frequency was first used by Sauveur [1653-1716] in 1701.)
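As a minimal sketch of fixed-waveform additive synthesis (Python with numpy assumed; the function is illustrative, not from the book), the waveform is simply the sum of sine partials at integer multiples of the fundamental, weighted by the user's harmonic strengths:

```python
import numpy as np

def additive_wave(fundamental_hz, harmonic_amps, sr=44100, dur=1.0):
    """Sum sine harmonics (integer multiples of the fundamental) at given relative strengths."""
    t = np.arange(int(sr * dur)) / sr
    wave = np.zeros_like(t)
    for k, amp in enumerate(harmonic_amps, start=1):
        wave += amp * np.sin(2 * np.pi * k * fundamental_hz * t)
    return wave / np.max(np.abs(wave))   # normalize to avoid clipping

# First four harmonics with strengths 1, 1/2, 1/3, 1/4 give a sawtooth-like tone
tone = additive_wave(220.0, [1.0, 0.5, 0.33, 0.25])
```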
Filter Banks and Equalizers
A filter bank is a group of filters that are fed the same signal in parallel (figure 5.30). Each filter is typically a narrow bandpass filter set at a specific frequency. The filtered signals are often combined to form the output sound. When each filter has its own level control, the filter bank is called a spectrum shaper.
A graphic equalizer has controls that mirror the shape of the filter's frequency response curve (figure 5.31a). Each filter has a fixed center frequency, a fixed bandwidth (typically one-third of an octave), and a fixed Q. (Some units can switch between several Q settings.) The response of each filter can be varied by means of a linear fader to cut or boost specific frequency bands. The potential frequency response of such a filter is shown in figure 5.31b.
A parametric equalizer involves fewer filters, but the control of each filter is more flexible. A typical arrangement is to have three or four filters in parallel. Users can independently adjust the center frequency, the Q, and the amount of cut or boost of each filter. A semiparametric equalizer has a fixed Q.
A filter that has several regular sharp curves in its frequency response is called a comb filter. The final filter to mention is the allpass filter. For a steady-state (unchanging) sound fed into it, an allpass filter passes all frequencies equally well with unity gain, hence its name. All filters introduce some phase shift while attenuating or boosting certain frequencies, but the main effect of an allpass filter is to shift phase.
Time-varying Subtractive Synthesis
Filters can be fixed or time-variant. In a fixed filter, all the properties of the filter are predefined and do not change over time. This situation is typical of conventional music recording, where the sound engineer sets the equalization for each channel at the beginning of the piece. Time-variant filters have many musical applications, particularly in electronic and computer music, where the goal is to surpass the limits of traditional instruments. A bandpass filter whose Q, center frequency, and attenuation change over time can impose an enormous variety of sound colorations, particularly if the signal being filtered is also time-varying. A prime example of a system for time-varying subtractive synthesis is SYTER, a digital signal processor developed in the late 1970s at the Groupe de Recherches Musicales (GRM) studio in Paris by Jean-François Allouis.
Analysis/resynthesis systems based on subtractive filters rather than on additive oscillators are capable of approximating any sound. In practice, most of the analysis and data reduction techniques employed in subtractive analysis/resynthesis are geared toward speech synthesis, since this is where most of the research has been concentrated (Flanagan et al. 1970; Flanagan 1972).
The Vocoder
The original subtractive analysis/synthesis system is the vocoder, demonstrated by a talking robot at the 1939 World's Fair in New York City. In musical applications the separation of the driving functions (or resonance) from the excitation function means that rhythm, pitch, and timbre are independently controllable. For example, a composer can change the pitch of a singing voice (by changing the frequency of the excitation function) but retain the original spectral articulation of the voice.
By stretching or shrinking the driving functions over time, a piece of spoken text can be slowed down or sped up without shifting the pitch or affecting the formant structure.
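To make the filter-bank idea concrete, here is a rough sketch (Python with numpy and scipy assumed; parameter choices are ours): the same input feeds several bandpass filters in parallel, and giving each band its own gain before the sum turns the bank into a spectrum shaper. Sweeping the center frequencies or Q over time, as described above, gives time-varying subtractive effects.

```python
import numpy as np
from scipy.signal import butter, lfilter

def filter_bank(signal, sr, center_freqs, q=5.0, gains=None):
    """Feed one signal to parallel bandpass filters and sum the weighted outputs."""
    gains = gains if gains is not None else [1.0] * len(center_freqs)
    out = np.zeros(len(signal))
    for fc, g in zip(center_freqs, gains):
        bw = fc / q                                    # Q = center frequency / bandwidth
        lo, hi = fc - bw / 2.0, fc + bw / 2.0
        b, a = butter(2, [lo / (sr / 2), hi / (sr / 2)], btype="band")
        out += g * lfilter(b, a, signal)               # per-band level control
    return out
```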
Filter Types and Response Curves
The specifications of audio equipment usually include a figure for "frequency response." This term is a shorter form of amplitude-versus-frequency response. Each type of filter has its own characteristic frequency response curve. Typical frequency response curves for four basic types of filters are shown in figure 5.23: lowpass, highpass, bandpass, and bandreject or notch.
Shelving filters, shown in figure 5.24, boost or cut all frequencies above or below a given threshold. Their names can be confusing, because a high shelving filter acts like a lowpass filter when it is adjusted to cut high frequencies, and a low shelving filter acts like a highpass filter when it is adjusted to cut low frequencies.
An important property of a filter is its cutoff frequency. The steepness of a filter's slope is usually specified in terms of decibels of attenuation or boost per octave, abbreviated "dB/octave." For example, a 6 dB/octave slope on a lowpass filter makes a smooth attenuation (or rolloff), while a 90 dB/octave slope makes a sharp cutoff (figure 5.26).
Filter Q and Gain
Many bandpass filters have a control knob (either in software or hardware) for Q. An intuitive definition of Q is that it represents the degree of "resonance" within a bandpass filter. When the Q is high, as in the narrowest inner curve, the frequency response is sharply focused around a peak (resonant) frequency. Q is the center frequency divided by the bandwidth between the two cutoff frequencies:
Q = f_center / (f_high_cutoff - f_low_cutoff)
Another property of a bandpass or bandreject filter is its gain. This is the amount of boost or cut of a frequency band. It shows up as the height (or depth) of the band in a response curve (figure 5.28). When passing a signal through a high-Q filter, care must be taken to ensure that the gain at the resonant frequency (the height at the peak) does not overload the system, causing distortion. Many systems have gain-compensation circuits in their filters that prevent this kind of overload.
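The Q formula reduces to a one-line helper. For example, a bandpass filter centered at 1000 Hz whose cutoff frequencies lie at 950 and 1050 Hz has Q = 10, a fairly resonant setting (a trivial Python illustration):

```python
def q_factor(f_center, f_low_cutoff, f_high_cutoff):
    """Q = center frequency / bandwidth between the cutoff frequencies."""
    return f_center / (f_high_cutoff - f_low_cutoff)

print(q_factor(1000.0, 950.0, 1050.0))   # 10.0: narrow band, sharply resonant
print(q_factor(1000.0, 500.0, 1500.0))   # 1.0: a much broader band
```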
Introduction to Filters
A filter can be literally any operation on a signal (Rabiner et al. 1972)! But the most common use of the term describes devices that boost or attenuate regions of a sound spectrum, which is the usage we take up here. Such filters work by using one or both of these methods:
· Delaying a copy of an input signal slightly (by one or several sample periods) and combining the delayed input signal with the new input signal (figure 5.21a)
· Delaying a copy of the output signal and combining it with the input signal (figure 5.21b)
Although figure 5.21 shows combination by summation (+), the combination can also be by subtraction (-). In either case, the combination of original and delayed signals creates a new waveform with a different spectrum. By inserting more delays or mixing sums and differences in various combinations, one can construct a wide range of filter types.
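A sketch of the two building blocks in Python (numpy assumed; the names are ours). The feedforward form combines the input with a delayed copy of itself; the feedback form combines the input with a delayed copy of its own output, so the feedback gain must stay below 1 for stability. Using subtraction instead of addition gives the difference combinations mentioned above.

```python
import numpy as np

def feedforward(x, g=1.0, delay=1):
    """y[n] = x[n] + g * x[n - delay]: input plus a delayed copy of the input (figure 5.21a)."""
    y = x.astype(float).copy()
    y[delay:] += g * x[:-delay]
    return y

def feedback(x, g=0.5, delay=1):
    """y[n] = x[n] + g * y[n - delay]: input plus a delayed copy of the output (figure 5.21b)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= delay:
            y[n] += g * y[n - delay]   # |g| < 1 keeps the recursion from blowing up
    return y
```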
The Musical Instrument Digital Interface or MIDI protocol has been variously described as an interconnection scheme between instruments and computers, a set of guidelines for transferring data from one instrument to another, and a language for transmitting musical scores between computers and synthesizers. All these definitions capture an aspect of MIDI. MIDI was designed for real-time control of music devices. The MIDI specification stipulates a hardware interconnection scheme and a method for data communications (IMA 1983; Loy 1985c; Moog 1986). It also specifies a grammar for encoding musical performance information. MIDI information is packaged into small messages sent from one device to another. For example, a message can specify the start and stop time of a musical note, its pitch, and its initial amplitude. Another type of message, transmitted at regular intervals, conveys ticks of a master clock, making it possible to synchronize several MIDI instruments to a sequencer that emits these messages.
Musical Possibilities of MIDI
1. MIDI separates the input device (for example, a musical keyboard) from the sound generator (synthesizer or sampler). Thus MIDI eliminates the need to have a keyboard attached to every synthesizer.
2. The separation of control from synthesis means that any input device (breath controller, hornlike instrument, drum pad, guitar, etc.) can control a synthesizer. This has led to a wave of innovation in designing input devices (see chapter 14).
3. Software for interactive performance, algorithmic composition, score editing, patch editing, and sequencing can be run on the computer, with the results transmitted to the synthesizer.
4. MIDI makes "generic" (device-independent) music software easier to develop. Generic music software runs on a personal computer and drives synthesizers manufactured by different companies. An example of generic software is a sequencer.
5. MIDI makes "targeted" music software (i.e., software for a specific device) easier to develop. Targeted music software includes patch editor/librarian programs that essentially replace the front panel of a synthesizer, sampler, or effects processor. By pushing graphical buttons and adjusting the knobs on the screen image with a mouse, one can control the synthesizer as if one were manipulating its physical controls.
6. MIDI codes can be reinterpreted by devices other than synthesizers, such as signal-processing effects boxes (reverberators, etc.). This offers the possibility of real-time control of effects, such as changing the delay or reverberation time. MIDI can synchronize synthesizers with other media such as lighting systems. MIDI can also be linked with other synchronization protocols (such as SMPTE timecode) to coordinate music with video and graphics. Another specialized application of MIDI is the control of audio mixers. See chapter 9 for a discussion of console automation via MIDI.
7. Through MIDI, score, sequencer, and sample data can be exchanged between devices made by different manufacturers.
MIDI Hardware
MIDI messages transmitted between devices are sent in serial binary form. The standard rate of transmission is 31,250 bits per second. The hardware that handles these signals includes MIDI ports and MIDI computer interfaces.
MIDI Ports
A MIDI port on a device receives and transmits the messages. The basic port consists of three connectors: IN, OUT, and THRU. This allows the cable to be shielded without grounding problems over a span of up to 15 meters.
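For instance, sending a note-on message and its matching note-off from a computer might look like the following (a sketch assuming the third-party Python library mido; any library that writes raw MIDI bytes to a port would serve equally well):

```python
import time
import mido

# Open the default MIDI output and play middle C (key 60) for half a second.
# mido numbers channels 0-15, so channel=0 is MIDI channel 1.
with mido.open_output() as port:
    port.send(mido.Message('note_on', channel=0, note=60, velocity=64))
    time.sleep(0.5)
    port.send(mido.Message('note_off', channel=0, note=60, velocity=0))
```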
MIDI Computer Interfaces
Three basic types of interfaces are extant: serial, parallel, and multiline.
MIDI Driver Programs
Every synthesizer or digital signal processor (DSP) with a MIDI port contains a microprocessor. The program that handles this MIDI input and output function is called the MIDI driver. In effect, the driver "owns" the input/output port; all MIDI communications must be routed through it.
MIDI Channels
All sixteen channels can be routed over one physical MIDI cable. Each receiving device is set up beforehand to listen to one or more channels.
MIDI's Representation of Pitch
A note-on message contains a 7-bit field corresponding to a pitch value. Since 2^7 = 128, this means that the MIDI pitch range extends over 128 pitches.
Channel Messages
MIDI messages fall into two categories: channel messages and system messages.
Status and Data Bytes
The stream of MIDI data divides into two types of bytes: status bytes and data bytes (figure 21.9). A status byte begins with a 1 and identifies a particular function, such as note-on, note-off, pitch wheel change, and so on. A data byte begins with a 0 and provides the value associated with the status byte, such as the particular key and channel of a note-on message, how much the pitch wheel has changed, and so on. For example, a note-on event message consists of three bytes (10010000 01000000 00010010).
General MIDI Mode
Devices equipped for GMM respond to MIDI messages according to a standard mapping between channels, patches, and sound categories. GMM preassigns the first ten channels, with channel 4 for melody, channel 8 for harmony, and channel 10 for the percussion part. In addition, all 128 patches are preassigned to specific sound categories, mostly based on traditional instruments or "classic" synthesizer sounds. General MIDI in itself is simply a naming scheme and cannot guarantee that two different devices playing the same material will sound alike.
Continuous Control via MIDI
Some aspects of performed music change in a discrete, on/off way, like the keys on a keyboard or the pushbuttons on the front of an effects processor. Other aspects change in a continuous way over time. MIDI input devices usually have both discrete controllers (e.g., switches or keys) and continuous controllers (e.g., levers, wheels, potentiometers, pedals).
Control Change Messages
Control change messages tell a receiving device that the position of a continuous controller is changing. A point to be aware of is that the stream of messages from a continuous controller can consume a great deal of MIDI's available transmission capacity (figure 21.11).
Defined Controllers
Defined controllers and registered parameters simplify MIDI communications by assigning standard functions to controllers found on most MIDI devices. Some of MIDI's preset controller numbers are vibrato (1), left-right pan (10), volume (7), and damper (sustain) pedal (64).
Registered and Unregistered Parameters
Typical RPNs include pitch bend sensitivity, fine-tuning, and coarse tuning.
Standard MIDI Files
SMF can also serve as a common format for program intercommunication in a multitasking operating system running more than one music application. Long-distance communication of MIDI data is also aided by SMF, since musicians running different software can nevertheless exchange sequence data. (See the section on telecommunications in chapter 22.)
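The three-byte note-on example above can be decoded by inspecting the top bit of each byte, as the status/data rule dictates (a small sketch; the helper name is ours):

```python
def parse_note_on(b1, b2, b3):
    """Decode a 3-byte note-on: one status byte (top bit 1) and two data bytes (top bit 0)."""
    assert (b1 & 0x80) and not (b2 & 0x80) and not (b3 & 0x80)   # status vs. data bits
    assert (b1 >> 4) == 0b1001            # upper nibble 1001 identifies note-on
    channel = b1 & 0x0F                   # lower nibble: channel field 0-15 (MIDI channels 1-16)
    return channel, b2, b3                # (channel field, key number, velocity)

# The message from the text, 10010000 01000000 00010010:
print(parse_note_on(0b10010000, 0b01000000, 0b00010010))   # (0, 64, 18): channel 1, key 64, velocity 18
```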
MIDI Timing
MIDI provides two ways to count time: via MIDI Clock messages, or via MIDI Timecode. The next sections describe these techniques.
MIDI Machine Control and MIDI Show Control
MMC controls tape recorders, videocassette recorders (VCRs), and hard disk recorders via MIDI. A related extension to MIDI was created for control of lighting systems and theatrical productions in general.
Sample Libraries
Since a sampler is a type of recording system, the quality of the samples depends on the quality of the recording techniques. Making high-quality samples requires good players with fine instruments, excellent microphones, and favorable recording environments.
An Assessment of Samplers
In any case, it is understandable that the "naturalness" or "realism" of a sampler should be held up as a criterion for judging between different brands. It is well known that a given instrument tone may sound much more realistic on one sampler than it does on another. In expressive instruments like voices, saxophones, sitars, guitars, and others, each note is created in a musical context. In addition to these contextual cues, transitional sounds like breathing, tonguing, key clicks, and sliding fingers along strings punctuate the phrasing. Constraints of style and taste determine when context-sensitive effects such as rubato, portamento, vibrato, crescendi and diminuendi, and other nuances are applied. These problems can be broken into two parts: (1) How can we model the sound microstructure of note-to-note transitions? (2) How can we interpret (analyze) scores to render a context-sensitive performance according to style-specific rules?
Pitch-shifting
In an inexpensive sampler it may not be possible to store every note played by an acoustic instrument. These samplers store only every third or fourth semitone and obtain intermediate notes by shifting the pitch of a nearby stored note. If you record a sound into a sampler's memory and play it back by pressing different keys, the sampler carries out the same pitch-shifting technique. A side effect of simple pitch shifting is that the sound's duration increases or decreases, depending on the key pressed.
Two methods of simple pitch shifting exist. Both are called time-domain techniques, since they operate directly on the time-domain waveform. This is different from the frequency-domain pitch-shifting techniques discussed elsewhere.
Pitch-shifting by sample-rate conversion works with a constant playback sampling frequency. (Top) If every other sample is skipped on playback, the signal is decimated and the pitch is shifted up an octave. (Bottom) If twice the number of samples are used by means of interpolation on playback, the signal is shifted down an octave.
Sample-rate Conversion Without Pitch-shifting
Many digital audio recorders operate at the standard sampling rates of 48 or 44.1 kHz. How can we resample a recording made at one of these frequencies so as to play it back at the other with no pitch shift? To convert a signal between the standard sampling rates of 44.1 and 48 kHz without a pitch change, a rather elaborate conversion process is required. These ratios can be implemented as six stages of interpolations and decimations by factors of 2, 3, 5, and 7:
1. Interpolate by 4 from 44,100 to 176,400 Hz
2. Decimate by 3 from 176,400 to 58,800 Hz
3. Interpolate by 4 from 58,800 to 235,200 Hz
4. Decimate by 7 from 235,200 to 33,600 Hz
5. Interpolate by 10 from 33,600 to 336,000 Hz
6. Decimate by 7 from 336,000 to 48,000 Hz
The signal can then be played back at a sampling rate of 48 kHz with no change of pitch.
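Both time-domain methods are a few lines of code (numpy assumed; the names are ours). Note that real systems lowpass filter before decimating, for the frequency-headroom reasons discussed under the sampling theorem, since skipping samples folds any content above the new Nyquist frequency back into the audible band:

```python
import numpy as np

def up_octave(x):
    """Skip every other sample (decimation): half the duration, pitch up one octave."""
    return x[::2]

def down_octave(x):
    """Insert a linearly interpolated sample between each pair: double duration, pitch down an octave."""
    y = np.empty(2 * len(x) - 1)
    y[0::2] = x                            # original samples
    y[1::2] = (x[:-1] + x[1:]) / 2.0       # interpolated samples in between
    return y
```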
Looping
Looping extends the duration of sampled sounds played by a musical keyboard. If the musician holds down a key, the sampler should scan "seamlessly" through the note until the musician releases the key. This is accomplished by specifying beginning and ending loop points in the sampled sound. After the attack of the note is finished, the sampler reads repeatedly through the looped part of the wavetable until the key is released; then it plays the note's final portion of the wavetable.
Creating a seamless but "natural" loop out of a traditional instrument tone requires care. The loop should begin after the attack of the note and should end before the decay. The beginning and ending points of a loop can either be spliced together at a common sample point or crossfaded. A splice is a cut from one sound to the next. Splicing waveforms results in a click, pop, or thump at the splice point, unless the beginning and ending points are well matched. Crossfading means that the end part of each looped event gradually fades out while the beginning part slowly fades in again. The crossfade looping process repeats over and over as the note is sustained.
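A sketch of crossfade looping (Python with numpy assumed; the linear fade shape and the names are our choices, not the book's): on each pass, the tail of the loop fades out while the head of the loop fades back in, and the process repeats while the key is held:

```python
import numpy as np

def crossfade_loop(wavetable, loop_start, loop_end, fade_len, n_repeats):
    """Sustain a sampled note by crossfading each loop pass into the next."""
    loop = wavetable[loop_start:loop_end].astype(float)
    fade_out = np.linspace(1.0, 0.0, fade_len)
    fade_in = 1.0 - fade_out
    out = wavetable[:loop_end].astype(float)             # attack plus first pass through the loop
    for _ in range(n_repeats):                           # one iteration per held-key repetition
        out[-fade_len:] = out[-fade_len:] * fade_out + loop[:fade_len] * fade_in
        out = np.concatenate([out, loop[fade_len:]])
    return np.concatenate([out, wavetable[loop_end:]])   # release portion after key-up
```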
Sampling Synthesis
The term "sampling" derives from established notions of digital samples and sampling rate. Sampling instruments, with or without musical keyboards, are widely available. All sampling instruments are designed around the basic notion of playing back prerecorded sounds, shifted to the desired pitch. Instead of scanning a small fixed wavetable containing one cycle of a waveform, a sampling system scans a large wavetable that contains thousands of individual cycles, several seconds of prerecorded sound. Since the sampled waveform changes over the attack, sustain, and decay portion of the event, the result is a rich and time-varying sound. The length of the sampling wavetable can be arbitrarily long, limited only by the memory capacity of the sampler.
Musique Concrète and Sampling: Background
After experiments with variable-speed phonographs in the late 1940s, Pierre Schaeffer founded the Studio de Musique Concrète in Paris in 1950 (see figure 4.1). He and Pierre Henry began to use tape recorders to record and manipulate concrète sounds. Musique concrète refers to the use of microphone-recorded sounds, rather than synthetically generated tones as in pure electronic music. But it also refers to the manner of working with such sounds. Composers of musique concrète work directly with sound objects (Schaeffer 1977; Chion 1982). Their compositions demand new forms of graphic notation, outside the boundaries of traditional scores for orchestra (Bayle 1993). The Fairlight Computer Music Instrument (CMI) was the first commercial keyboard sampler (1979, Australia).
Sound synthesis
Sampling & additive synthesis -> Musique Concrète and Sampling: Background -> Looping -> Pitch Shifting -> Sample-rate Conversion Without Pitch Shifting -> Sample Libraries
Subtractive Synthesis -> Introduction to Filters -> Filter Types and Response Curves -> Filter Q and Gain -> Filter Banks and Equalizers -> Comb and Allpass Filters
Modulation Synthesis --- Ring Modulation --- Amplitude Modulation --- Frequency Modulation
The Sampling Theorem
As long as there are at least two samples per period of the original waveform, we can assume that the resynthesized waveform will have the same frequency. But when there are fewer than two samples per period, the frequency (and perhaps the timbre) of the original signal is lost. The sampling theorem describes the relationship between the sampling rate and the bandwidth of the signal being transmitted. To give a concrete example, suppose we introduce an analog signal at 26 kHz into an analog-to-digital converter operating at 50 kHz. The converter reads it as a tone at 24 kHz, since 50 - 26 = 24 kHz. The essential point of the sampling theorem can be stated precisely as follows: in order to be able to reconstruct a signal, the sampling frequency must be at least twice the frequency of the signal being sampled. The highest frequency that can be produced in a digital audio system (half the sampling rate) is called the Nyquist frequency. In musical applications, the Nyquist frequency is usually in the upper range of human hearing, above 20 kHz. Then the sampling frequency can be specified as being at least twice as much, or above 40 kHz.
Ideal Sampling Frequency
Many people hear information (referred to as "air") in the region around the 20 kHz "limit" on human hearing (Neve 1992). Indeed, Rudolf Koenig, whose precise measurements set international standards for acoustics, observed at age 41 that his own hearing extended to 23 kHz (Koenig 1899). It seems strange that a new digital compact disc should have less bandwidth than a phonograph record made in the 1960s, or a new digital audio recorder should have less bandwidth than a twenty-year-old analog tape recorder. Many analog systems can reproduce frequencies beyond 25 kHz. Scientific experiments confirm the effects of sounds above 22 kHz from both physiological and subjective viewpoints (Oohashi et al. 1991; Oohashi et al. 1993).
In sound synthesis applications, the lack of "frequency headroom" in the standard sampling rates of 44.1 and 48 kHz causes serious problems. It requires that synthesis algorithms generate nothing other than sine waves above 11 kHz (44.1 kHz sampling rate) or 12 kHz (48 kHz sampling rate), or foldover will occur. This is because any high-frequency tone with partials beyond the fundamental has components whose frequencies exceed the Nyquist rate. The third harmonic of a tone at 12.5 kHz, for example, is 37.5 kHz, which in a system running at a 44.1 kHz sampling rate will reflect down to an audible 6600 Hz tone. In sampling and pitch-shifting applications, the lack of frequency headroom requires that samples be lowpass filtered before they are pitch-shifted upward. The trouble these limits impose is inconvenient.
Antialiasing and Anti-imaging Filters
In order to make sure that a digital sound system works properly, two important filters are included. One filter is placed before the ADC, to make sure that nothing (or as little as possible) in the input signal occurs at a frequency higher than half of the sampling frequency. Logically enough, such a filter is called an antialiasing filter. The other filter is placed after the DAC. Its main function is to change the samples stored digitally into a smooth, continuous representation of the signal.
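Both numeric examples above follow from one rule: frequencies beyond the Nyquist frequency reflect back around it. A small Python helper (ours, for illustration) reproduces them:

```python
def folded_frequency(f, sr):
    """Audible alias of frequency f when sampled at rate sr (reflection about the Nyquist frequency)."""
    nyquist = sr / 2.0
    f = f % sr                         # the alias pattern repeats every sr Hz
    return sr - f if f > nyquist else f

print(folded_frequency(26_000, 50_000))   # 24000.0 -> the 26 kHz tone read as 24 kHz
print(folded_frequency(37_500, 44_100))   # 6600.0  -> third harmonic of 12.5 kHz folds into the audible band
```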
Aliasing (Foldover)
Figure 1.16g shows a waveform with eleven cycles per ten samples. This means that one cycle takes less time than the interval between samples. This relationship could also be expressed as 11/10 cycles per sample. In figure 1.16i, the resynthesized waveform is completely different from the original in one important respect: the wavelength (length of the cycle) of the resynthesized waveform is different from that of the original. This kind of distortion is called aliasing or foldover.
The frequencies at which this aliasing occurs can be predicted. Suppose, just to keep the numbers simple, that we take 1000 samples per second. Then the signal in figure 1.16a has a frequency of 125 cycles per second (since there are eight samples per cycle, and 1000/8 = 125). The frequency of the original signal in figure 1.16g has been changed by the sample rate conversion process. This represents an unacceptable change to a musical signal, which must be avoided if possible.
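The foldover is easy to verify numerically: sampled at 1000 samples per second, a 900-cycle-per-second sine yields exactly the same sample values as an inverted 100-cycle-per-second sine (a small numpy check, ours):

```python
import numpy as np

sr = 1000
t = np.arange(50) / sr
aliased = np.sin(2 * np.pi * 900 * t)          # 900 Hz: above the 500 Hz Nyquist frequency
low = np.sin(2 * np.pi * (sr - 900) * t)       # 100 Hz
print(np.allclose(aliased, -low))              # True: the samples trace a (sign-flipped) 100 Hz wave
```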
Sampling
The rate at which samples are taken, the sampling frequency, is expressed in terms of samples per second. This is an important specification of digital audio systems. It is often called the sampling rate and is expressed in terms of Hertz.
The digital signal does not show the value between the bars. The duration of a bar is extremely narrow, perhaps lasting only 0.00002 second (two hundred-thousandths of a second). This means that if the original signal changes "between" bars, the change is not reflected in the height of a bar, at least until the next sample is taken. In technical terms, we say that the signal is defined at discrete times, each such time represented by one sample (vertical bar).
Part of the magic of digitized sound is that if the signal is band limited, the DAC and associated hardware can exactly reconstruct the original signal from these samples! This means that, given certain conditions, the missing part of the signal "between the samples" can be restored. This happens when the numbers are passed through the DAC and smoothing filter. The smoothing filter "connects the dots" between the discrete samples. Thus, a signal sent to the loudspeaker looks and sounds like the original signal.
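The smoothing filter's "connect the dots" behavior can be imitated with ideal band-limited interpolation, in which each stored sample contributes one scaled, shifted sinc function (a numpy sketch; with a finite block of samples the reconstruction is only near-exact away from the edges):

```python
import numpy as np

def sinc_reconstruct(samples, sr, times):
    """Whittaker-Shannon interpolation: x(t) = sum over n of x[n] * sinc(sr*t - n)."""
    n = np.arange(len(samples))
    return np.sum(samples * np.sinc(sr * times[:, None] - n), axis=1)

sr = 1000
idx = np.arange(64)
samples = np.sin(2 * np.pi * 100 * idx / sr)    # a 100 Hz sine, well below the 500 Hz Nyquist limit
t = np.linspace(0.02, 0.04, 5)                  # instants "between the samples"
print(sinc_reconstruct(samples, sr, t))         # closely matches sin(2*pi*100*t) at those instants
```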
Digital Audio Recording versus MIDI Recording
When a MIDI sequencer records a human performance on a keyboard, only a relatively small amount of control information is actually transmitted from the keyboard to the sequencer. MIDI does not transmit the sampled waveform of the sound. For example, a 48-track MIDI sequence recorder program running on a small computer might cost less than $100 and handle 4000 bytes/second. In contrast, a 48-track digital tape recorder costs tens of thousands of dollars and handles more than 4.6 Mbytes of audio information per second, over a thousand times the data rate of MIDI.
The advantage of a digital audio recording is that it can capture any sound that can be recorded by a microphone, including the human voice. MIDI sequence recording is limited to recording control signals that indicate the start, end, pitch, and amplitude of a series of note events. If you plug the MIDI cable from the sequencer into a synthesizer that is not the same as the synthesizer on which the original sequence was played, the resulting sound may change radically.
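The arithmetic behind the comparison (assuming 16-bit samples, a 48 kHz rate, and MIDI's serial framing of one start and one stop bit per data byte):

```python
midi_bytes_per_sec = 31_250 / 10           # 31,250 bits/s on the wire, 10 bits per transmitted byte
audio_bytes_per_sec = 48 * 48_000 * 2      # 48 tracks x 48,000 samples/s x 2 bytes per 16-bit sample
print(audio_bytes_per_sec)                 # 4,608,000 -> the ~4.6 Mbytes/s cited above
print(audio_bytes_per_sec / midi_bytes_per_sec)   # ~1474 -> over a thousand times MIDI's rate
```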
Analog Representations of Sound
A phonograph record, for example, stores a continuous-time representation of the sound in its groove. A fundamental limitation associated with analog recording is that the copy is never as good as the original. Reproducing digital sound, in contrast, involves converting a string of numbers into one of the time-varying changes that we have been discussing.
Digital Representations of Sound
Analog-to-digital Conversion
Rather than the continuous-time signals of the analog world, a digital recorder handles discrete-time signals. A microphone transduces air pressure variations into electrical voltages, and the voltages are passed through a wire to the analog-to-digital converter, commonly abbreviated ADC (pronounced "A-D-C"). This device converts the voltages into a string of binary numbers, one at each period of the sample clock. The binary numbers are stored in a digital recording medium, a type of memory.
Binary Numbers
In contrast to decimal (or base ten) numbers, which use the ten digits 0-9, binary (or base two) numbers use only two digits, 0 and 1. The term bit is an abbreviation of binary digit. On a digital audio tape recorder, a 1 might be represented by a positive magnetic charge, while a 0 is indicated by the absence of such a charge. This is different from an analog tape recording, in which the signal is represented as a continuously varying charge.
Digital-to-analog Conversion
Abbreviated DAC (pronounced "dack"). In summary, we can change a sound in the air into a string of binary numbers that can be stored digitally. The central component in this conversion process is the ADC. When we want to hear the sound again, a DAC can change those numbers back into sound.
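What the ADC hands to the recording medium can be pictured with a toy quantizer that turns a voltage into one 16-bit two's-complement word (names and scaling are ours, for illustration):

```python
def to_binary_word(voltage, v_max=1.0, bits=16):
    """Quantize a voltage in [-v_max, +v_max] to a signed sample word, shown as a bit string."""
    levels = 2 ** (bits - 1) - 1                        # 32767 for 16 bits
    code = round(voltage / v_max * levels)
    code = max(-levels - 1, min(levels, code))          # clamp to the representable range
    return format(code & (2 ** bits - 1), f"0{bits}b")  # two's-complement bit pattern

print(to_binary_word(0.5))    # '0100000000000000' -> the number 16384
print(to_binary_word(-0.5))   # '1100000000000000' -> the number -16384
```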
* Phase
The starting point of a periodic waveform on the y or amplitude axis is its initial phase. For example, a typical sine wave starts at the amplitude point 0 and completes its cycle at 0. If we displace the starting point by π/2 (or 90 degrees) on the horizontal axis, then the sinusoidal wave starts and ends at 1 on the amplitude axis. By convention this is called a cosine wave. In effect, a cosine is equivalent to a sine wave that is phase shifted by 90 degrees.
In phase: two signals start at the same point.
Out of phase: one signal is slightly delayed with respect to the other.
180 degrees out of phase: signal A is the exact opposite of signal B, so that every positive value of A corresponds to a negative value of B.
* Importance of Phase
A filter phase shifts a signal (by delaying its input for a short time) and then combines the phase-shifted version with the original signal to create frequency-dependent phase cancellation effects that alter the spectrum of the original. Phase is also important in systems that resynthesize sound on the basis of an analysis of an existing sound. In particular, these systems need to know the starting phase of each frequency component in order to put together the different components in the right order. Finally, much attention has been devoted in recent years to audio components that phase shift their input signals as little as possible, because frequency-dependent phase shifts distort musical signals audibly and interfere with loudspeaker imaging. (Imaging is the ability of a set of loudspeakers to create a stable "audio picture" where each audio source is localized to a specific place within the picture.) Unwanted phase shifting is called phase distortion. To make a visual analogy, a phase-distorted signal is "out of focus."
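Both claims about phase are one-line numerical checks (numpy assumed):

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100)
print(np.allclose(np.sin(x + np.pi / 2), np.cos(x)))   # True: a cosine is a sine shifted by 90 degrees
print(np.allclose(np.sin(x + np.pi), -np.sin(x)))      # True: 180 degrees out of phase inverts every value
```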
Frequency and Amplitude
If the pressure varies according to a repeating pattern we say the sound has a periodic waveform. If there is no discernible pattern it is called noise. In between these two extremes is a vast domain of quasi-periodic and quasi-noisy sounds.
One repetition of a periodic waveform is called a cycle, and the fundamental frequency of the waveform is the number of cycles that occur per second. We substitute Hz for "cycles per second" in accordance with standard acoustical terminology. (Hz is an abbreviation for Hertz, named after the German acoustician Heinrich Hertz.)
Time-domain Representation
A simple method of depicting sound waveforms is to draw them in the form of a graph of air pressure versus time (figure 1.8).
Frequency-domain Representation
A frequency-domain or spectrum representation shows the frequency content of a sound. The individual frequency components of the spectrum can be referred to as harmonics or partials. Harmonic frequencies are simple integer multiples of the fundamental frequency. More generally, any frequency component can be called a partial, whether or not it is an integer multiple of a fundamental.
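A frequency-domain representation can be computed directly: build a waveform from a fundamental plus one harmonic, take its spectrum, and the two partials appear at the expected frequencies (a numpy sketch, ours):

```python
import numpy as np

sr, f0 = 8000, 200
t = np.arange(sr) / sr                                # one second of signal
wave = np.sin(2 * np.pi * f0 * t) + 0.4 * np.sin(2 * np.pi * 3 * f0 * t)   # fundamental + 3rd harmonic
spectrum = np.abs(np.fft.rfft(wave)) / (sr / 2)       # normalized magnitude spectrum
print(np.nonzero(spectrum > 0.1)[0])                  # [200 600]: with a 1 s window, bin index = Hz
```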
Basics of Sound Signals -> Frequency and Amplitude: Time-domain and Frequency-domain Representation -> Phase --- Analog Representations of Sound --- Digital Representations of Sound -> Analog-to-digital Conversion -> Binary Numbers -> Digital-to-analog Conversion -> Digital Audio Recording versus MIDI Recording -> Sampling -> Aliasing (Foldover) -> Antialiasing and Anti-imaging Filters -> The Sampling Theorem: Ideal Sampling Frequency
Overview to Part I
The sample, nothing more than a number, is the atom of sound. Theory says that we can construct any sound emitted by a loudspeaker by means of a series of samples that trace the pattern of a sound waveform over time. But theory becomes reality only when strict technical conditions concerning sampling rate and sample width are met. The first chapter, by John Strawn and Curtis Roads, covers such basic topics as the history of digital recording, the sampling theorem, aliasing, phase correction, quantization, dither, audio converters, oversampling, and digital audio formats.
Background:
* History of Digital Audio Recording
Sound recording has a rich history, beginning with Thomas Edison and Emile Berliner's experiments in the 1870s, and marked by V. Poulsen's Telegraphone magnetic wire recorder of 1898 (Read and Welch 1976). Although the invention of the triode vacuum tube in 1906 launched the era of electronics, electronically produced records did not become practical until 1924 (Keller 1981). Optical sound recording on film was first demonstrated in 1922 (Ristow 1993). Sound recording on tape coated with powdered magnetized material was developed in the 1930s in Germany (figure 1.3), but did not reach the rest of the world until after World War 2.
* Experimental Digital Recording
The core concept in digital audio recording is sampling, that is, converting continuous analog signals (such as those coming from a microphone) into discrete time-sampled signals. The theoretical underpinning of sampling is the sampling theorem, which specifies the relation between the sampling rate and the audio bandwidth (see the section on the sampling theorem later in this chapter). This theorem is also called the Nyquist theorem after the work of Harry Nyquist of Bell Telephone Laboratories (Nyquist 1928). The British researcher A. Reeves developed the first patented pulse-code-modulation (PCM) system for transmission of messages in "amplitude-dichotomized, time-quantized" (digital) form (Reeves 1938; Licklider 1950; Black 1953). In the late 1950s, Max Mathews and his group at Bell Telephone Laboratories generated the first synthetic sounds from a digital computer. The samples were written by the computer to expensive and bulky reel-to-reel computer tape storage drives. By 1977 the first commercial recording system came to market, the Sony PCM-1 processor, designed to encode 13-bit digital audio signals onto Sony Beta format videocassette recorders. Within a year this was displaced by 16-bit PCM encoders such as the Sony PCM-1600 (Nakajima et al. 1978). The professional Sony PCM-1610 and 1630 became the standards for compact disc (CD) mastering. These standards continued throughout the 1980s.
* Digital Sound for the Public
Digital sound first reached the general public in 1982 by means of the compact disc (CD) format, a 12-cm optical disc read by a laser (figure 1.5).
* Digital Sound for Musicians
Although CD players had inexpensive 16-bit DACs, good-quality converters attached to computers were not common before 1988. Only in the late 1980s did low-cost, good-quality converters become available for personal computers. In a short period, sound synthesis, recording, and processing by computer became widespread. Dozens of different audio workstations reached the musical marketplace.
* Digital Multitrack Recording In contrast to stereo recorders that record both left and right channels at the same time, multitrack recorders have several discrete channels or tracks that can be recorded at different times. Each track can record a separate instrument, for example, allowing flexibility when the tracks are later mixed. The first computer disk-based random-access sound editor and mixer was developed by the Soundstream company in Salt Lake City, Utah (see figure 16.38). Their system allowed mixing of up to eight tracks or sound files stored on computer disk at a time (Ingebretsen and Stockham 1984). For a number of years, digital multitrack recording was a very expensive enterprise (figure 1.7). The situation entered a new phase in the early 1990s with the introduction of low-cost multitrack tape recorders by Alesis and Tascam, and inexpensive multitrack disk recorders by a variety of concerns.