Hints on Selecting Sound Cards for Multi-Speech

Introduction

This section provides information about evaluating, selecting, and using generic sound cards for Multi-Speech users. Most of Kay's products (e.g., DSP Sona-Graph, CSL, Visi-Pitch II, Nasometer) are delivered with the necessary peripheral hardware for professional-level sound acquisition and playback. Multi-Speech, however, is a software product that depends on a separately purchased sound card for input and output. As a result, the quality of sound input and output with Multi-Speech depends on third-party sound cards whose performance specifications vary widely. This document provides some guidance about selecting, evaluating, and using these cards.

With an understanding of their limitations, generic sound cards can be used for some speech applications, and there are fewer potential problems if the use of the card is restricted to output. Kay recommends that, whenever possible, a professional-level input system (e.g., DAT recorder pass-through to computer, professional-level sound card with breakout box, CSL, or Visi-Pitch II) be used for input to ensure good quality recordings. In addition to input problems, most of the generic sound cards available have significant operational and performance shortcomings when compared with professional-level hardware such as DAT recorders, ADATs, Visi-Pitch II, and CSL. Often the specifications for these generic sound cards are not described in the manufacturer's documentation, or the specifications are idealized and rarely met when installed in typical computer environments.

What's the Problem

A computer is a digital signal environment. Digital signals are inherently resistant to electronic noise, hum, and poor grounding. Analog signals from microphones are, on the other hand, very susceptible to corruption from noise.
When a sound card is inside a computer, the analog signals flowing through the sound card are exposed to the electronic noise inside the computer. The vast majority of sound cards for computers are designed to output (not input) reasonable quality audio for games and multimedia applications. Audio input is very much a secondary application of these cards. At the low prices of these cards, it is inevitable that the engineering and circuitry expense devoted to input is limited. Even with very good quality input design, the environment inside a computer is not conducive to analog signals. For these reasons, few audio professionals use sound cards for audio input.

"While the audio quality of consumer sound cards is much better than it was a few years ago, no serious audio professional would use a SoundBlaster-level card to produce audio for multimedia." (Anderton, Craig. New Media, April 14, 1997, p. 33).

"It seems like there is so much noise and interference going on inside a computer that it's necessary either to use a digital I/O or have some sort of external breakout box with A/D converters instead of the 'traditional sound' card." (Lee, Ken. New Media, April 14, 1997, pp. 33-34).

Given these problems, what should a Multi-Speech user do? Users need to proceed cautiously and understand the limitations of their selected sound card. The checklist that follows describes each feature, its relevance to you, and tips for evaluating it on generic sound cards.

Sound Card Checklist

Hardware Delivered

Sound systems may or may not include a microphone, speaker(s), or headphones. Check to see what is delivered and the quality of the components.

Overall Input Quality

Generic sound cards have input SNRs (signal-to-noise ratios) of about 40-60 dB depending on the sound card design, the location of the card in the computer, the noise level of the computer, grounding problems in the host computer, the microphone, and other considerations.
As a frame of reference, note that CSL (Model 4300B) and Visi-Pitch II use professional-level components and an external module to achieve input SNR above 86 dB and 81 dB, respectively. The difficulty in selecting a generic sound card is that few manufacturers specify the input SNR, and it can vary depending on the installation in each computer. Even the top-of-the-line multimedia sound cards (e.g., Creative Labs' AWE32, Turtle Beach's Topaz, and Roland sound cards) have significantly more noise (i.e., less dynamic range) than professional components.

Microphone

Most sound cards include inexpensive dynamic microphones with limited frequency response. Multi-Speech users should consider upgrading to a better microphone if acquiring signals using the sound card. Note that a professional condenser-type microphone is recommended for voice analysis by the National Center for Voice and Speech. Microphones also vary in their output levels, and not all microphones are matched to the preamplifiers included with the board. You may need to experiment with various microphones or external preamplifiers (see the next item in the checklist) to find a suitable combination for your sound board. Kay uses and recommends the AKG Acoustics Model C-410 for voice analysis.

Also note that professional microphones usually use XLR connectors (balanced input), which need to be adapted for sound cards that typically use mini-phono type connectors. Such an adapter does not preserve the balanced input or the polarity standardization. Nevertheless, an XLR-type microphone is almost always of better quality than a microphone with a standard or miniature-phono-type connector.

Preamplifier

Few sound card manufacturers specify the gain and SNR of their preamplifier.
You may wish to consider purchasing a separate external preamplifier, connected to the line-level input of the sound card, to improve the noise performance. The preamplifier on your card may have insufficient gain or, if a high-gain setting is available, it may introduce too much noise. A symptom of a low-gain preamplifier is that a subject has to speak loudly into the microphone to reach full amplitude levels even when the sound card's input sensitivity is at its highest setting. Also note that the sound card exhibits the highest DC offset and noise when the input levels are set to their highest setting.

Polarity

The time domain pitch extraction algorithm used in Multi-Speech is sensitive to the polarity of the input signal. (Polarity is the relationship between the compression/rarefaction of the acoustic vibration and the up/down orientation of the waveform.) While the human ear cannot detect inverted polarity, incorrect polarity can affect the pitch extraction process. A compression (e.g., the initial waveform response to "pa") should be a positive excursion of the waveform. In the pitch extraction process, there is a "software switch" to adjust the polarity of your system. If this switch is in the incorrect position, many voice impulses will be missed. Microphones are not consistent in their polarity. Check microphone polarity and invert the analysis for impulses if needed.

Input Sampling Rates

Most sound cards support three sampling rates: 11025 Hz, 22050 Hz, and 44100 Hz. If signals are acquired within the Multi-Speech program using a standard multimedia sound card, you will be limited to one of these three rates. Multi-Speech can, however, accurately analyze any signal acquired by CSL or Visi-Pitch II regardless of the sampling rate.
If your application demands other sampling rates and you wish to use Multi-Speech for analysis, you will need to use another system (e.g., CSL) to acquire and save the signals, and then load them into Multi-Speech.

Noise

Noise is any unwanted signal mixed with the signal of interest. The section following this checklist provides some guidance on noise measurement in your system. Generic sound cards use a 16-bit A/D converter, but often the 7 to 8 least significant bits of the input range are corrupted by noise. Therefore, the theoretical resolution of the system of 96 dB is limited to about 40-60 dB. Pick the best card available and evaluate it carefully before use. If your applications (e.g., clinical measurements or research) demand better input specifications, use a professional-level system for input. These systems include CSL and Visi-Pitch II from Kay, and DAT pass-through systems available from Kay and other companies (e.g., Turtle Beach, Antex). Noise can also be aggravated by turning on AGC (automatic gain control). Turn AGC off when possible.

Input Anti-Aliasing Filters

To accurately acquire signals (i.e., convert an analog signal to a digital representation) without aliasing higher frequency components into the frequency range of interest, an anti-aliasing filter is required. These filters should adjust to the sampling rate and have a roll-off greater than 100 dB/octave. With generic low-cost sound cards, the user should be aware of the characteristics of the anti-aliasing filters, the sampling rates at which they are used, and their capability (or lack thereof) to automatically track the set sampling rate. This can be especially problematic when high-frequency signals are present above half the sampling rate in use (the Nyquist frequency). For example, computer monitors often generate high-frequency signals (about 15000 Hz) at the flyback frequency that could inadvertently alias into an improperly filtered signal acquired at 22050 Hz.
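
The monitor example can be checked with a little arithmetic. The following is a minimal Python sketch, not part of Multi-Speech, of where an unfiltered tone lands after sampling. It assumes an ideal sampler with no anti-aliasing filter at all (the worst case), and the function name alias_frequency is made up for this illustration.

```python
def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Return the apparent frequency of a pure tone after ideal sampling.

    Assumption: no anti-aliasing filter is present, so the tone is
    folded straight into the band from 0 Hz to half the sampling rate.
    """
    # Fold the tone into one sampling period ...
    folded = tone_hz % sample_rate_hz
    # ... then reflect anything above the Nyquist frequency back down.
    nyquist = sample_rate_hz / 2.0
    return folded if folded <= nyquist else sample_rate_hz - folded


if __name__ == "__main__":
    # The three standard multimedia sampling rates named in the checklist.
    for rate in (11025, 22050, 44100):
        print(rate, "Hz sampling ->", alias_frequency(15000.0, rate), "Hz")
```

Under these assumptions a 15000 Hz component appears at 7050 Hz when sampled at 22050 Hz and at 3975 Hz when sampled at 11025 Hz, both inside the speech band. At 44100 Hz it stays at 15000 Hz because it is below half the sampling rate.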
Tape recorders may also have signals above the listener's hearing range that will "mix" with the signal of interest during A/D conversion if not filtered. Again, because sound card manufacturers do not provide these specifications, users should test their input system before using it. If CSL or Visi-Pitch II is used for input, no such test is needed, because both have input anti-aliasing filters that meet the requirement noted above and automatically adjust to any selected sampling rate.

DC Drift

An alternating signal, such as that received from a vibrating microphone diaphragm, should produce voltages that vary around zero. The resultant digitization will produce a signal with values varying around 0. If you plotted the waveform, the center line should be 0. Electronic circuits can, unfortunately, drift so that the signals do not vary around zero. This DC drift varies with temperature. A listener cannot hear this DC drift. Measurements, however, can be affected by it. Professional-level components are attentive to DC drift and will autocalibrate to eliminate it. You may wish to measure the DC drift of your system at various stages of system warm-up. CSL and Visi-Pitch II autocalibrate drift out before input.

Output Sampling Rates

CSL and Visi-Pitch II support a wide range of I/O rates. Many sound cards do not. If you play back a signal that was acquired on CSL or Visi-Pitch II on a multimedia sound card, that card may not support the sampling rate, and the output will be distorted. This is especially true with sampling rates higher than 44100 Hz or lower than 11025 Hz. Multi-Speech users should not infer from this distorted output that the stored signal is distorted. The Multi-Speech macros for playing out signals at different sampling rates (i.e., SAMPLING.MAC, PITCHTST.MAC) can be used to evaluate the flexibility of the sound card.
We have found that the Creative Labs AWE32 and AWE64 Gold cards do the best job of supporting various output rates between 11025 Hz and 44100 Hz. No card will support all of the sampling rates used by CSL and Visi-Pitch II.

Output Anti-Aliasing Filters

To eliminate sampling "noise", digital-to-analog conversion of signals should use an adjustable filter appropriate to the I/O sampling rate. In generic sound cards, the specifications for output filtering may not be given in the sound card's documentation. You may be able to achieve the desired output sampling rate, but not necessarily the correct output anti-aliasing. Users of generic sound cards should not assume that a noisy or distorted output means that the stored signal is corrupt. It could simply be an output problem associated with the sound card.

Adjusting Output Volume

The software volume setting for most sound cards requires you to stop playback to make adjustments. If you are using an external speaker, you should be able to adjust the output volume of the speaker without interrupting output. Those who use headphones with Multi-Speech may wish to consider headphones with a volume adjustment.

Calibrated Input

Generic sound cards do not have a calibrated input. Users cannot know the absolute level of a signal. Therefore, all measurements are relative to other portions of the signal. Even relative measurements are not possible if the AGC is on. AGC should always be set to OFF for any measurement task.

Adjusting Input Sensitivity

With generic sound cards, setting the input level correctly means repeating a cycle: acquire, check the signal levels, stop acquiring, invoke the sound card controller program, adjust, go back to Multi-Speech, and reacquire. The awkwardness of this operation typically yields signals with more overloads (i.e., clipping) or an under-utilization of the dynamic range (i.e., under-loading).
CSL and Visi-Pitch II allow the input sensitivity to be adjusted during input so that overloading is avoided and the dynamic range of the input is used effectively. Multi-Speech users need to become adept at switching quickly (using Alt-Tab) between the sound card control software and the Multi-Speech application to adjust the input sensitivity and then test the new setting. Note that the sound card's higher noise levels (and smaller useful dynamic range) necessitate more efficient use of the dynamic range. A professional system with a fuller dynamic range is more tolerant of under-utilization of that range.

Cross-Channel Noise

A special caution should be noted if you intend to acquire two-channel data. Many sound cards have significant cross-channel noise (i.e., the signal from one channel is insufficiently isolated from the other channel). To evaluate this, put a full-scale signal into one channel and see whether the other channel is affected.

Flagging of Overloaded Input

Overloads in digital systems can be disastrous for data analysis. Generic sound card software cannot identify overloaded sample points during acquisition or even after the signal is acquired, and the sound card hardware cannot detect overloads at the A/D converter. Multi-Speech software does flag suspected overload points after the signal is acquired by showing a color change in the waveform display. It does this by analyzing the signal characteristics. When using these generic sound cards for input, be careful to avoid overloading. Before analysis, you may want to view the waveform display to see if the trace changes color.

CSL and Visi-Pitch II hardware monitor the preamplifier circuits and A/D. The input lights on the external module indicate overloads at the A/D converter. In addition, when overloads are detected, portions of the waveform are immediately flagged with a color change to indicate overload.
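
For readers who export captured samples and want a rough check of their own, the following Python sketch flags runs of 16-bit samples sitting at or near full scale and reports how much of the range the peak leaves unused. It is only an illustration: the text does not describe how Multi-Speech identifies suspected overloads, and the function names, the threshold, and the minimum run length here are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

FULL_SCALE = 32768  # magnitude of the largest 16-bit waveform value


def find_suspected_overloads(
    samples: Sequence[int],
    threshold: int = 32766,  # assumed margin just below full scale
    min_run: int = 2,        # assumed: a single peak sample is not flagged
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for each run of
    consecutive samples whose magnitude is at or above the threshold."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(samples):
        if abs(value) >= threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the capture.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples)))
    return runs


def unused_range_db(samples: Sequence[int]) -> float:
    """Return how far the largest sample falls below full scale, in dB."""
    peak = max(abs(value) for value in samples)
    if peak == 0:
        return math.inf
    return 20.0 * math.log10(FULL_SCALE / peak)


if __name__ == "__main__":
    capture = [0, 1000, 32767, 32767, 32767, 500, -32768, -32768, 10]
    print(find_suspected_overloads(capture))  # flat-topped runs
    print(round(unused_range_db([16384, -100]), 1), "dB unused")
```

A capture that peaks at half of full scale leaves about 6 dB of the range unused, which matters more on a card whose useful dynamic range is already small.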
The CSL option, Multi-Dimensional Voice Program, further analyzes the signals for suspected overload points and will not generate voicing parameters if overloads are suspected. Users are encouraged to use CSL or Visi-Pitch II for input whenever possible to avoid the many input problems inherent in generic sound cards.

Real-Time Analysis

Real-time analysis is the ability to simultaneously acquire and graphically analyze a signal (e.g., pitch trace). In real-time analysis, "you see it as you say it", without any perceptible delay. Any perceived delay interferes with the non-cognitive feedback loop essential in biofeedback applications. With generic sound cards, real-time operations are processed using the host CPU and software specifically designed for real-time processing. The Windows operating system, because of graphics and other operating system overhead, is not as efficient at real-time operations as DOS. Multi-Speech is designed to operate as quickly as possible, but the core program is not real time. Add-on options, in progress, achieve true real-time capability. Some cards may include a digital signal processor, but this processor is dedicated to other functions and cannot be used to achieve real-time performance. If your application (e.g., therapy or biofeedback) requires real time, consider Visi-Pitch II or CSL. In the near future, these kinds of capabilities will also be available for Multi-Speech.

DC Coupling

AC coupling is used for microphone signals, but DC coupling is used for electroglottography, air flow, and other signals with low-frequency content. Generic sound cards do not typically include DC coupling and cannot, therefore, be used to analyze low-frequency information reliably. If you need to analyze these types of signals, use CSL, because it can set the input to either DC or AC coupling for all of the input channels.

Capacitance Discharge

With most inexpensive sound cards, the signal is affected by the act of starting acquisition. This appears as a wavy line in the first half-second after acquisition starts. You should always discard this portion of the signal.

Support

Before you purchase a card, it is worth finding out about the card manufacturer's service policy. Do they have toll-free support? If possible, check the sound card manual for support information. Kay provides training and support for all of its products, but Kay cannot support the service or operation of a third-party sound card.

Checking the Input Signal-to-Noise Specifications

As stated previously, few sound card manufacturers specify their input SNR (signal-to-noise ratio). For those that do specify it, our limited studies have shown that these specifications are idealized and are rarely met in the typical user's computer system. The specifications depend on the installation and the computer. Therefore, it is useful to determine the SNR of the sound card in your system. Multi-Speech includes analytical tools that may be helpful in assessing the quality of your system's input circuits. For example, if you initiate capture with no signal attached, the resultant captured signal may represent the background "noise" of the system.

An understanding of how unwanted noise can be mixed with the signal of interest will improve your ability to interpret the analysis of a signal compromised by noise. Noise is generally taken to mean random fluctuations added to, or modulated with, a wanted signal. Here, however, noise means any unwanted signal, periodic or not, existing with a desired signal. When you use a multimedia card, you may unwittingly report analysis results (e.g., energy levels, jitter, harmonic/noise ratios) which have been significantly altered by noise added during signal acquisition.
This is why the National Center for Voice and Speech recommends that the system you use to make voicing measurements should have an input SNR of better than 86 dB, and that the acquisition system should include robust anti-aliasing filters to filter out unwanted frequency components.

Any method of acquiring and storing signals can affect the signal quality. Noise can be introduced during signal acquisition in a number of ways. System components (e.g., microphone, cabling, preamplifier, amplifier, anti-aliasing filters, and A/D) could be of poor quality. Professional-level systems use careful design and more expensive components that minimize system-generated noise. However, if you acquire with consumer-level products, you should evaluate their performance. Additionally, noise sources (e.g., fan noise, electromagnetic signals from monitors or fluorescent light fixtures, power supply hum) may inadvertently be acquired along with the signal when using even the best equipment. Poor room acoustics can also add noise.

When analog signals are converted to digital signals (and vice versa) on a sound card inside a computer, the input and output circuitry can be affected by computer noise. To avoid this, most products for professional sound applications (Kay's CSL and Visi-Pitch II, DAT recorders, ADATs, disk recorders, etc.) use an external module, isolated from the noisy computer, to perform the analog-to-digital and digital-to-analog conversion. Susceptibility to computer-generated noise is inherent in plug-in cards.

Converting an analog signal to a digital representation is performed by an analog-to-digital converter (A/D converter). Most systems use a 16-bit A/D converter that produces a 16-bit binary number to represent the range of incoming values. A binary number with 16 places has a range of 2^16 or 65,536 possible values. This range of values equals 96 dB of possible signal level variation (20 log10(65536)).
This is the maximum achievable range, also called the dynamic range, with a 16-bit linear converter. The full 96 dB range, however, is seldom achieved because the associated electronics can rarely take advantage of the full dynamic range available. For example, most sound cards that plug into a computer lose half of the dynamic range to system noise and large DC offsets.

One simple way to perform a partial check of a recording system, without the need for test equipment, is to acquire a signal exactly as you would during your work except with the microphone turned OFF. Then, with Multi-Speech, analyze the acquired signal for noise by scrolling the cursor along the waveform to note the highest waveform value (i.e., noise level). The waveform display cursor reads the waveform values as a linear value (with values ranging from 0 to ±32768). Use the table that follows to correlate the linear waveform value of the cursor with the noise level and SNR. If you repeat the above test with the microphone on (but quiet), you can separately measure the noise added by microphone pickup of acoustic and electromagnetic signals.

Note that there could also be a constant DC offset. A DC offset is different from noise and only very slightly reduces the useful dynamic range. Noise, on the other hand, is seen in the waveform display (with no input) as a continually changing signal. With noise, the values change over time (e.g., varying +/-200). DC offset is revealed as a constant offset. A noise signal of +/-200 reduces dynamic range by almost 50 dB due to noise corruption. The true dynamic range of the 16-bit A/D has been reduced from 96 dB to about 46 dB. A DC offset of +200 would reduce the dynamic range less than 0.1 dB. Note that some cards show both DC offset and noise. For voice measurements, a useful dynamic range of 85 dB or more is recommended by the National Center for Voice and Speech.
This means that the noise should fall within the range of -2 to +2 using the tests described above.

Cross-Reference of Waveform Value to Noise Level and SNR
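
The relationship behind such a cross-reference can be sketched in a few lines of Python. This is an illustration only, under two assumptions that are not taken from Kay's documentation: SNR is computed as the ratio of full scale (32768) to the largest noise excursion, and DC offset is taken as the mean of a capture made with no input. The function names are made up for this example.

```python
import math
from typing import Sequence, Tuple

FULL_SCALE = 32768  # magnitude of the largest 16-bit waveform value


def snr_db_from_peak(noise_peak: float) -> float:
    """SNR in dB for a given peak noise value (assumed convention:
    full scale divided by the largest noise excursion)."""
    if noise_peak <= 0:
        return math.inf
    return 20.0 * math.log10(FULL_SCALE / noise_peak)


def measure_no_input_capture(samples: Sequence[int]) -> Tuple[float, float]:
    """Return (dc_offset, noise_peak) for a capture made with no input.

    dc_offset is the mean sample value; noise_peak is the largest
    excursion away from that mean, so a constant offset is not
    counted as noise.
    """
    dc_offset = sum(samples) / len(samples)
    noise_peak = max(abs(value - dc_offset) for value in samples)
    return dc_offset, noise_peak


if __name__ == "__main__":
    print("peak value   SNR (dB)")
    for peak in (1, 2, 4, 16, 64, 200, 1000):
        print(f"{peak:>10}   {snr_db_from_peak(peak):8.1f}")

    # Hypothetical capture: small noise riding on a constant offset.
    offset, peak = measure_no_input_capture([198, 202, 200, 199, 201])
    print("DC offset:", offset, "noise peak:", peak)
```

With this convention a noise excursion of ±2 corresponds to about 84 dB and ±200 to about 44 dB, close to the approximate figures quoted in the text. Separating the mean from the excursion reflects the point made earlier that a constant DC offset costs far less dynamic range than noise of the same size.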

Discussion

Sound cards are primarily designed to offer inexpensive sound output for computers. As a result, most generic sound cards offer acceptable output quality but poor input quality. These generic sound cards present significant challenges to speech professionals who need high-quality, accurate sound acquisition for acoustic analysis. Kay recommends that a professional system (e.g., Kay's CSL, Kay's Visi-Pitch II, ADAT, or DAT with computer pass-through) be used whenever possible for reliable, high-quality input. Use the checklist and measurement techniques described above to evaluate your system's performance before use.

If a professional system is not available, select the best cards from Antex (e.g., StudioCard) or Creative Labs (e.g., AWE64 Gold or AWE32). Having measured the AWE32, we found noise levels of about 24 dB and a DC offset of 4 bits (24 dB). Therefore, the useful dynamic range is about 70 dB, which is above average for sound cards. Turtle Beach sound cards have an excellent reputation but support only three output rates. The Antex StudioCard requires a separate external preamplifier, costs $1,595, and samples only at 48 kHz. However, it includes balanced inputs and claims impressive specifications (Kay has not yet evaluated it). Sound cards for portable systems have significantly higher levels of noise than those for desktop systems and should not be used for input for any measurement task.

You may wish to consider connecting an external preamplifier, a professional quality microphone, and an upgraded sound system to improve your system's performance. An external preamplifier helps protect the low-level microphone signal from corruption by computer noise: it boosts the signal to a higher level before it enters the computer, where it is less susceptible to noise. Using this approach, you can improve the SNR by about 6 dB. Note that this is a help, not a cure. It is unrealistic to expect professional quality input from a generic sound card.
Even when you find a sound card and an operating environment that give the best achievable results, you will still need to be cautious about overloading because there will be no direct feedback of overload conditions.

Not all output rates are supported by sound cards. When possible, use standard multimedia rates (11025, 22050, and 44100 Hz) for acquisition. These rates are supported for output by most sound cards. The Sound Blaster cards seem to do the best job of supporting the most output sampling rates.

Summary

Generic sound cards have significant operating and performance weaknesses when compared to professional-level hardware systems. These performance tradeoffs are inherent in their low-cost design and in the problems associated with routing analog signals into a computer, so they cannot be completely avoided. Users should heed the cautions and recommendations listed above in order to select the input system appropriate for the task and use the selected system carefully to attain the best achievable results.

Copyright © 1996-2008 KayPENTAX. All rights reserved.