In this tutorial, we will be analyzing a voice signal. Humans produce analog sound waves, which are continuous in time, but a computer can only work with a discrete-time (digital) model of the signal. The main purpose of this article is to understand how digital signal processing is carried out. We’ll work through an example showing how you can find similarities and differences between two voice signals, using a recording of your own voice and a distorted signal.
For this project, it is important to understand the patterns of the waveforms and how they change over time when we apply different parameters. In this first part, we are going to explore some particularities of the voice signal and how we can process it using MATLAB, and in the following part, we will begin the detailed analysis of a recorded voice signal using Arduino.
Step 1: About voice signals
When we are working with voice signals, the most important part is to understand what “signal” means. So what is a signal?
“[A] signal, technically yet generally speaking, is a formal description of a phenomenon evolving over time or space; by signal processing we denote any manual or “mechanical” operation which modifies, analyzes or otherwise manipulates the information contained in a signal. Consider the simple example of ambient temperature: once we have agreed upon a formal model for this physical variable – Celsius degrees, for instance – we can record the evolution of temperature over time in a variety of ways and the resulting data set represents a temperature “signal”.”
In this first step, we will learn about the properties of a voice signal. Before we record our own voice, let’s go over some different parameters in Audacity.
Number of channels:
Mono: you record with one channel, which means that there is only one audio signal. Every speaker plays the same signal at the same level, so you won’t hear any difference between speakers if you have two or more. This is the most common method because the sound can be recorded with a single microphone. Mono also takes up less bandwidth, which makes it useful for many applications (e.g. telephone and radio). For users who are not too familiar with the technical details of placing audio equipment, mono is the right choice.
Figure 1: Single audio signal is distributed on the same level to multiple speakers (Mono)
Stereo: you record with 2 or more channels. Since the signal comes from more than one source, it conveys the direction and location of the sound. A stereo setup requires at least 2 microphones placed in appropriate positions, so that the different locations of the sound sources can be distinguished. This kind of recording is commonly used in movies and music for a broader audio perspective.
Figure 2: Two or more signal channels are distributed to speakers (Stereo)
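In MATLAB, audio data is normally stored as a matrix with one column per channel, so a mono signal is N-by-1 and a stereo signal is N-by-2. The sketch below is only an illustration of that layout: the two tone frequencies and all variable names are assumptions and are not part of the project. It builds a synthetic stereo signal and mixes it down to mono by averaging the channels.

```matlab
fs = 48000;                       % sampling rate in Hz (the value used later in this project)
t  = (0:fs-1)' / fs;              % one second of time stamps as a column vector

% Hypothetical stereo signal: a different tone in each channel
left   = sin(2*pi*440*t);         % left channel, 440 Hz tone (illustrative)
right  = sin(2*pi*660*t);         % right channel, 660 Hz tone (illustrative)
stereo = [left, right];           % N-by-2 matrix: one column per channel

% Mono mix-down: average the two channels into a single column
mono = mean(stereo, 2);

fprintf('stereo: %d samples x %d channels\n', size(stereo, 1), size(stereo, 2));
fprintf('mono:   %d samples x %d channel\n',  size(mono, 1),   size(mono, 2));
```

Averaging is one simple way to obtain a mono signal from a stereo one; it discards the direction information that the two channels carried.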
Project Rate (sampling rate): the number of samples taken per second from a continuous-time signal in order to turn it into a discrete-time signal (a sequence of numerical values). The sampling rate is measured in samples per second (S/s), which is equivalent to hertz (Hz).
Since the audio signal is analog, we need to transform it into a digital signal before the computer can process it. This operation is called sampling, and the sampling theorem described below tells us how fast we have to sample.
Let’s assume that we have an analog signal as shown in the picture below. Imagine you need to describe this signal to a friend. This won’t be the easiest task. In such a case, it is simpler to describe the signal as a sequence of numbers.
Figure 3: An example analog signal
Every sample is described by its own amplitude. The sampling rate is chosen by the user, within the limits set by the sampling theorem. The picture below shows how an analog signal is transformed into a digital signal by sampling.
Figure 4: An analog signal transformed into a digital one
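To make the idea of "a sequence of numbers" concrete, here is a minimal sketch of sampling in MATLAB. The 2 Hz sine standing in for the analog signal and the rate of 16 samples per second are arbitrary assumptions chosen to keep the output short.

```matlab
x  = @(t) sin(2*pi*2*t);   % stand-in for the analog signal: a 2 Hz sine (assumed)
fs = 16;                   % sampling rate in samples per second (arbitrary choice)
Ts = 1/fs;                 % sampling period in seconds

n       = 0:fs-1;          % sample indices covering one second
samples = x(n*Ts);         % the digital signal: one amplitude per sample

% Print the first few (index, time, amplitude) triples
for k = 1:5
    fprintf('n = %d, t = %.4f s, amplitude = %+.4f\n', n(k), n(k)*Ts, samples(k));
end
```

Each printed line is one sample: its index, the instant at which it was taken, and its amplitude.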
The sampling rate must be chosen according to the range of human hearing, which covers frequencies from 20 Hz to 20000 Hz. A discrete-time signal keeps all the necessary information about the original signal (i.e. accurate sound) only if the sampling rate meets the requirement of the Nyquist-Shannon theorem:
The sampling rate must be greater than twice the highest (maximum) frequency in the signal spectrum. In our case, the maximum frequency is 20000 Hz.
The chosen value for this project is 48000 Hz (48 kHz), which is a standard value when working with audio signals. Since 48000 Hz is greater than 2 × 20000 Hz = 40000 Hz, the processing is done with all the information found in the continuous-time signal.
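The check behind this choice can be written out in a few lines. This is a small sketch using the values from the text (a maximum frequency of 20000 Hz and a sampling rate of 48000 Hz); the variable names are illustrative.

```matlab
fmax = 20000;              % highest frequency of interest in Hz (upper limit of hearing)
fs   = 48000;              % project sampling rate in Hz

nyquistRate = 2 * fmax;    % the sampling rate must exceed this value
isEnough    = fs > nyquistRate;

fprintf('Nyquist rate: %d Hz\n', nyquistRate);
fprintf('fs = %d Hz satisfies the theorem: %d\n', fs, double(isEnough));
fprintf('Highest frequency representable at this fs: %d Hz\n', fs/2);
```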
Oversampling: the process of sampling a signal with a sampling frequency significantly higher than the Nyquist rate. Theoretically, a bandwidth-limited signal can be perfectly reconstructed if it is sampled above the Nyquist rate. The Nyquist rate is defined as twice the highest frequency component in the signal.
Undersampling: a technique where one samples a bandpass-filtered signal at a sample rate below its Nyquist rate (twice the upper cutoff frequency), but is still able to reconstruct the signal. When one undersamples a bandpass signal, the samples are indistinguishable from the samples of a low-frequency alias of the high-frequency signal (aliasing means that two different signals produce the same samples).
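The following sketch shows what "indistinguishable samples" means in practice. The 900 Hz tone and the 1000 Hz sampling rate are illustrative assumptions; the point is only that the samples of the high-frequency tone coincide with the samples of a much lower frequency.

```matlab
f  = 900;                          % tone frequency in Hz (illustrative value)
fs = 1000;                         % sampling rate in Hz, below the Nyquist rate 2*f = 1800 Hz

% Frequency the samples appear to have after folding into [0, fs/2]
fAlias = abs(f - fs*round(f/fs));

n      = 0:19;                     % first 20 sample indices
xHigh  = cos(2*pi*f*n/fs);         % samples of the 900 Hz tone
xAlias = cos(2*pi*fAlias*n/fs);    % samples of the low-frequency alias

fprintf('Alias frequency: %d Hz\n', fAlias);
fprintf('Largest difference between the two sample sets: %.2e\n', ...
        max(abs(xHigh - xAlias)));
```

A cosine is used here so that the two sample sets match exactly; with a sine, the alias can come out with its sign flipped.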
Now, let’s take a look at a sine wave demonstration.
Figure 5: A sine wave (plotted in MATLAB)
This is a sine wave with a frequency of 2 Hz, plotted over the time interval [-1, 1] seconds. The plot contains 4 periods, since each period lasts T = 1/f = 1/(2 Hz) = 0.5 seconds. We are going to apply different sampling cases to this signal to see how they work.
The number of points indicates how many points are used to draw one period; these points are then connected by linear interpolation. There should be at least 20 points in a period, otherwise the linearly interpolated signal will look distorted. This especially applies to smooth analog signals, which lose their rounded shape after sampling. The higher the number of points, the more accurate the sampled signal.
Figure 6: Number of necessary points for a sine wave (plotted in MATLAB)
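One way to put a number on "looks distorted" is to measure the largest gap between the straight-line version and the true sine. The sketch below does this for 5, 10, 15 and 20 points per period; the dense reference grid of 2001 points and the variable names are assumptions made for the illustration.

```matlab
f     = 2;                              % sine frequency in Hz, as in Figure 5
tFine = linspace(0, 1/f, 2001);         % dense grid over one period (reference curve)
xFine = sin(2*pi*f*tFine);

pts = [5 10 15 20];                     % points per period to compare
err = zeros(size(pts));
for k = 1:numel(pts)
    tS   = linspace(0, 1/f, pts(k) + 1);    % pts(k) intervals across one period
    xS   = sin(2*pi*f*tS);
    xLin = interp1(tS, xS, tFine);          % connect the samples with straight lines
    err(k) = max(abs(xLin - xFine));        % worst-case gap to the true sine
    fprintf('%2d points per period: max error = %.4f\n', pts(k), err(k));
end
```

The error shrinks as the number of points grows, which matches the visual impression in the figure.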
In this first tutorial, we are going to plot some sine waves in MATLAB and observe how they behave when different sampling frequencies are applied. In the second part of the voice signal series, we are going to use MATLAB to perform a Fourier analysis. MATLAB is also a useful tool when working with Arduino because the two can communicate over the serial interface. We’ll get into more detail in the third part of this series, where we’ll record a voice signal with Arduino and process it in MATLAB.
For those who are not entirely familiar with MATLAB, here are the general steps to plot a function:
To create a script or a function, go to [HOME] → [New] and then select either “Script” or “Function”.
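To plot the sine waves shown in Figure 6, we need to create a Script. Copy and paste the following code:
f = 2;                          % sine frequency in Hz (from the text above)
pts = [5 10 15 20];             % points per period, one value per subplot
for k = 1:numel(pts)
    t = -1:1/(f*pts(k)):1;      % assumed time grid: pts(k) samples in each period
    subplot(2, 2, k); plot(t, sin(2*pi*f*t), '-o');
    title(sprintf('%d points plot', pts(k)))
end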
To see the result, all you need to do is press the Run button.
We need another plot in order to see how the signal we just created changes shape when we sample it. The length of the signal is kept at 0.5 seconds, but the frequency is increased to 60 Hz. The number of points used to draw the signal is also changed to 20 times the frequency. Let’s say we want to sample this interval with a frequency of 50 Hz; this means that the samples are spaced T = 1/50 s = 0.02 s apart (where T is the sampling period). Because 50 Hz is well below 2 × 60 Hz = 120 Hz, the signal is undersampled. In Figure 7, you can see the tiny red dots, which are the sampling points on the signal.
Figure 7: An undersampled sine wave
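The data behind a plot like this can be generated along the following lines. This is a sketch and not the original listing: it assumes that "20x the frequency" means a drawing grid of 20 points per period, and the variable names are illustrative.

```matlab
f      = 60;                             % signal frequency in Hz
dur    = 0.5;                            % signal length in seconds
fsPlot = 20 * f;                         % assumed drawing grid: 20 points per period
t      = (0:round(dur*fsPlot)) / fsPlot; % time stamps for the smooth (blue) curve
x      = sin(2*pi*f*t);

fs = 50;                                 % sampling frequency under test, in Hz
Ts = 1/fs;                               % sampling period: 0.02 s
ts = (0:round(dur*fs)) * Ts;             % sampling instants (red dots)
xs = sin(2*pi*f*ts);                     % sampled values

fprintf('Sampling period: %.3f s, samples taken: %d\n', Ts, numel(ts));
fprintf('Nyquist rate for this signal: %d Hz, sampling at: %d Hz\n', 2*f, fs);

% Uncomment to draw the figure:
% plot(t, x, 'b', ts, xs, 'r.-'); xlabel('Time (s)'); ylabel('Amplitude');
```

With these values the red dots fall exactly on a 10 Hz sine (60 Hz minus 50 Hz), which is why the connected samples look nothing like the original signal.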
In Figure 7, you cannot grasp the shape of the original signal because there are too few points. When the dots are connected, the red signal has an unusual shape (Figure 8). The points do not follow the original sinusoidal form, so the red signal cannot reconstruct the blue signal.
Figure 8: Sine wave with fs = 50 Hz
Figure 9: Sine wave with fs = 240 Hz
Figure 10: Examples of sampling frequencies
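As you can see in Figures 9 and 10, when we sample at 2x the signal frequency, the only points represented on the plot are the maxima at the top and the minima at the bottom. This is the limiting case of the sampling theorem: these 2 points per period capture the signal only because the samples happen to land on the peaks. Samples taken at the zero crossings would all be zero, which is why the theorem asks for a sampling rate greater than twice the signal frequency.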
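Sampling at exactly twice the signal frequency captures the peaks only when the sampling instants line up with them. The sketch below compares two alignments for a 60 Hz sine sampled at 120 Hz; the quarter-period offset and the variable names are assumptions chosen for the illustration.

```matlab
f  = 60;                              % signal frequency in Hz
fs = 2 * f;                           % sampling at exactly twice the frequency
n  = 0:9;                             % first ten sample indices

% Case 1: sampling instants fall on the peaks (quarter-period offset)
tPeaks = n/fs + 1/(4*f);
xPeaks = sin(2*pi*f*tPeaks);          % alternates between +1 and -1

% Case 2: sampling instants fall on the zero crossings (no offset)
tZeros = n/fs;
xZeros = sin(2*pi*f*tZeros);          % every sample is numerically zero

fprintf('Peak-aligned samples:  %s\n', mat2str(round(xPeaks)));
fprintf('Zero-aligned samples, largest magnitude: %.2e\n', max(abs(xZeros)));
```

In the second case the samples carry no trace of the sine at all, so in practice the sampling rate is chosen comfortably above twice the highest frequency.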
Step 2: Working with voice signals
We have covered signals and their properties. Now it’s time to test them with a real example. The recording will be done in Audacity with the following properties:
Sample rate: 48000 Hz
Number of bits: 16
Figure 11: Settings for recording with Audacity
The length of the signal should be at least 10 seconds (the project involves many processing steps, so it’s not recommended to go over 20 seconds).
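These settings determine how much data the later processing steps have to handle. The sketch below estimates the number of samples and the raw data size for a 10-second and a 20-second recording; it assumes a mono recording, which the settings above do not state, and it ignores the file header.

```matlab
fs       = 48000;          % sample rate in Hz
nBits    = 16;             % bits per sample
nChan    = 1;              % assumed mono recording (use 2 for stereo)
duration = [10 20];        % recommended minimum and maximum length in seconds

nSamples = fs * duration;                    % samples per channel
nBytes   = nSamples * nChan * nBits / 8;     % raw audio data, without file header

for k = 1:numel(duration)
    fprintf('%2d s: %d samples, about %.2f MB of raw audio\n', ...
            duration(k), nSamples(k), nBytes(k) / 1e6);
end
```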
Figure 12: Audacity
Using Audacity is pretty intuitive, as with other recording platforms. In the picture of the Audacity window (Figure 12) you can see the red button with the circle in the middle (the record button); press it to start a recording. To stop the recording, press the button with the yellow square (the stop button). The program will plot the recorded sound wave as shown below.
Figure 13: Recorded audio signal
Part 1 was a brief introduction to the theory of digital signal processing. We explored different waveforms using MATLAB and recorded our voice using Audacity. In the next tutorial, we are going to dive into more details of “processing.” We’ll touch upon various algorithms and the time-frequency domain. We’ll continue using MATLAB as our main tool for writing functions that process our recorded voice and produce the results of the analysis.
Tiberia is currently in her final year of electrical engineering at Politehnica University of Bucharest. She is very passionate about designing and developing Smart Home devices that make our everyday lives easier.