Finite Impulse Response Filters Using Apple’s Accelerate Framework – Part III

In Part I and Part II of this series we developed a finite impulse response (FIR) audio filter that takes advantage of some of the DSP functions in the Accelerate framework. The improvements we made in Part II were fairly straightforward: we mostly replaced naive hand-coded loops with optimized implementations that use specialized processor instructions to speed up the filter. In this installment, I'll show you a less obvious improvement that can yield huge speedups for higher-order filters.

Instead of combing through our code for more vectorization opportunities, this time we're going to turn to our DSP textbook (Oppenheim and Schafer in my case, but any good text covers this material). In this last part of the series, we'll take a different approach to convolution, based on the convolution theorem.

The Convolution Theorem

The convolution theorem states that convolution of two signals in the time domain is equivalent to multiplication of their Fourier transforms in the frequency domain. In other words, we get the same result by multiplying the two transforms and taking the inverse transform of the product as we do by convolving the signals directly.

Convolution in the Time and Frequency Domains


The above image shows the convolution operation in both the time and frequency domains.

Applying The Convolution Theorem

If you recall from an earlier post, our naive convolution function contained a pair of nested loops that walked over the entire input signal and the entire filter kernel, multiplying each input sample by each filter tap and summing the products. According to the convolution theorem, if we have the frequency-domain representations of our signal and our filter, all we have to do is multiply them, which drastically reduces the number of operations required to compute the convolution. To implement this, we compute the discrete Fourier transform (DFT) of the input signal and the filter kernel, multiply the resulting frequency-domain signals, and then compute the inverse DFT to get our filtered signal. Naturally, we will use the Fast Fourier Transform (FFT) algorithm to compute the transforms. Our FFT-based convolution approach is shown in this block diagram:

Block Diagram of Convolution using the Fast Fourier Transform


FFT convolution isn't exactly a silver bullet: even though we've drastically reduced the number of multiply-add operations, we've added the overhead of computing the Fourier transforms of the input signal and filter, plus the inverse Fourier transform of the result. For this reason, you should only use this approach when you have a long filter kernel. Conventional wisdom says to use FFT convolution for filter lengths above roughly 64–128 taps, and to compute the convolution directly otherwise. This is by no means a hard and fast rule, and the optimal cutoff point will vary based on a number of factors. As always, profile first!

Implementation Considerations

The first step in our convolution approach is to compute the DFTs of our input signal and filter kernel, and there are several things to consider here. Most importantly, the FFT algorithm is optimized for signals whose length is a power of 2. Secondly, the DFTs of the signal and the filter kernel must have the same length. So we need an FFT length that is at least as long as the full convolution result (signal length + kernel length − 1) and that satisfies the power-of-2 requirement: we choose the next power of 2 greater than or equal to the convolution length. We change the lengths of the signal and the filter kernel by zero-padding, which means nothing more than appending zeros until we reach the desired length. This is shown below:


FFT Convolution zero-padding diagram


Once we have selected our FFT length and zero-padded our signals, we can go ahead and compute the FFTs of our filter and input. If you know that your filter kernel is not changing, compute the FFT once at initialization and store it, to eliminate redundant calculation.

The multiplication and inverse DFT (IDFT) are straightforward, which leaves only the overlap-add method for handling the extra samples at the end of each filtered block. This works the same way as before. The extra samples introduced by zero-padding the signal and filter out to the next power-of-2 length come back out of the inverse transform as zeros; we could process them as part of the overlap, adding them to the beginning of the next block, but they can safely be ignored.

The Implementation


There are many improvements that could be made to the above code, primarily moving redundant work out of the processing loop and into an initialization routine. There are also plenty of opportunities for refactoring. I personally use a thin abstraction layer over the FFT functions, which eliminates the need to rewrite the conversions between split and interleaved complex types each time. For prototyping and general development, having functions like FFT(input, output, length) and IFFT(input, output, length) saves a lot of time. I'll dive into the vDSP_fft functions in more depth in a future post.

Conclusions

By now we have two working FIR filter implementations which you can use for whatever you like. There are many places these implementations can (and should) be improved, but the basics are all there. Let me know how it goes!

Edit: Thanks to Ricci Adams for some corrections and additions!

