Finite Impulse Response Filters Using Apple’s Accelerate Framework – Part II

In Part I of this 3-part series we created a simple FIR filter. In this part we will look at some ways of improving the implementation using some functions from the Accelerate Framework.

The Accelerate Framework

The accelerate framework provides a C API with functions for linear algebra, matrix math, DSP, and image processing. In this case we will specifically be focusing on the vDSP functions. The functions provide vectorized implementations of common functions using specialized instruction sets (SSE, NEON, etc) or  DSP hardware on your processor.

Improving The Filter

The basis for our filter is the convolution function. This is an extremely slow function, it runs in quadratic time (O(N²)). Looking at the nested loops doing multiplication and accumulation should give you some intuition for why this is the case. This is where we are going to focus our attention. Fortunately, the accelerate framework provides a convolution function, using vectorized multiply-accumulate instructions. The function we are going to use is vDSP_conv (or its double-precision equivalent vDSP_convD).

The prototype for vDSP_conv is as follows:

This matches up fairly closely to our convolution function prototype, with the addition of the Stride arguments, which is used to set the direction and  step size when looping through the input and filter arrays. There are a few things we need to look out for when using this function in our filter code.  Per the vDSP_conv documentation, The input signal length must be at least (length of filter + length of result – 1)[1]. We need to set __vDSP_strideFilter to -1 [2], as well as pointing __vDSP_filter at the end of our filter array in order to perform convolution [3].

We can eliminate some more of the loops in this code using some more vDSP functions as well a BLAS function from the Accelerate Framework. The first loop which fills our padded buffer with zeros can be replaced with vDSP_vfill. Here is the code we can use to replace the first loop:

The second loop can be replaced with a call to cblas_scopy, which copies a vector from one location to another:

Here is our updated FIR filter using our vectorized convolution code. I’ve also replaced a loop which adds two vectors with vDSP_vadd, which is a fairly straightforward replacement:

In Part III, I will walk through another improvement which leverages the fast fourier transform to take our convolution from O(N²) to O(N log N).

Leave a Reply

Your email address will not be published. Required fields are marked *