## Working with Fixed-Point Direct-Form FIR Filters

This example shows how to design FIR filters implemented with the direct-form structure using fixed-point arithmetic. This example requires a Fixed-Point Designer™ license.

See also Getting Started with Fixed-Point Filters

Designing the Filter

The FIR filter to use is not critical. Since we will use the direct-form structure, it doesn't even need to have linear phase. For this example we will use a simple least-squares design.

```
f = fdesign.lowpass('N,Fp,Fst',80,.11,.19); % Specifications
```

A filter object results from the design method. It associates coefficients with a particular filter structure, here a direct-form FIR structure.

```
h = design(f, 'firls', 'Wpass', 1, 'WStop', 100, ...
    'FilterStructure', 'dffir');
set(h,'Arithmetic','fixed');
h
```
```
h =

FilterStructure: 'Direct-Form FIR'
Arithmetic: 'fixed'
Numerator: [1x81 double]
PersistentMemory: false

CoeffWordLength: 16
CoeffAutoScale: true
Signed: true

InputWordLength: 16
InputFracLength: 15

FilterInternals: 'FullPrecision'

```

Comparing Quantized Coefficients to Non-Quantized Coefficients

There are several parameters for a fixed-point direct-form FIR filter. To start with, it is best to concentrate on the coefficient word length and fraction length (scaling). First we use the Filter Visualization Tool to compare the quantized coefficients to the non-quantized (reference) coefficients.

```
hfvt = fvtool(h, 'legend', 'on', 'Color', 'white');
```

Determining the Number of Bits being Used

To determine the number of bits being used in the fixed-point filter, we simply look at the CoeffWordLength property. To determine how the coefficients are being scaled, we can look at the NumFracLength property.

```
get(h,'CoeffWordLength')
```
```
ans =

16

```
```
get(h,'NumFracLength')
```
```
ans =

17

```

This tells us that 16 bits are being used to represent the coefficients, and the least-significant bit (LSB) is weighted by 2^(-17). 16 bits is just the default number used for coefficients, but the 2^(-17) weight has been computed automatically to represent the coefficients with the best precision possible. This is controlled through the 'CoeffAutoScale' property. This property can be set to false if manual control of the coefficient scaling is desired. We simply verify that auto scaling is enabled here:

```
get(h,'CoeffAutoScale') % Returns a logical true
```
```
ans =

1

```
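A minimal sketch of how an automatic scaling rule could arrive at a fraction length of 17 for a 16-bit word. The coefficients `b` are a hypothetical truncated-sinc lowpass standing in for the least-squares design, and the rule itself (give the largest coefficient magnitude just enough range, then spend every remaining bit on the fraction) is an assumption about what 'CoeffAutoScale' does, not a description of its implementation. Only base functions are used.

```matlab
% Hypothetical stand-in coefficients (truncated sinc, cutoff 0.15*pi rad/sample).
% The example's actual least-squares coefficients live in h.Numerator.
n = -40:40;
b = 0.15 * ones(size(n));                 % value at n = 0 (limit of the sinc)
k = (n ~= 0);
b(k) = sin(0.15*pi*n(k)) ./ (pi*n(k));

W = 16;                                   % coefficient word length, sign bit included
intBits = floor(log2(max(abs(b)))) + 1;   % integer bits needed (negative for small peaks)
F = W - 1 - intBits;                      % all remaining bits go to the fraction
bq = round(b * 2^F) / 2^F;                % round each coefficient to the 2^-F grid

fprintf('fraction length: %d\n', F);
fprintf('largest rounding error: %.3g (half an LSB is %.3g)\n', ...
        max(abs(b - bq)), 2^-(F+1));
```

With a peak magnitude of 0.15, the rule gives a fraction length of 17, the same value reported by NumFracLength above.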

Determining the Proper Coefficient Word Length

We can make several copies of the filter to try different word lengths, allowing the coefficient auto scaling to determine the best precision in each case.

```
h1 = copy(h);
set(h1,'CoeffWordLength',12); % Use 12 bits
h2 = copy(h);
set(h2,'CoeffWordLength',24); % Use 24 bits
href = reffilter(h);
set(hfvt, 'Filters', [href, h1, h, h2]);
set(hfvt,'ShowReference','off'); % Reference already displayed once
legend(hfvt,'Reference filter','12 bits','16 bits','24 bits');
```

12 bits are clearly not enough to faithfully represent this filter. 16 bits may be enough for most applications, so we will continue to use 16 bits in this example. As a rule-of-thumb, one should expect an attainable attenuation of about 5 dB per bit.
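To get a feel for the rule of thumb without the Filter Visualization Tool, the sketch below rounds a set of coefficients to 12, 16, and 24 bits and measures the largest deviation the rounding causes in the frequency response. The coefficients are a hypothetical truncated-sinc lowpass, not the least-squares design used here, and the scaling rule is the same assumed one (largest magnitude just fits), so the printed figures are only indicative.

```matlab
% Hypothetical stand-in coefficients (truncated sinc, cutoff 0.15*pi rad/sample).
n = -40:40;
b = 0.15 * ones(size(n));
k = (n ~= 0);
b(k) = sin(0.15*pi*n(k)) ./ (pi*n(k));

wordLengths = [12 16 24];
floorDb = zeros(size(wordLengths));
for i = 1:numel(wordLengths)
    W = wordLengths(i);
    F = W - 1 - (floor(log2(max(abs(b)))) + 1);  % assumed auto-scaling rule
    bq = round(b * 2^F) / 2^F;                   % quantized coefficients
    E = fft(b - bq, 1024);                       % response of the rounding error alone
    floorDb(i) = 20*log10(max(abs(E)));          % worst-case deviation in dB
    fprintf('%2d bits: deviation %6.1f dB, rule of thumb %4d dB\n', ...
            W, floorDb(i), -5*W);
end
```

The deviation acts as a floor under the stopband: the quantized filter cannot attenuate much below it, whatever the reference design achieves.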

Fixed-Point Filtering

Our main purpose is to evaluate the accuracy of the fixed-point filter when compared to a double-precision floating-point version. We will see that it is not sufficient to have a faithful representation of the coefficients that keeps the magnitude response approximately the same.

Generating Training Input Data

Since we just want to evaluate accuracy, we will filter some random data and compare the results. We create a quantizer with a range of [-1,1) and use it to generate uniformly distributed white-noise data with a 16-bit word length.

```
rng(0,'twister'); % Initialize random generator to get reproducible results
q = quantizer([16,15],'RoundMode','round');
xq = randquant(q,1000,1); % 1000 Data points in the range [-1,1)
xin = fi(xq,true,16,15);
```
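For readers without the fixed-point tools at hand, comparable test data can be sketched with base functions only: draw integers uniformly from the 16-bit two's-complement range and scale them by 2^(-15). This is an illustrative stand-in and not necessarily how `randquant` draws its samples, so the values will differ from those used in this example.

```matlab
W = 16;                 % word length, sign bit included
F = 15;                 % fraction length
nSamples = 1000;

% Uniform integers over the full signed range, then scaled onto the 2^-15 grid.
xInt = randi([-2^(W-1), 2^(W-1) - 1], nSamples, 1);
xq = xInt / 2^F;        % values in [-1, 1)

fprintf('min %.8f, max %.8f\n', min(xq), max(xq));
```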

Generating a Baseline Output to Compare Against

When evaluating accuracy of fixed-point filtering, there are three quantities to consider:

1. The "ideal" output: what we would like to compute. It is computed using the reference coefficients and double-precision floating-point arithmetic.

2. The best we can hope for: the most accurate output achievable once the coefficients are quantized. It is computed using the quantized coefficients and double-precision floating-point arithmetic.

3. What we can actually compute: the output computed using the quantized coefficients and fixed-point arithmetic.

Clearly we want to compare what we can actually compute to the best we can hope for. The latter can be computed by casting the fixed-point filter to double and filtering with double-precision floating-point arithmetic.

```
xdouble = double(xin);
hdouble = double(h);
ydouble = filter(hdouble,xdouble);
```

For completeness, we also show how to compute the "ideal" output and how much quantizing the coefficients alone affects the output of the filter.

```
yideal = filter(href,xdouble);
norm(yideal-ydouble)     % total error
```
```
ans =

3.4555e-04

```
```
norm(yideal-ydouble,inf) % max deviation
```
```
ans =

3.8847e-05

```
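The gap between the "ideal" output and the best we can hope for can be reproduced in spirit with the base `filter` function. The sketch below uses hypothetical stand-in coefficients and input data, so its numbers will not match those above. It also checks the measured deviation against a simple worst-case bound: the sum of the coefficient rounding errors times the largest input magnitude.

```matlab
% Hypothetical stand-in coefficients (truncated sinc) and their s16,17 version.
n = -40:40;
b = 0.15 * ones(size(n));
k = (n ~= 0);
b(k) = sin(0.15*pi*n(k)) ./ (pi*n(k));
bq = round(b * 2^17) / 2^17;

% Hypothetical s16,15 input data.
x = randi([-2^15, 2^15 - 1], 1000, 1) / 2^15;

yideal  = filter(b,  1, x);    % reference coefficients, double arithmetic
ydouble = filter(bq, 1, x);    % quantized coefficients, double arithmetic

err   = yideal - ydouble;
bound = sum(abs(b - bq)) * max(abs(x));   % worst-case deviation per sample

fprintf('total error   %.4e\n', norm(err));
fprintf('max deviation %.4e (bound %.4e)\n', norm(err, inf), bound);
```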

Computing the Fixed-Point Output

Next we will perform the actual fixed-point filtering. Once again, the best we can hope to achieve is to have an output identical to ydouble.

```
y = filter(h,xin);
norm(double(y)-ydouble)     % total error
```
```
ans =

0

```
```
norm(double(y)-ydouble,inf) % max deviation
```
```
ans =

0

```

The errors are exactly zero, showing that no quantization error is being introduced in the accumulator. The products are set by default to full precision, so we know that no errors are occurring there. Finally, the output has the same specifications as the accumulator, which eliminates quantization error at the output completely.
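One way to see why the error can be exactly zero is to carry out the full-precision arithmetic on integers. In the sketch below, which uses hypothetical coefficients and input, the s16,17 coefficients and s16,15 samples are held as integers, so every product has weight 2^(-32) and no sum ever discards a bit. All values stay far below 2^53, so the double-precision computation is exact as well and the two results coincide.

```matlab
% Hypothetical stand-in coefficients (truncated sinc) as s16,17 integers.
n = -40:40;
b = 0.15 * ones(size(n));
k = (n ~= 0);
b(k) = sin(0.15*pi*n(k)) ./ (pi*n(k));
bInt = round(b * 2^17);

% Hypothetical s16,15 input held as integers.
xInt = randi([-2^15, 2^15 - 1], 1000, 1);

% Full-precision fixed point: integer products and sums, LSB weight 2^-32.
accInt = filter(bInt, 1, xInt);
yfix   = accInt / 2^32;

% Same quantized coefficients and data, ordinary double arithmetic.
ydouble = filter(bInt / 2^17, 1, xInt / 2^15);

fprintf('max deviation: %g\n', norm(yfix - ydouble, inf));
```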

The Advantages of Having Guard Bits

If we compare the product settings with the accumulator settings:

```
info(h)
```
```
Discrete-Time FIR Filter (real)
-------------------------------
Filter Structure  : Direct-Form FIR
Filter Length     : 81
Stable            : Yes
Linear Phase      : Yes (Type 1)
Arithmetic        : fixed
Numerator         : s16,17 -> [-2.500000e-01 2.500000e-01)
Input             : s16,15 -> [-1 1)
Filter Internals  : Full Precision
Output          : s34,32 -> [-2 2)  (auto determined)
Product         : s31,32 -> [-2.500000e-01 2.500000e-01)  (auto determined)
Accumulator     : s34,32 -> [-2 2)  (auto determined)
Round Mode      : No rounding
Overflow Mode   : No overflow
```

We notice that the accumulator has 3 more bits than the product (34 versus 31). This is typical of most fixed-point DSP processors. These bits are usually referred to as guard bits. They provide a safety net for intermediate overflows. The easiest way of appreciating their value is to remove them and see what happens (we adjust the output setting accordingly):

```
set(h,'FilterInternals','SpecifyPrecision');
set(h,'AccumWordLength',get(h,'ProductWordLength'));
set(h,'OutputWordLength',get(h,'AccumWordLength'));
```
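The number of guard bits needed can be estimated from the coefficients alone. For any input bounded by 1 in magnitude, the output magnitude cannot exceed the sum of the absolute coefficient values, so the accumulator range has to grow from the product range until it covers that bound. The sketch below applies this to hypothetical stand-in coefficients. Whether the full-precision mode uses exactly this rule is an assumption.

```matlab
% Hypothetical stand-in coefficients (truncated sinc), rounded to s16,17.
n = -40:40;
b = 0.15 * ones(size(n));
k = (n ~= 0);
b(k) = sin(0.15*pi*n(k)) ./ (pi*n(k));
bq = round(b * 2^17) / 2^17;

maxInput     = 1;       % an s16,15 input never exceeds 1 in magnitude
productRange = 0.25;    % half-width of the s31,32 product format

% Worst-case output magnitude over all admissible inputs.
worstCase = sum(abs(bq)) * maxInput;

% Each guard bit doubles the accumulator range.
guardBits = max(0, ceil(log2(worstCase / productRange)));

fprintf('worst-case |y| = %.4f, guard bits needed = %d\n', worstCase, guardBits);
```

This bound is conservative: it assumes an input that aligns with the sign of every coefficient at full scale, which random data rarely does.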

We now enable quantization reports. The logging capability is integrated into the 'filter' method. It is triggered when the 'LoggingMode' FI preference is 'on'. The stored report corresponds to the last simulation; it is overwritten each time the filter command is executed.

```
p = fipref; previousLoggingMode = p.LoggingMode;
fipref('LoggingMode', 'on');
y = filter(h,xin);
R = qreport(h)
```
```
R =

Fixed-Point Report
----------------------------------------------------------------------------------
                      Min           Max |       Range        | Number of Overflows
----------------------------------------------------------------------------------
Input:         -0.9989624     0.9989624 |     -1  0.99996948 |         0/1000 (0%)
Output:        -0.2498605    0.24979357 |  -0.25        0.25 |         0/1000 (0%)
Product:      -0.14476998    0.14476998 |  -0.25        0.25 |        0/81000 (0%)
Accumulator:  -0.24998221    0.24997947 |  -0.25        0.25 |      806/80000 (1%)
```

The quantization report contains the minimum and maximum values that were recorded during the last simulation (values are logged before quantization), the range and the number of overflows of different internal signals. As expected, we can see that overflows are occurring in the accumulator.

```
norm(double(y)-ydouble)     % total error
```
```
ans =

7.8102

```
```
norm(double(y)-ydouble,inf) % max deviation
```
```
ans =

0.5000

```
```
plot([ydouble,double(y)])
xlabel('Samples'); ylabel('Amplitude')
legend('ydouble','y')
set(gcf, 'Color', [1 1 1])
```

The error is large now, because overflow occurred as can be seen in the plot.
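A maximum deviation of exactly 0.5 is what one would expect if overflowing values wrap around, since the [-0.25, 0.25) range is 0.5 wide. The sketch below assumes wrap-around (two's-complement) overflow and uses a few made-up output values to show that every overflowed sample is off by a multiple of 0.5.

```matlab
lo = -0.25;                               % accumulator range without guard bits
hi =  0.25;
wrap = @(v) mod(v - lo, hi - lo) + lo;    % wrap a value into [lo, hi)

ytrue    = [0.10 0.30 -0.40 0.59];        % hypothetical exact outputs
ywrapped = wrap(ytrue);                   % what a wrapping accumulator would hold

% Rows: exact value, wrapped value, resulting error.
disp([ytrue; ywrapped; ytrue - ywrapped]);
```

Only the first value fits the range and survives unchanged. The other three come back with an error of 0.5 in magnitude.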

Avoiding Overflow with No Guard Bits

It is possible to avoid overflow even if guard bits are not available. From the plots of y and ydouble, it was clear that a range of [-1,1) was all that was required in this specific case to avoid overflow. We can obtain that range by giving up two fraction bits in the accumulator, but this setting is specific to the current filter coefficients and input signal.

```
set(h,'AccumFracLength',get(h,'AccumWordLength')-1);
set(h,'OutputFracLength',get(h,'AccumFracLength'));
y = filter(h,xin);
```
```
R = qreport(h)
```
```
R =

Fixed-Point Report
----------------------------------------------------------------------------------
                      Min           Max |       Range        | Number of Overflows
----------------------------------------------------------------------------------
Input:         -0.9989624     0.9989624 |     -1  0.99996948 |         0/1000 (0%)
Output:       -0.59239117     0.5617338 |     -1           1 |         0/1000 (0%)
Product:      -0.14476998    0.14476998 |  -0.25        0.25 |        0/81000 (0%)
Accumulator:  -0.65199985    0.57536921 |     -1           1 |        0/80000 (0%)
```

The quantization report lets us verify that the overflows are eliminated and that the signals use most of the available range, i.e., the scaling is well matched to this particular training data.
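The logged extremes can also be turned into a fraction length directly. The sketch below takes the accumulator minimum and maximum from the report and applies a simple rule (an assumption for illustration, not a toolbox function): reserve just enough range for the largest magnitude, then give every remaining bit of the 31-bit word to the fraction.

```matlab
accMin = -0.65199985;          % accumulator extremes copied from the report
accMax =  0.57536921;
W = 31;                        % accumulator word length with no guard bits

peak    = max(abs([accMin accMax]));
intBits = floor(log2(peak)) + 1;      % bits needed left of the binary point
F       = W - 1 - intBits;            % fraction length, one bit kept for the sign

fprintf('fraction length %d, range [%g, %g)\n', F, -2^(W-1-F), 2^(W-1-F));
```

The result is a fraction length of 30, the same value as AccumWordLength minus one used in the settings above.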

```
norm(double(y)-ydouble)     % total error
```
```
ans =

7.8283e-08

```
```
norm(double(y)-ydouble,inf) % max deviation
```
```
ans =

8.8476e-09

```

The error seems small because there is no output quantization error in this case. If we use 16 bits for the output, the error is much larger.

```
set(h,'OutputWordLength',16);
set(h,'OutputFracLength',15);
y = filter(h,xin);
norm(double(y)-ydouble)     % total error
```
```
ans =

2.7438e-04

```
```
norm(double(y)-ydouble,inf) % max deviation
```
```
ans =

1.5248e-05

```
```
fipref('LoggingMode', previousLoggingMode);
```
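The size of the 16-bit output error can be anticipated with a short calculation. Rounding to a grid with a step of 2^(-15) moves each sample by at most half a step, 2^(-16), or about 1.5259e-05, which is just above the maximum deviation measured above. The sketch below checks that bound on made-up output values. It uses the base `round` function, whose tie-breaking may differ from the filter's rounding mode.

```matlab
F = 15;                                  % output fraction length
y = linspace(-0.6, 0.56, 1000).';        % hypothetical high-precision outputs

y16    = round(y * 2^F) / 2^F;           % round to the 16-bit output grid
maxErr = max(abs(y - y16));

fprintf('max deviation %.4e, half an LSB %.4e\n', maxErr, 2^-(F+1));
```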