# A 2K/8K DUAL-MODE FFT PROCESSOR FOR OFDM OF DVB-T RECEIVERS§

Chua-Chin Wang†, Jian-Ming Huang, and Hsian-Chang Cheng

Department of Electrical Engineering
National Sun Yat-Sen University
Kaohsiung, Taiwan 80424
email: ccwang@ee.nsysu.edu.tw

#### ABSTRACT

We present a novel implementation of a 2K/8K dual-mode FFT (fast Fourier transform) for OFDM (orthogonal frequency division multiplexing) in DVB-T (Digital Video Broadcasting Terrestrial) receivers. Besides pipelining the FFT to reduce the area and enhance the data throughput, SDF (single-path delay feedback) butterfly units for radix-2 and radix-4 processing are adopted to resolve the power consumption difficulty and the P&R (place and route) problem. SRAM is used in the butterfly units, which removes the auto-refresh requirement that DRAM would impose, so that not only is dynamic power saved but the timing control is also less stringent. The 2K/8K FFT comprises 5/6 cascaded radix-4 stages and one radix-2 butterfly stage. The proposed design is implemented in a 0.35 µm 2P4M CMOS process to verify the 8 MHz processing rate, with power dissipation as low as 535 mW at a 16 MHz system clock.

Keywords: OFDM, DVB-T receiver, DIF, radix-4 butterfly unit, dual-mode

## 1. INTRODUCTION

Digital TV (DTV) is currently one of the major consumer products and has a strong impact on a large number of users worldwide. DVB-compliant DTVs and set-top boxes (STBs) have been gradually adopted in Europe as well as Asia, mainly because the OFDM processing supported by DVB has been proven to overcome multi-path effects in mobile receivers. Hence, OFDM is deemed one of the most critical IPs (intellectual property blocks) in the implementation of DVB receivers. Since OFDM utilizes multiple orthogonal subcarriers to transmit the same signal, it is highly insensitive to multi-path effects. It allows a wide-band rate of up to 24 Mbps within an 8 MHz transmission bandwidth.
It also leads to the concept of the SFN (single frequency network), in which many transmitters send the same signal on the same frequency. For terrestrial broadcasting, DVB-T allows two modes: 2K and 8K [5]. The former is suited to mobile reception, while the latter is used in SFNs.

The implementation of the FFT is the most difficult part of a DVB-T receiver [10]. Hence, much effort has been devoted to efficient FFT implementations. Pipelining is probably the most common feature of prior designs [1], [4], [6], [8], [9], [10]. However, the design of a more efficient butterfly stage appears to play a more important role, since it determines the throughput of a pipelined architecture. In this work, we adopt an SRAM-based SDF (single-path delay feedback) butterfly stage in a DIF (decimation in frequency) rather than a DIT (decimation in time) FFT design. The power consumption is found to be 535 mW at 16 MHz with a 3.3 V power supply in a 0.35 µm 2P4M CMOS process.

## 2. LOW-POWER FFT DESIGN FOR OFDM

An illustrative DVB-T receiver is shown in Fig. 1. According to the DVB-T specifications [5], the FFT/IFFT must be able to process 8192 (8K) points within the carrier spacing interval. The realization of the OFDM demodulator, i.e., the 8K/2K FFT, is the most critical part, since it directly affects the accuracy of the channel estimation as well as the symbol demapper.

## 2.1. FFT theory

The basic butterfly stages widely used in prior FFT designs are radix-2, radix-4, radix-n, and split-radix. The radix-2 unit is the most popular owing to its simplicity. However, as the number of FFT points increases, radix-4 gains the edge of lower computational complexity [10]. Another factor to be taken into consideration is the difference between DIT [3] and DIF.
Since the OFDM algorithm is based upon the utilization of multiple subcarriers, we adopt the DIF scheme instead of DIT to avoid any transformation between the time domain and the frequency domain. The input signal to the N-point FFT is denoted by x[n]. Its DFT is given by

$$X[k] = \sum_{n=0}^{N-1} x[n]W_N^{nk}, \quad k = 0, 1, \dots, N-1, \tag{1}$$

where $W_N^{nk}=e^{-j2\pi nk/N}$ is the twiddle factor. Thus, the function of a radix-4 unit is represented by the following equality:

$$X[4r+p] = \sum_{n=0}^{N/4-1} \left\{ x[n] + x\left[n + \frac{N}{4}\right] W_4^{p} + x\left[n + \frac{N}{2}\right] W_4^{2p} + x\left[n + \frac{3N}{4}\right] W_4^{3p} \right\} \cdot W_N^{np} \cdot W_{N/4}^{nr}, \tag{2}$$

where $r=0 \sim (N/4)-1$, $p=0,1,2,3$, and $n=0 \sim (N/4)-1$. The last two factors follow from $W_N^{n(4r+p)} = W_N^{np} \cdot W_{N/4}^{nr}$. The corresponding butterfly graph is shown in Fig. 2. Fig. 3 is its signal flow chart. Notably, x'[n] is the intermediate value of x[n] in the figure.

<sup>§</sup> This research was partially supported by the National Science Council under grants NSC 92-2220-E-110-001 and 92-2220-E-110-004.

<sup>†</sup> The contact author.

## 2.2. Low-power FFT architecture

The DVB-T specification requires a symbol data rate of 8 MHz. Given such a high data rate, the storage memory as well as the routing area will be very demanding, particularly in the 8K mode (= 8192 symbols). A pipeline structure therefore seems unavoidable for a design with such a high data rate. Fig. 4 shows the pipeline structure of the 2K mode FFT design. Every block in Fig. 4 is called a butterfly stage. Notably, the data are serial-in, serial-out. For instance, the top-leftmost radix-4 butterfly stage does not start its operation until the 0th (x[0]), the 512th (x[0+2048/4]), the 1024th $(x[0+2048\cdot 2/4])$, and the 1536th $(x[0+2048\cdot 3/4])$ points have been collected. As soon as the radix-4 computation is done, the result is propagated to the next stage.
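The radix-4 decomposition and the input grouping described above can be checked numerically. The following is a minimal sketch in plain Python, not the authors' hardware implementation. All function names (`twiddle`, `dft`, `butterfly_input_indices`, `radix4_dif_stage`, `fft_via_one_radix4_stage`) are hypothetical, floating-point arithmetic stands in for the fixed-point datapath, and the remaining factor $W_{N/4}^{nr}$ is applied by a direct N/4-point DFT rather than by further pipeline stages.

```python
import cmath


def twiddle(exponent, size):
    """Return W_size^exponent = exp(-j*2*pi*exponent/size)."""
    return cmath.exp(-2j * cmath.pi * exponent / size)


def dft(x):
    """Direct N-point DFT as in Eq. (1); O(N^2), used only as a reference."""
    n_points = len(x)
    return [
        sum(x[n] * twiddle(n * k, n_points) for n in range(n_points))
        for k in range(n_points)
    ]


def butterfly_input_indices(n, n_points):
    """Indices of the four inputs combined by the radix-4 butterfly for index n."""
    quarter = n_points // 4
    return [n + q * quarter for q in range(4)]


def radix4_dif_stage(x):
    """One radix-4 DIF stage: the braced term of Eq. (2) times W_N^{np}.

    Returns four sequences y[p][n] with p = 0..3 and n = 0..N/4-1.
    """
    n_points = len(x)
    stage_out = []
    for p in range(4):
        seq = []
        for n in range(n_points // 4):
            i0, i1, i2, i3 = butterfly_input_indices(n, n_points)
            acc = (x[i0]
                   + x[i1] * twiddle(p, 4)
                   + x[i2] * twiddle(2 * p, 4)
                   + x[i3] * twiddle(3 * p, 4))
            seq.append(acc * twiddle(n * p, n_points))
        stage_out.append(seq)
    return stage_out


def fft_via_one_radix4_stage(x):
    """Rebuild X[4r+p] from one radix-4 stage followed by N/4-point DFTs."""
    result = [0j] * len(x)
    for p, seq in enumerate(radix4_dif_stage(x)):
        for r, value in enumerate(dft(seq)):  # this DFT supplies W_{N/4}^{nr}
            result[4 * r + p] = value
    return result


if __name__ == "__main__":
    # Small deterministic test vector (an arbitrary choice for illustration).
    samples = [complex(n % 5 - 2, (3 * n) % 7 - 3) for n in range(16)]
    reference = dft(samples)
    rebuilt = fft_via_one_radix4_stage(samples)
    print(max(abs(a - b) for a, b in zip(reference, rebuilt)))
    print(butterfly_input_indices(0, 2048))
```

For N = 2048, `butterfly_input_indices(0, 2048)` gives `[0, 512, 1024, 1536]`, the four points the first radix-4 stage waits for in the 2K mode.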
As for the next operation of the top-leftmost butterfly stage, it is triggered as soon as the 1537th $(x[0+2048\cdot 3/4+1])$ point is collected. The waiting latency is thereby reduced.

The 2K mode is composed of 5 radix-4 stages and one radix-2 stage ($4^5 \cdot 2 = 2048$ points). By contrast, the 8K mode is realized by 6 radix-4 stages and one radix-2 stage ($4^6 \cdot 2 = 8192$ points). To save area, the first radix-4 stage can be bypassed by a mode selection signal. Notably, since DIF is adopted in our design, the final result generated by the last radix-2 stage appears in bit-reversed order and must be reordered.

## 2.3. Low-power butterfly unit

The radix-4 unit used in the proposed FFT is shown in Fig. 5. Notably, the temporary storage elements are SRAMs, which occupy much less area than DFF-based registers and, unlike DRAM-based storage cells, consume no dynamic power for self-refresh. A single operation of the radix-4 butterfly unit is composed of 4 cycles, which are summarized as follows.

- cycle 0-2: The input data points x[n], x[n+N/4], and x[n+N/2] are serially read and stored in individual SRAM cells. Meanwhile, the intermediate values of the previous OFDM demodulation operation, $x'[n+N/4]$, $x'[n+N/2]$, $x'[n+N\cdot 3/4]$, are delivered to the next pipeline stage after multiplication with the twiddle factors stored in the ROM.
- cycle 3: As soon as $x[n+N\cdot 3/4]$ is ready, the radix-4 butterfly operation is triggered. The generated intermediate values are then stored, to be delivered during the next butterfly operation.

It can be concluded that as soon as the operation on the previous symbol is completed, the next one takes place right away. There are neither waiting latencies nor idle cycles, so the throughput of the pipeline structure is maximized.

## 2.4. Memory considerations

**ROM**: The ROMs are used to store the twiddle factors for the FFT computation.
By exploiting the periodicity of the twiddle factors and tolerating truncation errors, a large amount of ROM space can be saved. Take the 2K mode as an example. $W_N^{0k}, W_N^{2nk}, W_N^{nk}, W_N^{3nk}$ are the twiddle factors corresponding to the intermediate values $x'[n], x'[n+N/4], x'[n+N/2], x'[n+N\cdot 3/4]$, respectively. Considering the truncation errors, a 10-bit resolution is enough to represent the twiddle factors. Besides, since one fourth of the twiddle factors are equal to $W_N^{0k}=1$, a 1536×10 ROM (three quarters of 2048 entries) is sufficient to keep all of the required twiddle factors for the 2K mode.

**SRAM**: It is well known that the chip area of DRAM is much smaller than that of SRAM. However, if DRAM is used, the required self-refresh operations complicate the timing control of the FFT circuitry and add significant dynamic power consumption. Hence, SRAM is adopted for the storage elements. According to Fig. 5, every stage in the pipeline structure needs 3 SRAMs and one ROM. Take the first radix-4 stage of the 8K mode FFT as an example. A total of 6144 $(Q+j\cdot I)$ points for x[n], x[n+N/4], x[n+N/2] must be stored in three 2048×10 SRAMs. In practice, six 2048×10 SRAMs are used to store the Q and I components separately. Meanwhile, the large 6144×10 ROM for the twiddle factors is replaced with two smaller ROMs, one 4096×10 and one 2048×10, to enhance the access speed. The overall memory usage of the proposed FFT is tabulated in Table 1.

## 3. SIMULATION AND IMPLEMENTATION

The proposed FFT is implemented in TSMC 0.35 µm 2P4M CMOS technology to verify the performance. Notably, all of the process corners, i.e., $[0^{\circ}C, +100^{\circ}C]$, (SS, TT, FF) models, and VDD ±15%, are simulated. The layout of the proposed FFT on silicon is shown in Fig. 6. A built-in testing circuit composed of a pseudo-random number generator is used to test the proposed FFT.
A total of 8192 pairs of $(Q + j \cdot I)$ are generated, where $Q, I \in [-128, +127]$, to meet the 8-bit symbol data format. Fig. 7 shows the output waveforms of the first 8 symbols of the proposed FFT in the 8K mode. The signal out\_start is pulled high right after the rising edge of the fifth clock cycle. Figs. 8, 9, and 10 are, respectively, snapshots comparing the outputs of our design (the 3rd, 4th, and 5th points) with those of the MATLAB software (the 6144th, 1024th, and 5120th points). They match exactly. Table 2 summarizes the characteristics of the proposed FFT design. We also compare the performance of our FFT with several prior designs in Table 3. The proposed design has the smallest gate count and chip area, and in general it also consumes the least power.

## 4. CONCLUSION

We propose a novel FFT design for the DVB-T OFDM demodulator that takes advantage of the pipeline architecture and SRAM-based butterfly stages to achieve low power dissipation and small area. Thorough post-layout simulations confirm the advantage of our design in terms of gate count and power efficiency.

## 5. REFERENCES

- [1] S. Anikhindi, G. Cradock, R. Makowitz, and C. Patzelt, "A commercial DVB-T demodulator chipset," 1997 International Broadcasting Convention, pp. 528-533, Sep. 1997.
- [2] E. Bidet, J. C. Castelain, and P. Senn, "A fast single-chip implementation of 8192 complex point FFT," *IEEE J. of Solid-State Circuits*, vol. 30, no. 3, pp. 300-305, Mar. 1995.
- [3] A. Buttar, R. Makowitz, C. Patzelt, J. Gledhill, and S. Anikhindi, "FFT and OFDM receiver ICs for DVB-T decoders," 1997 Inter. Conf. on Consumer Electronics (ICCE'97), pp. 102-103, June 1997.
- [4] P. Combelles, C. Del Toso, D. Hepper, D. Le Goff, J. J. Ma, P. Robertson, F. Scalise, L. Soyer, and M.
Zamboni, "A receiver architecture conforming to the OFDM based digital video broadcasting standard for terrestrial transmission (DVB-T)," 1998 IEEE International Conference on Communications (ICC'98), vol. 2, pp. 7-11, June 1998.
- [5] European Broadcasting Union, "Digital Video Broadcasting (DVB): Framing Structure, Channel Coding and Modulation for Digital Terrestrial Television," Data Sheet: ETSI EN 300 744, Jan. 2001.
- [6] S. A. Fechtel and A. Blaickner, "Efficient FFT and equalizer implementation for OFDM receivers," *IEEE Trans. on Consumer Electronics*, vol. 45, no. 4, pp. 1104-1107, Nov. 1999.
- [7] L. Jia, Y. Gao, J. Isoaho, and H. Tenhunen, "A new VLSI-oriented FFT algorithm and implementation," IEEE ASIC Conference, pp. 337-341, 1998.
- [8] Y. Jung, H. Yoon, and J. Kim, "New efficient FFT algorithm and pipeline implementation results for OFDM/DMT applications," *IEEE Trans. on Consumer Electronics*, vol. 49, no. 1, Feb. 2003.
- [9] R. Makowitz, A. Buttar, S. Anikhindi, J. Gledhill, and C. Patzelt, "DVB-T decoder ICs," *IEEE Trans. on Consumer Electronics*, vol. 43, no. 3, Aug. 1997.
- [10] J.-H. Suk, D.-W. Kim, T.-W. Kwon, S.-K. Hyung, and J.-R. Choi, "A 8192 complex point FFT/IFFT for COFDM modulation scheme in DVB-T system," 2003 Inter. SOC (System-on-Chip) Conference, vol. 5, pp. 131-134, Dec. 2003.

| Stage no. | I/P | O/P | SRAM | ROM |
|-----------|---------|---------|-----------|------------------|
| 1 | 8 bits | 19 bits | 2048×10×6 | 4096×20, 2048×10 |
| 2 | 19 bits | 22 bits | 512×21×6 | 1024×20, 512×20 |
| 3 | 22 bits | 25 bits | 128×24×6 | 256×20, 128×20 |
| 4 | 25 bits | 28 bits | 32×27×6 | 128×20 |
| 5 | 28 bits | 31 bits | None | 64×20 |
| 6 | 31 bits | 34 bits | None | None |
| 7 | 34 bits | 28 bits | None | None |

Table 1: Memory usage in the proposed design

Figure 1: DVB-T receiver

| Technology | 0.35 µm 2P4M CMOS |
|-------------------|-------------------|
| Vdd | 3.3 V |
| No. of FFT points | 2048/8192 |
| Data length | 8 bits |
| Data rate | 8 MHz |
| Power dissipation | 535 mW @ 16 MHz |
| Gate count | 139 K |
| Temp. range | 0°C to 75°C |

Table 2: Characteristics of the proposed dual-mode FFT processor

| | [1] | [4] | [7] | [2] | ours |
|------------|----------|--------|---------|---------|-----------|
| Tech. | 0.5 µm | 0.5 µm | 0.5 µm | 0.5 µm | 0.35 µm |
| Area | N/A | N/A | 140 mm² | 1.0 cm² | 35.75 mm² |
| Gate count | 1.8 M | 1.1 M | 1.3 M | 1.5 M | 139 K |
| Power | N/A | N/A | 650 mW | 600 mW | 535 mW |
| Rate | 9.14 MHz | 4R | N/A | N/A | 8 MHz |
| Modes | 2K | 2K/8K | 2K/8K | 8K | 2K/8K |

Table 3: Performance comparison

Figure 2: radix-4 butterfly unit

Figure 3: signal flow in the radix-4 butterfly unit

Figure 4: 2K/8K mode pipeline-structured OFDM FFT

Figure 5: schematic of the proposed radix-4 butterfly

Figure 6: layout of the proposed FFT for OFDM demodulator

Figure 7: an illustrative post-layout output waveform of the proposed FFT

Figure 8: an output (3rd) point compared with the expected result (6144th point) given by MATLAB

Figure 9: an output (4th) point compared with the expected result (1024th point) given by MATLAB

Figure 10: an output (5th) point compared with the expected result (5120th point) given by MATLAB
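
The index pairs quoted for Figs. 8 to 10 are consistent with a plain 13-bit bit reversal of the output position, which is what a DIF pipeline without output reordering would be expected to produce. The sketch below illustrates that mapping under two assumptions that the text does not state explicitly: that the "3rd, 4th, and 5th" output points are counted from zero, and that the ordering is a pure bit reversal over $\log_2 8192 = 13$ bits. The function names `bit_reverse` and `natural_order` are hypothetical.

```python
def bit_reverse(index, width):
    """Reverse the lowest `width` bits of a non-negative integer index."""
    reversed_index = 0
    for _ in range(width):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


def natural_order(outputs):
    """Reorder a bit-reversed output list (length 2**width) into natural order."""
    width = len(outputs).bit_length() - 1
    if len(outputs) != 1 << width:
        raise ValueError("length must be a power of two")
    # Bit reversal is its own inverse, so X[k] sits at position bit_reverse(k).
    return [outputs[bit_reverse(k, width)] for k in range(len(outputs))]


if __name__ == "__main__":
    WIDTH_8K = 13  # 8192 = 2**13 (assumed pure bit-reversed ordering)
    for position in (3, 4, 5):
        print(position, bit_reverse(position, WIDTH_8K))
```

Under these assumptions, output positions 3, 4, and 5 map to frequency indices 6144, 1024, and 5120, the same pairs reported in the captions of Figs. 8, 9, and 10.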