

# A Fast Dynamic 64-bit Comparator with Small Transistor Count

CHUA-CHIN WANG\*, YA-HSIN HSUEH, HSIN-LONG WU and CHIH-FENG WU

Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung 80424, Taiwan, ROC

(Received 1 May 2000; Revised 16 March 2001)

In this paper, we propose a fast 64-bit dynamic CMOS comparator with a small transistor count. The major features of the proposed comparator are the rearrangement and reordering of the transistors in the evaluation block of a dynamic cell and the insertion of a weak *n* feedback inverter, which helps pull the dynamic node down to ground. Simulation results from pre-layout tools (e.g. HSPICE) and post-layout tools (e.g. TimeMill) show a delay of about 2.5 ns, with the operating clock rate reaching 100 MHz. A physical chip was fabricated in UMC (United Microelectronics Corporation) 0.5  $\mu$m (2P2M) technology to verify the correctness of the design.

Keywords: Comparator; Dynamic CMOS; Small transistor count; Feedback inverter; High speed; VLSI

## INTRODUCTION

High-speed operation has long been a target of circuit design owing to the speed demands of supercomputing, CPUs, etc. One critical operation is the comparison of two binary words. In principle, the fastest comparator is built purely from combinational logic gates. However, the gate count, the area and the fan-in become problems when the word length is large, e.g. $n = 64$. Besides, wide-bit comparators are key components in the design of parallel testing, signature analyzers and built-in self-test (BIST) circuits, etc. [4]. Although high fan-in gates are useful in a number of applications, they are not practical in a single stage of static CMOS. Since the NMOS and PMOS networks of a static CMOS gate are duals of each other, one of them must always contain a long series stack. These transistors also increase the loading seen by the previous stages. When a large fan-in is required, dynamic logic therefore has to be used [1,2]. However, prior dynamic logic design styles suffer from various difficulties. For example, domino logic [6] can only be noninverting; NORA [6] has the charge sharing problem; all-N-logic [6] and robust single-phase clocking [1] cannot operate correctly under clocks with short rise or fall times, so they cannot be easily integrated with other parts of a logic design; single-phase logic [6] and Zipper CMOS [6] contain slow P-logic blocks. In this work, we propose a fast 64-bit dynamic comparator with a small transistor count.

## FAST 64-BIT COMPARATOR CIRCUIT

### Prior Comparators

Three comparator circuits have been proposed [5].

- (1) The equality comparator using the combination of XNOR gates and a NAND gate is shown in Fig. 1.
- (2) The comparator using a pass-gate logic structure is shown in Fig. 2.
- (3) As shown in Fig. 3, another version of the comparator, using a merged XNOR/NOR gate and pseudo-nMOS FETs, is presented.

### Equality Comparator

An example of the proposed dynamic CMOS 4-bit equality comparator is shown in Fig. 4. When CLK is low, Node\_1 is precharged to VDD. If  $A\langle 0 \rangle$  and  $B\langle 0 \rangle$  are both high, N1 and N2 are on and P1 and P2 are off, so no current path exists during the evaluation period and Node\_1 stays high. If  $A\langle 0 \rangle$  is high and  $B\langle 0 \rangle$  is low, N1 and P2 are on, so a current path is formed between Node\_1 and ground through P2 and N1 during the evaluation period, and Node\_1 is pulled down. The truth table is given in Table I.

The operation for  $A\langle 1\rangle$  and  $B\langle 1\rangle$ ,  $A\langle 2\rangle$  and  $B\langle 2\rangle$ , and  $A\langle 3\rangle$  and  $B\langle 3\rangle$  is the same. In short, when any pair  $A\langle i\rangle$  and  $B\langle i\rangle$  differs, a current path is formed and Node\_1 goes low. By contrast, if  $A\langle i\rangle$  equals  $B\langle i\rangle$  for all $i$, Node\_1 stays high [5]. Notably, because PMOS transistors are used in the discharge path, Node\_1 can only be discharged to  $|V_{tp}|$  instead of GND. A latch must therefore be connected to Node\_1. The weak n feedback pulls Node\_1 down to ground when Node\_1 is in the low state, and the weak p feedback latches Node\_1 to VDD when Node\_1 is in the high state. Hence, the charge redistribution problem can be resolved. The pull-up time is determined only by the pull-up transistor P0, whereas the ground switch N0 increases the pull-down time. Note that the ground switch may be omitted if the inputs of every pair are guaranteed to be in the same state during the precharge period [2,5].

TABLE I Truth table of the equality comparator

|        | $A\langle 0\rangle = B\langle 0\rangle$ | $A\langle 0\rangle \neq B\langle 0\rangle$ |
|--------|-----------------------------------------|--------------------------------------------|
| Node_1 | 1                                       | 0                                          |
| Output | 0                                       | 1                                          |

FIGURE 1 Prior equality comparator (a).

<sup>\*</sup>Corresponding author. Tel.: +886-7-5252000, Ext. 4144. Fax: +886-7-5254199. E-mail: ccwang@ee.nsysu.edu.tw
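To make the precharge/evaluate behavior and Table I concrete, the following Python sketch models one clock cycle of the proposed equality comparator at the switch level. It is an illustrative assumption, not the authors' netlist: Fig. 4 is not reproduced here, so the P1/N2 branch is inferred by symmetry from the N1/P2 branch described in the text, and analog effects (the $|V_{tp}|$ level, charge sharing, timing) are ignored. The function name `equality_comparator` is hypothetical.

```python
from typing import Sequence


def equality_comparator(a: Sequence[int], b: Sequence[int]) -> int:
    """Switch-level sketch of one precharge/evaluate cycle of the dynamic
    equality comparator. Returns Output as in Table I: 0 if A == B, else 1.

    Assumed wiring per bit i (inferred from the text, not taken from Fig. 4):
      - N1 (gate A<i>) in series with P2 (gate B<i>): conducts when A=1, B=0
      - P1 (gate A<i>) in series with N2 (gate B<i>): conducts when A=0, B=1
    """
    if len(a) != len(b):
        raise ValueError("A and B must have the same width")

    # Precharge phase (CLK low): P0 pulls Node_1 up to VDD.
    node_1 = 1

    # Evaluate phase (CLK high): Node_1 is discharged if any series path
    # conducts. Each path contains only two transistors.
    for a_i, b_i in zip(a, b):
        n1_p2_on = a_i == 1 and b_i == 0
        p1_n2_on = a_i == 0 and b_i == 1
        if n1_p2_on or p1_n2_on:
            # The PMOS alone only reaches |Vtp|; the weak n feedback of the
            # latch completes the pull-down, so a full logic 0 is modeled.
            node_1 = 0

    # The output stage inverts Node_1 (Table I).
    return 1 - node_1


if __name__ == "__main__":
    print(equality_comparator([1, 0, 1, 1], [1, 0, 1, 1]))  # 0: equal
    print(equality_comparator([1, 0, 1, 1], [1, 0, 0, 1]))  # 1: index 2 differs
```

Because every discharge path pairs one NMOS with one PMOS driven by opposite operands, Node\_1 falls exactly when some bit pair differs, which is the XOR-OR function the prior designs build from explicit XNOR gates.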

### Zero/one Detector

The same design methodology can be applied to another important application: the zero/one detector. Detecting all ones or all zeros on wide words requires large fan-in AND or OR gates. Constructing a tree of AND gates can overcome this problem, as shown in Fig. 5. An alternative design, shown in Fig. 6, was proposed in [5]. The zero/one detector is also employed in the parallel testing of memory, where the outputs of the arrays are compared against the expected data, as shown in Fig. 7.

The proposed 4-bit zero/one comparator is shown in Fig. 8. When CLK is low, Node\_1 is precharged to VDD. If Ref, the reference data, is set high and  $D\langle 0 \rangle$ ,  $D\langle 1 \rangle$ ,  $D\langle 2 \rangle$  and  $D\langle 3 \rangle$  are all high, then N, N0, N1, N2 and N3 are on while P, P0, P1, P2 and P3 are all off. Thus, no current path exists during the evaluation period, and Node\_1 stays high. Similarly, if Ref is low and  $D\langle 0 \rangle$ ,  $D\langle 1 \rangle$ ,  $D\langle 2 \rangle$  and  $D\langle 3 \rangle$  are all low, then N, N0, N1, N2 and N3 are off and P, P0, P1, P2 and P3 are all on. Again, no current path exists during the evaluation period, and Node\_1 stays high. If any input differs from Ref, some NMOS and PMOS transistors are turned on simultaneously, forming a current path between Node\_1 and ground during the evaluation period, so Node\_1 is discharged low. The truth table is given in Table II.

FIGURE 2 Prior equality comparator (b).

FIGURE 3 Prior equality comparator (c).

DYNAMIC CMOS 391

FIGURE 4 Proposed equality comparator.

FIGURE 5 Prior zero/one comparator (a).

FIGURE 6 Prior zero/one comparator (b).

FIGURE 7 Prior zero/one comparator (c).

TABLE II Truth table of the zero/one detector

|        | $D\langle i\rangle = \text{Ref}$ for all $i$ | $D\langle i\rangle \neq \text{Ref}$ for some $i$ |
|--------|----------------------------------------------|--------------------------------------------------|
| Node_1 | 1                                            | 0                                                |
| Output | 0                                            | 1                                                |
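The zero/one detector behavior in Table II can be sketched the same way. This Python model is a hedged illustration, not the authors' circuit: Fig. 8 is not reproduced, so the wiring below (each data transistor in series with one shared Ref-driven transistor of the opposite type) is an assumption chosen because it is consistent with the text and with the $2(n+1)+6$ count in Table III. The name `zero_one_detector` is hypothetical.

```python
from typing import Sequence


def zero_one_detector(d: Sequence[int], ref: int) -> int:
    """Switch-level sketch of the dynamic zero/one detector.
    Returns Output as in Table II: 0 if every D<i> equals Ref, else 1.

    Assumed (hypothetical) wiring:
      - P<i> (gate D<i>) in series with a shared N (gate Ref)
      - N<i> (gate D<i>) in series with a shared P (gate Ref)
    """
    if ref not in (0, 1):
        raise ValueError("Ref must be 0 or 1")

    # Precharge phase (CLK low): Node_1 is pulled up to VDD.
    node_1 = 1

    # Evaluate phase (CLK high).
    shared_n_on = ref == 1
    shared_p_on = ref == 0
    for d_i in d:
        p_i_on = d_i == 0
        n_i_on = d_i == 1
        # A mismatch turns on one NMOS and one PMOS in the same series path.
        if (p_i_on and shared_n_on) or (n_i_on and shared_p_on):
            node_1 = 0

    # The output stage inverts Node_1 (Table II).
    return 1 - node_1


if __name__ == "__main__":
    print(zero_one_detector([1, 1, 1, 1], ref=1))  # 0: all ones
    print(zero_one_detector([0, 0, 0, 0], ref=0))  # 0: all zeros
    print(zero_one_detector([0, 0, 1, 0], ref=0))  # 1: D<2> differs
```

Sharing the Ref-driven transistors across all data bits is what keeps the per-bit cost at two transistors in this sketch.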

### Transistor Count and Speed Comparison

The transistor counts of the circuits discussed above are summarized in Table III.

TABLE III Transistor count comparison ($n$ is the number of input bits)

|                       | Transistor count                            |
|-----------------------|---------------------------------------------|
| Equality comparator   |                                             |
| Fig. 1                | $6n + 2n$                                   |
| Fig. 2                | $12n + 5$                                   |
| Fig. 3                | $8n + 1$                                    |
| Fig. 4 (the proposed) | $4n + 6$                                    |
| Zero/one comparator   |                                             |
| Fig. 5                | $4(n/2) + 4(n/4) + 4(n/8) + \cdots + 4$     |
| Fig. 6                | $4(n-1) + 2(n/2-1)$                         |
| Fig. 7                | $6n + 2n$                                   |
| Fig. 8 (the proposed) | $2(n+1) + 6$                                |
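To see what the formulas in Table III mean at practical widths, the following Python sketch simply evaluates each one for a given $n$. The formulas are transcribed from the table; restricting $n$ to powers of two is an assumption needed for the Fig. 5 tree and the $n/2$ term of Fig. 6. The helper names are hypothetical.

```python
from typing import Callable, Dict


def fig5_tree(n: int) -> int:
    """Fig. 5 AND tree: 4(n/2) + 4(n/4) + ... + 4 for n a power of two.

    The geometric sum n/2 + n/4 + ... + 1 equals n - 1, so this is 4(n - 1).
    """
    total, k = 0, n // 2
    while k >= 1:
        total += 4 * k
        k //= 2
    return total


# Formulas transcribed from Table III; n is the number of input bits.
EQUALITY: Dict[str, Callable[[int], int]] = {
    "Fig. 1": lambda n: 6 * n + 2 * n,
    "Fig. 2": lambda n: 12 * n + 5,
    "Fig. 3": lambda n: 8 * n + 1,
    "Fig. 4 (proposed)": lambda n: 4 * n + 6,
}
ZERO_ONE: Dict[str, Callable[[int], int]] = {
    "Fig. 5": fig5_tree,
    "Fig. 6": lambda n: 4 * (n - 1) + 2 * (n // 2 - 1),
    "Fig. 7": lambda n: 6 * n + 2 * n,
    "Fig. 8 (proposed)": lambda n: 2 * (n + 1) + 6,
}


def count_table(n: int) -> Dict[str, int]:
    """Evaluate every Table III formula for an input width n."""
    # Assumption: n is a power of two (needed by the Fig. 5 and Fig. 6 terms).
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, n >= 2")
    counts = {"Equality " + name: f(n) for name, f in EQUALITY.items()}
    counts.update({"Zero/one " + name: f(n) for name, f in ZERO_ONE.items()})
    return counts


if __name__ == "__main__":
    for width in (8, 64):
        for name, count in count_table(width).items():
            print(f"n={width:2d}  {name:28s} {count}")
```

For $n = 64$ this gives 262 transistors for the proposed equality comparator versus 512 for Fig. 1, and 136 for the proposed zero/one comparator versus 252 for the Fig. 5 tree.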

TABLE IV Input capacitance comparison ($C_g$ is the gate capacitance and $C_s$ is the source capacitance; the second subscript, p or n, denotes a PMOS or NMOS transistor)

| Equality comparator     | Input capacitance               |
|-------------------------|---------------------------------|
| 10T XNOR                | $2C_{gp} + 2C_{gn}$             |
| 4T XNOR (cross-coupled) | $C_{gp} + C_{gn} + C_{sn}$      |
| 6T XOR A terminal       | $2C_{gp} + C_{gn} + C_{sp}$     |
| 6T XOR B terminal       | $C_{gp} + C_{gn} + C_{sp} + C_{sn}$ |
| Fig. 2                  | $C_{gp} + 3C_{gn}$              |
| Fig. 3                  | $C_{gp} + 2C_{gn}$              |
| The proposed            | $C_{gp} + C_{gn}$               |

Note that a compact 6-transistor XOR is assumed for the traditional comparators. As Table III shows, the proposed comparators require far fewer transistors than the other comparators with the same functionality. Regarding speed, dynamic logic has a low input capacitance and therefore outperforms the other logic styles. The input capacitances of the different comparators are compared in Table IV; the proposed circuit has the smallest. Moreover, the proposed circuits have only two stages, which shortens the total delay. The proposed design is thus expected to be faster than the previous designs.



FIGURE 8 Proposed zero/one comparator.



FIGURE 9 64-bit comparator architecture.

### Design of the 64-bit Comparator

Following the proposed design strategy, a hierarchical design of a fast 64-bit comparator is shown in Fig. 9. It is composed of eight 8-bit equality comparators and one 8-bit zero/one comparator. Each 8-bit equality comparator determines the equality of one of the eight pairs of corresponding bytes of the two 64-bit inputs and produces one output signal for the 8-bit zero/one comparator, whose Ref is set to "0". Since an equality comparator outputs 0 when its bytes are equal (Table I), the final output is 0 only when all eight byte pairs match. In other words, the 64 bits are divided into eight bytes that are evaluated at the same time, and the 8-bit zero/one comparator then produces the final output signal. HSPICE is employed to optimize the speed. The length of all transistors is set to  $0.6\,\mu m$ , while their widths are listed in Table V.

TABLE V The transistor widths used in our designs (wire loading = 0.1 pF, unit =  $\mu m$ )

| W in zero/one |
|---------------|
| 15            |
| 20            |
| 2.5           |
| 10            |
| 20            |
| 2             |
| 0.9           |
| 0.9           |
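The two-stage hierarchy of Fig. 9 can be checked behaviorally with the Python sketch below. It is a functional model only (no timing), and the function names are hypothetical. It also applies the Table III formulas at $n = 8$, assuming the 64-bit design consists of exactly eight proposed equality cells and one proposed zero/one cell with no extra glue logic: $8(4 \cdot 8 + 6) + (2(8+1) + 6) = 304 + 24 = 328$, the transistor count listed for the proposed design in Table VII.

```python
from typing import Sequence


def byte_equality(a_byte: int, b_byte: int) -> int:
    """Behavior of one 8-bit equality comparator: 0 if equal, 1 otherwise."""
    return 0 if a_byte == b_byte else 1


def zero_one_detector(d: Sequence[int], ref: int) -> int:
    """Behavior of the 8-bit zero/one comparator: 0 if all D<i> == Ref."""
    return 0 if all(d_i == ref for d_i in d) else 1


def comparator_64(a: int, b: int) -> int:
    """Two-stage 64-bit comparator of Fig. 9: returns 0 if a == b, else 1."""
    if not (0 <= a < 1 << 64 and 0 <= b < 1 << 64):
        raise ValueError("operands must be unsigned 64-bit values")
    # Stage 1: eight byte comparators evaluate in parallel.
    stage1 = [byte_equality((a >> (8 * k)) & 0xFF, (b >> (8 * k)) & 0xFF)
              for k in range(8)]
    # Stage 2: all-zero detection with Ref = 0.
    return zero_one_detector(stage1, ref=0)


def transistor_count_64() -> int:
    """Eight (4n + 6) equality cells plus one 2(n + 1) + 6 zero/one cell,
    all at n = 8 (assumes no additional glue logic)."""
    return 8 * (4 * 8 + 6) + (2 * (8 + 1) + 6)


if __name__ == "__main__":
    print(comparator_64(0x1FF, 0x1FF))  # 0: equal
    print(comparator_64(0x1FF, 0x1FE))  # 1: one-bit difference (worst case)
    print(transistor_count_64())        # 328
```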

## SIMULATIONS AND CHIP LAYOUT

HSPICE simulation of the entire 64-bit comparator shows very short delays, as listed in Table VI.

The clock rate can reach 200 MHz with a 0.01 ps rise/fall time. Figure 10 shows the waveform at a 200 MHz clock for the worst-case scenario, in which the two 64-bit inputs differ in only one bit; only a single discharge path is then active, so Node\_1 discharges most slowly. The TimeMill simulation results indicate a delay of 2.5 ns without pads and 4.5 ns with pads.
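As a rough consistency check, assume a 50% clock duty cycle so that evaluation occupies half of each period (the duty cycle is not stated in the text). The 200 MHz clock then gives

$$
T_{\mathrm{clk}} = \frac{1}{200\,\mathrm{MHz}} = 5\,\mathrm{ns}, \qquad
T_{\mathrm{eval}} = \frac{T_{\mathrm{clk}}}{2} = 2.5\,\mathrm{ns} > 2.126\,\mathrm{ns},
$$

so under this assumption the HSPICE clk → output delay in Table VI fits within the evaluation phase, while the 2.5 ns post-layout TimeMill delay (without pads) uses the entire phase with no margin.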

The design is carried out in UMC (United Microelectronics Corporation)  $0.5\,\mu m$  (2P2M) technology. The chip layout with pads, shown in Fig. 11, occupies  $1.8\times1.8\,\mathrm{mm^2}$ , while the core is only  $145\times240\,\mu\mathrm{m^2}$ . The 64-bit operands are transferred serially through byte-wide I/O. We also simulated several comparator designs using different logic styles; note that adders/subtractors are also often used as comparators. The results are tabulated in Table VII.

TABLE VI The delays of the proposed 64-bit comparator

| I/O path       | Delay (ns) |
|----------------|------------|
| clk → output   | 2.126      |
| input → output | 2.120      |

FIGURE 10 Simulation waveform.

FIGURE 11 Chip layout.

The proposed design was approved by the CIC (Chip Implementation Center) of the NSC (National Science Council) for fabrication by UMC under chip number U05-89B-11u. The die photo of the proposed comparator chip is shown in Fig. 12. We used an HP 1660CP logic analyzer/pattern generator to test the chip. Figures 13 and 14 show the measured results for random input data and for the worst-case inputs, which differ by only one bit, respectively. The maximum measured operating clock rate is 35 MHz.

TABLE VII The performance comparison of different designs

| Logic                     | Delay   | # Transistors |
|---------------------------|---------|---------------|
| 64-b PLA-ANT CLA [6]      | 4.0 ns  | 8352          |
| 32-b EMODL adder [1]      | 2.7 ns  | 1537 (gates)  |
| 8-b TSPC adder (1 µm) [3] | 7.5 ns  | 1832          |
| All-N-logic [3]           | Failed  | 2062          |
| The proposed              | 2.50 ns | 328           |

FIGURE 12 Die photo.

FIGURE 13 Simulation waveforms given random (normal) inputs.

FIGURE 14 Simulation waveforms given the worst case of inputs ("1FF" and "1FE").

## CONCLUSION

Dynamic CMOS comparators with a number of advantages have been proposed. Their transistor count is much smaller than that of other similar designs. Despite the high fan-in, each discharge path contains only two series transistors, which reduces the pull-down delay. Compared with XOR-based equality comparators and deterministic comparators, the proposed design is much faster. The design methodology has been demonstrated by the implementation of a fast 64-bit dynamic comparator.

## Acknowledgements

This research was partially supported by National Science Council under grant NSC 88-2219-E-110-001 and 89-2215-E-110-014.

## References

- [1] Afghahi, M. (1996) "A robust single phase clocking for low power, high-speed VLSI applications", *IEEE Journal of Solid-State Circuits* 31(2), 247–253.
- [2] Clark, L.T. and Taylor, G.F. (1996) "High fan-in circuit design", *IEEE Journal of Solid-State Circuits* 31(1), 91–96.
- [3] Gu, R.X. and Elmasry, M.I. (1996) "All-N-logic high-speed true-single-phase dynamic CMOS logic", *IEEE Journal of Solid-State Circuits* 31(2), 221–229.
- [4] van de Goor, A.J. (1994) *Testing Semiconductor Memories* (Wiley, Reading).
- [5] Weste, N.H.E. and Eshraghian, K. (1993) *Principles of CMOS VLSI Design* (Addison-Wesley, Reading).
- [6] Wang, C.-C., Wu, C.-F. and Tsai, K.-C. (1998) "A 1.0 GHz 64-bit high-speed comparator using ANT dynamic logic with two-phase clocking", *IEE Proceedings—Computers and Digital Techniques* 145(6), 433–436.

## Authors' Biographies

Chua-Chin Wang was born in Taiwan, in 1962. He received the BS degree in Electrical Engineering from National Taiwan University, Taiwan, in 1984 and the MS and PhD degrees in electrical engineering from State University of New York, Stony Brook, in 1988 and 1992, respectively. Currently he is a Professor in the Department of Electrical Engineering, National Sun Yat-Sen University, Taiwan. His research interests include low-power logic and circuit design, VLSI design, and neural networks implementations.

Ya-Hsin Hsueh was born in Taiwan, in 1976. She received the BS and MS degrees in Electrical Engineering from National Sun Yat-Sen University, Taiwan, in 1998 and 2000, respectively. She is currently working toward the PhD degree in Electrical Engineering at National Sun Yat-Sen University. Her current research interests are VLSI design and interfacing I/O circuits.

Hsin-Long Wu was born in Taiwan, in 1976. He received the BS and MS degrees in Electrical Engineering from National Sun Yat-Sen University, Taiwan, in 1998 and 2000, respectively. He is currently working in the Computer and Communication Lab of the Industrial Technology Research Institute. His current research interests are VLSI design and system integration.

Chih-Feng Wu was born in Kaohsiung, Taiwan, in 1961. He received the BS degree in electrical engineering from National Cheng Kung University, Tainan, Taiwan, and the MS degree (1994) and the PhD degree (2000) in Electrical Engineering from National Sun Yat-Sen University, Taiwan. Since 1987, he has been working with Philips Semiconductor in Kaohsiung. Currently, he is the Factory Manager of the Wafer Testing Factory of Philips Semiconductor, Kaohsiung. His major research interests include design for testability, Iddq testing and VLSI design.