

# Improved Design of C<sup>2</sup>PL 3-2 Compressors for Inner Product Processing\*

CHUA-CHIN WANG†, PO-MING LEE and CHENN-JUNG HUANG‡

Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung 80424, Taiwan, ROC

(Received 1 May 1999; Revised 16 March 2001)

The inner product of two vectors may be one of the most frequently used mathematical operations in digital computation. The design style of the inner product processor is therefore critical to performance, as is that of its basic building block, the 3-2 compressor. In this work, improved designs of C<sup>2</sup>PL-based 3-2 compressors are presented which can be used to build a fast inner product processor. The features of our compressors include a short delay minimized by HSPICE optimization, a lower transistor count, and high fan-out.

Keywords: Compressor; CPL; C<sup>2</sup>PL; Inner product processor; High fan-out; CMOS

## INTRODUCTION

In digital computation, the inner product of two vectors may be one of the most frequently used mathematical operations [1]. If the dimension of the vectors is large, the carry propagation of the inner product is likely to become the critical delay. Many high-speed logic design styles have been proposed to reduce the propagation delay of inner product computation. However, each of these logic styles has its own difficulties. For example, domino logic [2] can implement only non-inverting functions; NORA [3] has the charge sharing problem; all-N-logic [4] and robust single phase clocking [5] cannot operate correctly under clocks with short rise or fall times, which makes them difficult to integrate with other parts of a logic design; single-phase logic [6] and Zipper CMOS [7] contain slow P-logic blocks. Complementary pass-transistor logic (CPL), proposed by Yano et al. [8], is twice as fast as conventional CMOS, but, like conventional CMOS, it needs more silicon area because of the mixed interconnection. Moreover, the noise margin and speed degradation caused by the mismatch between the input signal level and the logic threshold voltage of the CMOS driver need to be taken into consideration when CPL is implemented. Although Zhang et al. [9] proposed C<sup>2</sup>PL (complex CPL) and demonstrated that it fixes the problems of CPL, several physical design factors were not fully considered or implemented. First, the NMOS transistors of the pass logic cannot be of minimum size. Second, the sizes of the driving inverters have to be properly tuned. Third, the original design in Ref. [9] not only has poor fan-in and fan-out capability, but also produces highly asymmetrical rise and fall delays, which are very likely to cause glitch hazards and unwanted power consumption. In this paper, we propose an improved 3-2 compressor to resolve all the problems mentioned above. HSPICE simulation results are presented to verify our observations.

## FRAMEWORK OF IMPROVED COMPRESSORS

## Basic Compressor Building Block Design

A 3-2 compressor is basically a full adder. Its defining feature is that the output represents the number of 1's among the inputs. The equations of a full adder are

†Corresponding author. Tel.: +886-7-525-2000. Ext. 4144. Fax: +886-7-5254199. E-mail: ccwang@ee.nsysu.edu.tw

‡Dr Chenn-Jung Huang is currently an associate professor with the Department of Computer Science and Information Education, National Taitung Teachers College.

ISSN 1065-514X print/ISSN 1563-5171 online © 2002 Taylor & Francis Ltd DOI: 10.1080/10655140290011195

<sup>\*</sup>Portions reprinted with permission, from ISCAS 1999, IEEE International Symposium on Circuits and Systems, Vol. I, pp. 161-164, June ©1999



FIGURE 1 3-2 compressor building block.

presented as follows:

$$\begin{aligned} Sum &= (A \oplus Cin)B' + (A \oplus Cin)'B = FB' + F'B \\ Cout &= (A \oplus Cin)B + (A \oplus Cin)'Cin = FB + F'Cin \\ F &= A \oplus Cin \end{aligned} \tag{1}$$

where F denotes (A $\oplus$ Cin). As shown in Fig. 1, the logic structure of a typical 3-2 compressor can be split into two logic layers. One of the three inputs, i.e. B(B'), is not required in the first logic layer. The existence of unequal delays in the 3-2 compressor paves the way for us to reduce the total delay of inner product computation by arranging the input signals to the 3-2 compressors inside the compressor tree in a proper order.
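As a quick sanity check of Eq. (1), the following Python sketch evaluates the two-layer form over all eight input combinations and confirms that Sum and Cout together encode the number of 1's among the inputs. The function names `compressor_3_2` and `check_all_inputs` are illustrative only and do not come from the original design.

```python
from itertools import product


def compressor_3_2(a: int, b: int, cin: int) -> tuple[int, int]:
    """Evaluate Eq. (1) in its two-layer form and return (Sum, Cout)."""
    f = a ^ cin                            # first layer: F = A xor Cin (B is not needed here)
    s = (f & (1 - b)) | ((1 - f) & b)      # Sum  = F B' + F' B
    cout = (f & b) | ((1 - f) & cin)       # Cout = F B  + F' Cin
    return s, cout


def check_all_inputs() -> bool:
    """True if Sum + 2*Cout equals the number of 1's for every input vector."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = compressor_3_2(a, b, cin)
        if s + 2 * cout != a + b + cin:
            return False
    return True


print(check_all_inputs())
```

Because `f` is computed before `b` is used, the sketch also mirrors the observation that B enters only in the second logic layer.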

## Prior 3-2 Compressor Design

Although a 3-2 compressor can be realized by a full adder, and Zhang et al. [9] proposed a C<sup>2</sup>PL design for 3-2 and 7-3 compressors, the design issues addressed above were ignored in their work. Figures 2 and 3 show the schematic diagrams of the two types of 3-2 compressors based on complex CPL (C<sup>2</sup>PL) proposed in Ref. [9]. To test whether these two 3-2 compressors have enough fan-out, we add a 0.1 pF capacitor at the output and run HSPICE simulations. The results are shown in Figs. 4 and 5.

The simulations show that Zhang's original design does not have enough driving capability. Hence, it is not suitable for cascading or for constructing an inner product processor. However, the original design still has its advantages: first, both 3-2 compressors are functionally correct; second, their transistor count is lower than that of the traditional full adder; third, neither of them contains the two logic layers shown in Fig. 1.

## Improved C<sup>2</sup>PL 3-2 Compressors

We thus try to improve the original design to achieve three goals:

- (1) Minimize the delay time to make a single 3-2 compressor as fast as possible.
- (2) Increase the fan-out to make the original design adequate for large loads.
- (3) Decrease the number of transistors without causing malfunction.



FIGURE 2 Schematic of the prior 3-2 compressor I.



FIGURE 3 Schematic of the prior 3-2 compressor II.

#### Minimize the Delay Time

Both 3-2 compressors have three inputs: "A", "B", and "Cin" (carry in). All three inputs in Figs. 2 and 3 need special buffers because each of them connects to the source of a transistor. The size of each buffer must be tuned, particularly that of the "Cin" pin, because the pins need to act as nearly ideal power sources when driving their own loads. In addition, to analyze the relation between input and output more precisely, we choose the input vectors carefully so that the output responds to the switching of only one input. Figure 6 gives an example of how the delay is measured. We use HSPICE to measure the delay between each input and each output. The measured delays are tabulated in Table I.
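The choice of input vectors can be illustrated with a small Python sketch, which is our own illustration and not the authors' test bench. For a given switching input and observed output, it lists the settings of the two remaining inputs under which the output actually toggles, using only the full-adder behaviour. The helper names `outputs` and `sensitizing_vectors` are hypothetical.

```python
from itertools import product

INPUTS = ("A", "B", "Cin")


def outputs(v: dict) -> dict:
    """Full-adder behaviour: Sum and Cout for one input vector."""
    total = v["A"] + v["B"] + v["Cin"]
    return {"Sum": total & 1, "Cout": total >> 1}


def sensitizing_vectors(switching: str, output: str) -> list:
    """Settings of the other two inputs for which `output` follows `switching`.

    Each entry is (fixed_inputs, edge), where edge is the direction of the
    output transition when the switching input rises from 0 to 1.
    """
    others = [name for name in INPUTS if name != switching]
    found = []
    for values in product((0, 1), repeat=2):
        fixed = dict(zip(others, values))
        low = outputs({**fixed, switching: 0})[output]
        high = outputs({**fixed, switching: 1})[output]
        if low != high:
            found.append((fixed, "rise" if high > low else "fall"))
    return found


for pin in INPUTS:
    for out in ("Sum", "Cout"):
        print(pin, "->", out, sensitizing_vectors(pin, out))
```

Under this model, a rising input can make Sum either rise or fall depending on the other two inputs, whereas Cout can only rise with a rising input. This matches the set of delay names listed in Table I, where Sum has both rise and fall entries for each input edge but Cout has only one.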

TABLE I The delays of different inputs (unit: ns)

| Delay      | 3-2 Compressor I | 3-2 Compressor II |
|------------|------------------|-------------------|
| Ar2Sumr    | 0.32807          | 0.27349           |
| Ar2Sumf    | 0.40614          | 0.33903           |
| Ar2Coutr   | 0.36864          | 0.29813           |
| Af2Sumr    | 0.40947          | 0.31921           |
| Af2Sumf    | 0.42800          | 0.35410           |
| Af2Coutf   | 0.44482          | 0.39079           |
| Br2Sumr    | 0.33396          | 0.28400           |
| Br2Sumf    | 0.13140          | 0.14148           |
| Br2Coutr   | 0.32067          | 0.28124           |
| Bf2Sumr    | 0.19858          | 0.20984           |
| Bf2Sumf    | 0.35582          | 0.24719           |
| Bf2Coutf   | 0.34879          | 0.25178           |
| Cinr2Sumr  | 0.31832          | 0.31444           |
| Cinr2Sumf  | 0.40463          | 0.36233           |
| Cinr2Coutr | 0.37468          | 0.34597           |
| Cinf2Sumr  | 0.38056          | 0.41332           |
| Cinf2Sumf  | 0.38806          | 0.45531           |
| Cinf2Coutf | 0.40995          | 0.48952           |
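To make the figures in Table I easier to compare, the sketch below groups the delays by input pin and reports the worst case for each pin. It assumes the naming convention suggested by the caption of Fig. 6, i.e. "Ar2Sumr" is read as "A rise to Sum rise"; the dictionary `TABLE_I` and the function `worst_case_by_input` are illustrative names, not part of the original work.

```python
# Delays from Table I in ns: name -> (3-2 compressor I, 3-2 compressor II).
TABLE_I = {
    "Ar2Sumr": (0.32807, 0.27349),
    "Ar2Sumf": (0.40614, 0.33903),
    "Ar2Coutr": (0.36864, 0.29813),
    "Af2Sumr": (0.40947, 0.31921),
    "Af2Sumf": (0.42800, 0.35410),
    "Af2Coutf": (0.44482, 0.39079),
    "Br2Sumr": (0.33396, 0.28400),
    "Br2Sumf": (0.13140, 0.14148),
    "Br2Coutr": (0.32067, 0.28124),
    "Bf2Sumr": (0.19858, 0.20984),
    "Bf2Sumf": (0.35582, 0.24719),
    "Bf2Coutf": (0.34879, 0.25178),
    "Cinr2Sumr": (0.31832, 0.31444),
    "Cinr2Sumf": (0.40463, 0.36233),
    "Cinr2Coutr": (0.37468, 0.34597),
    "Cinf2Sumr": (0.38056, 0.41332),
    "Cinf2Sumf": (0.38806, 0.45531),
    "Cinf2Coutf": (0.40995, 0.48952),
}


def worst_case_by_input(column: int) -> dict[str, float]:
    """Largest delay from each input pin; column 0 is type I, column 1 is type II."""
    worst: dict[str, float] = {}
    for name, delays in TABLE_I.items():
        pin = name.split("2")[0][:-1]  # "Ar2Sumr" -> "Ar" -> "A"
        worst[pin] = max(worst.get(pin, 0.0), delays[column])
    return worst


for column, label in enumerate(("I", "II")):
    worst = worst_case_by_input(column)
    fastest = min(worst, key=worst.get)
    print(f"Compressor {label}: worst-case delay per input {worst}, fastest input {fastest}")
```

With the Table I values, "B" has the smallest worst-case delay for both compressor types, which is consistent with B not being required in the first logic layer. In a compressor tree, this suggests routing the latest-arriving signal to the "B" input.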

#### Increase the Fan-out

According to the analysis of Zhang's 3-2 compressor above, the original design has poor fan-out. To overcome this shortcoming, the output buffers are re-tuned to give the compressor an enhanced fan-out capability. The resulting sizes are given in Table II, where the "invsum" and "invcout" entries show our result.

#### Reduce Transistor Count

Figures 2 and 3 both indicate that the "Sum/" and "Cout" pins are redundant. Therefore, we can remove these two pins and the transistors associated with them. In our new design, each 3-2



FIGURE 4 Original 3-2 compressor I simulation (load = 0.1 pF).



FIGURE 5 Original 3-2 compressor II simulation (load = 0.1 pF).



FIGURE 6 An example to show how we measure the delay of input signal "A" rise to output signal "Sum" rise and that of input signal "A" fall to output signal "Sum" fall.

TABLE II Transistor sizes of compressors I and II (unit: $\mu$m)

| Transistor name | 3-2 Compressor I (W/L) | 3-2 Compressor II (W/L) |
|-----------------|------------------------|---------------------|
| inva PMOS       | 60/1.3                 | 56.7/1.3            |
| inva NMOS       | 32.8/1.3               | 49.7/1.3            |
| invb PMOS       | 36.5/1.3               | 57.5/1.3            |
| invb NMOS       | 40.9/1.3               | 50/1.3              |
| invc PMOS       | 80/1.3                 | 80/1.3              |
| invc NMOS       | 80/1.3                 | 73.7/1.3            |
| invsum PMOS     | 23.7/1.3               | 17.8/1.3            |
| invsum NMOS     | 30/1.3                 | 25.8/1.3            |
| invcout PMOS    | 23.4/1.3               | 17.7/1.3            |
| invcout NMOS    | 30/1.3                 | 26.8/1.3            |
| p1              | 29.5/1.3               | 28.6/1.3            |
| p2              | 29.5/1.3               | 20/1.3              |
| p3              |                        | 1.8/1.3             |
| p4              |                        | 21.6/1.3            |
| n1              | 34/1.3                 | 14.1/1.3            |
| n2              | 34/1.3                 | 1.8/1.3             |
| n3              | 28.3/1.3               | 7.5/1.3             |
| n4              | 28.3/1.3               | 4.5/1.3             |
| n5              | 23.6/1.3               | 1.8/1.3             |
| n6              | 23.6/1.3               | 1.8/1.3             |
| n7              | 22/1.3                 | 16.1/1.3            |
| n8              | 22/1.3                 | 16.1/1.3            |

compressor has eight fewer transistors than the corresponding prior compressor. Consequently, the area of the 3-2 compressors is reduced.

## PHYSICAL IMPLEMENTATION

We use the Taiwan Semiconductor Manufacturing Company (TSMC) 0.6 µm 1P3M technology to implement the improved 3-2 compressors. The schematic diagrams of the improved 3-2 compressors are shown in Figs. 7 and 8, respectively. Table II lists the transistor sizes of the two 3-2 compressors. Figures 9 and 10 show the simulation results of our new designs. The



FIGURE 7 Improved 3-2 compressor type I.



FIGURE 8 Improved 3-2 compressor type II.



FIGURE 9 Simulations of improved 3-2 compressor type I (load = 0.1 pF).

simulation results confirm that the fan-out is strengthened for a 0.1 pF load. Furthermore, the delay is drastically reduced.



FIGURE 10 Simulations of improved 3-2 compressor type II (load = 0.1 pF).

## CONCLUSION

In this paper, two improved designs of 3-2 compressors are presented. The improved 3-2 compressors are proposed to overcome several problems appearing in Zhang's work [9]. Our simulation results show that the improved 3-2 compressor is capable of driving large loads, and that the transistor count is reduced. The improved 3-2 compressors are therefore robust cells for constructing an inner product processor [1].

## Acknowledgements

This research was partially supported by the National Science Council under grants NSC 88-2219-E-110-001 and 80-2215-E-110-014.

#### References

- [1] Wang, C.-C., Huang, C.-J. and Lee, P.-M. (1999) "A comparison of two alternative architectures of digital ratioed compressor design for inner product processing", *IEEE Int. Symp. Circuits Syst.* 1, 161–164.
- [2] Krambeck, R.H., Lee, C.M. and Law, H.-S. (1982) "High-speed compact circuits with CMOS", *IEEE J. Solid-State Circuits* 17, 614–619.
- [3] Goncalves, N.F. and De Man, H.J. (1983) "NORA: a race-free dynamic CMOS technology for pipelined logic structures", *IEEE J. Solid-State Circuits* 18, 261–266.
- [4] Gu, R.X. and Elmasry, M.I. (1996) "All-N-logic high-speed true-single-phase dynamic CMOS logic", *IEEE J. Solid-State Circuits* 31(2), 221–229.
- [5] Afghahi, M. (1996) "A robust single phase clocking for low power high-speed VLSI application", *IEEE J. Solid-State Circuits* 31(2), 247–253.
- [6] Yuan, J. and Svensson, C. (1989) "High-speed CMOS circuit technique", *IEEE J. Solid-State Circuits* 24, 62–70.
- [7] Lee, C.M. and Szeto, E.W. (1986) "Zipper CMOS", *IEEE Circuits Devices Mag.* May, 10–16.
- [8] Yano, K., Yamanaka, T., Nishida, T., Saito, M., Shimohigashi, K. and Shimizu, A. (1990) "A 3.8-ns CMOS 16×16 b multiplier using complementary pass-transistor logic", *IEEE J. Solid-State Circuits* 25(2), 388–395.
- [9] Zhang, D. and Elmasry, M.I. (1997) "VLSI compressor design with applications to digital neural networks", *IEEE Trans. VLSI Syst.* 5(2), 230–233.

# Authors' Biographies

Chua-Chin Wang was born in Taiwan in 1962. He received the BS degree in electrical engineering from National Taiwan University in 1984, and the MS and PhD degrees in electrical engineering from State University of New York in Stony Brook in 1988 and 1992, respectively. He is currently a Professor in the Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan. His recent research interests include low power and high speed logic circuit design, VLSI design, neural networks, and interfacing I/O circuits.

Po-Ming Lee was born in Taiwan in 1973. He received the BS degree in computer science & engineering from Yuan-Ze University in 1995, and the MS degree in electrical engineering from National Sun Yat-Sen University in Taiwan in 1999. He is currently a PhD student in the Department of Electrical Engineering at National Sun Yat-Sen University. His recent research interests include VLSI design, computer graphics, and consumer electronics.

Chenn-Jung Huang was born in Hualien, Taiwan, in 1961. He received the BS degree in electrical engineering from National Taiwan University, Taiwan, and the MS degree in computer science from the University of Southern California, Los Angeles, in 1984 and 1987, respectively. He received the PhD degree in electrical engineering from National Sun Yat-Sen University, Taiwan, in 2000. He is currently an Associate Professor in the Department of Computer Science and Information Education, National Taitung Teachers College, Taiwan. His research interests include computer communication networks, neural networks, fuzzy logic, and computer arithmetic.