# IMPROVED DESIGN OF $C^2PL$ 3-2 COMPRESSORS WITH LESS TRANSISTOR COUNT

Chua-Chin Wang, Po-Ming Lee, Chenn-Jung Huang

Department of Electrical Engineering National Sun Yat-Sen University Kaohsiung, Taiwan 80424 Tel: 886-7-525-2000 ext. 4144 Fax: 886-7-5254199

E-Mail: ccwang@ee.nsysu.edu.tw

## ABSTRACT

In this work, improved designs of C<sup>2</sup>PL-based 3-2 compressors are presented that can be used to build a fast inner product processor. Our compressors feature a short delay minimized by HSPICE optimization, a reduced transistor count, and high fan-out.

## 1. INTRODUCTION

In digital computation, the inner product of two vectors is one of the most frequently used mathematical operations [1]. If the vectors' dimension is large, the carry propagation of the inner product is likely to become the critical delay. Many high-speed logic design styles have been proposed to reduce this propagation delay, but each suffers from its own difficulties. For example, domino logic [2] can only implement non-inverting functions; NORA [3] has the charge-sharing problem; all-N-logic [4] and robust single-phase clocking [5] cannot operate correctly under clocks with short rise or fall times, which makes them hard to integrate with the rest of a logic design; and single-phase logic [6] and Zipper CMOS [7] contain slow P-logic blocks. Complementary pass-transistor logic (CPL), proposed by Yano et al. [8], is twice as fast as conventional CMOS, although, like conventional CMOS, it needs more silicon area because of its mixed interconnection. Moreover, when CPL is implemented, the noise margin and the speed degradation caused by the mismatch between the input signal level and the logic threshold voltage of the CMOS driver must be taken into consideration. Although Zhang et al. proposed the so-called C<sup>2</sup>PL (complex CPL) in [9] and demonstrated that it fixes all of these problems of CPL, several physical design factors are not fully considered or implemented. First, the NMOS pass transistors cannot be minimum-sized. Second, the driving inverters have to be properly sized. Third, the original design of [9] not only has poor fan-in and fan-out capability, but also produces very asymmetrical rise and fall delays, which are likely to cause glitch hazards and unwanted power consumption. In this paper, we propose an improved 3-2 compressor that resolves all of the problems mentioned above. HSPICE simulation results are presented to verify our observations.

## 2. FRAMEWORK OF IMPROVED COMPRESSORS

### 2.1. Basic compressor building block design

A 3-2 compressor is basically a full adder: its two outputs, Sum and Cout, encode the number of 1s among its three inputs A, B, and Cin, i.e., $A + B + Cin = 2\,Cout + Sum$. The full-adder equations are as follows:

$$\begin{array}{lcl} \mathit{Sum} & = & (A \oplus \mathit{Cin})B' + (A \oplus \mathit{Cin})'B = FB' + F'B \\ \mathit{Cout} & = & (A \oplus \mathit{Cin})B + (A \oplus \mathit{Cin})'\mathit{Cin} = FB + F'\mathit{Cin} \\ F & = & A \oplus \mathit{Cin} \end{array} \tag{1}$$

where F denotes $A \oplus Cin$. As shown in Figure 1, the logic structure of a typical 3-2 compressor can be split into two logic layers: the first layer computes F from A and Cin, and the second layer combines F with B to produce Sum and Cout. Hence one of the three inputs, B (and its complement B'), is not required in the first logic layer. Because the inputs therefore see unequal delays through the 3-2 compressor, the total delay of the inner product computation can be reduced by arranging the input signals to the 3-2 compressors inside the compressor tree in a proper manner.
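As a quick check of Eq. (1), the minimal Python sketch below (our own illustration, not part of the paper's design flow) evaluates the two-layer structure bit by bit: layer 1 forms F from A and Cin only, and layer 2 uses B to produce Sum and Cout. The function names are hypothetical. It confirms over all eight input combinations that the outputs encode the number of 1s.

```python
from itertools import product


def compressor_3_2(a: int, b: int, cin: int) -> tuple[int, int]:
    """Evaluate Eq. (1) for single bits and return (Sum, Cout)."""
    # Layer 1: F = A xor Cin; input B is not needed yet.
    f = a ^ cin
    nf, nb = 1 - f, 1 - b  # complements F' and B'
    # Layer 2: Sum = F B' + F' B and Cout = F B + F' Cin.
    s = (f & nb) | (nf & b)
    cout = (f & b) | (nf & cin)
    return s, cout


def verify_eq1() -> bool:
    """True if 2*Cout + Sum equals the count of 1s for every input vector."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = compressor_3_2(a, b, cin)
        if 2 * cout + s != a + b + cin:
            return False
    return True
```

Calling `verify_eq1()` returns `True`, and, for example, `compressor_3_2(0, 1, 1)` returns `(0, 1)`, i.e., two 1s at the input.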

### 2.2. Prior 3-2 compressor design

Although a 3-2 compressor can be realized by a full adder, and Zhang *et al.* [9] proposed C<sup>2</sup>PL designs for 3-2 and 7-3 compressors, their work still ignores the design issues raised above. Figure 2 and Figure 3 show the schematic diagrams of the two types of 3-2 compressors based on complex complementary pass-transistor logic (C<sup>2</sup>PL) proposed in [9]. To test whether these two 3-2 compressors have enough fan-out, we attach a 0.1 pF capacitive load to the outputs and run HSPICE simulations. The results are shown in Figure 4 and Figure 5.

Figure 1: 3-2 compressor building block

This research was partially supported by the National Science Council under grants NSC 88-2219-E-110-001 and 80-2215-E-110-014.



Figure 2: Schematic of the prior 3-2 compressor I



Figure 3: Schematic of the prior 3-2 compressor II

Figure 4 and Figure 5 show that Zhang's original design lacks sufficient driving capability, so it is unsuitable for cascading or for constructing an inner product processor. Nevertheless, the original design still has its advantages. First, both 3-2 compressors are functionally correct. Second, they use fewer transistors than a traditional full adder. Third, neither of them contains the two logic layers shown in Figure 1. We therefore improve the original design to achieve three goals:

1. Minimize the delay time to make a single 3-2 compressor as fast as possible.
2. Increase the fan-out so that the design can drive large loads.
3. Decrease the number of transistors without affecting functional correctness.



Figure 4: Original 3-2 compressor I simulation (load: 0.1 pF)



Figure 5: Original 3-2 compressor II simulation (load: 0.1 pF)

### 2.3. Minimize the delay time

Both 3-2 compressors have three inputs: "A", "B", and "Cin" (carry in). All three inputs in Figure 2 and Figure 3 need dedicated buffers because each of them connects to the source of a transistor. The size of each buffer must therefore be tuned, especially for the "Cin" pin, because these pins have to act as nearly ideal sources to drive the circuit. In addition, to analyze the relation between each input and output more precisely, we choose the input vectors carefully so that the output is affected by the switching of only one input. Figure 6 shows an example of how we measure the delay. We use HSPICE to measure the delay from each input to each output; the measured delays are tabulated in Table 1.
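The vector-selection rule above can be illustrated with a small Python sketch (an illustration, not the authors' test bench). For a path named in the style of Table 1, it enumerates the pairs of input vectors in which only one input switches and the chosen output makes the chosen transition. It assumes an ideal Boolean full-adder model; the function names `full_adder` and `single_input_vectors` are hypothetical.

```python
from itertools import product

INPUTS = ("A", "B", "Cin")


def full_adder(a: int, b: int, cin: int) -> dict[str, int]:
    """Ideal Boolean model of the 3-2 compressor outputs."""
    return {"Sum": a ^ b ^ cin, "Cout": (a & b) | (a & cin) | (b & cin)}


def single_input_vectors(inp: str, in_edge: str, out: str, out_edge: str):
    """List (before, after) vector pairs (A, B, Cin) in which only `inp` switches.

    Edges are "r" (rise) or "f" (fall); the selected output must make
    the transition named by `out_edge`.
    """
    idx = INPUTS.index(inp)
    start = 0 if in_edge == "r" else 1
    wanted = (0, 1) if out_edge == "r" else (1, 0)
    pairs = []
    for others in product((0, 1), repeat=2):
        before = list(others)
        before.insert(idx, start)  # switching input at its initial level
        after = before.copy()
        after[idx] = 1 - start  # only this input toggles
        transition = (full_adder(*before)[out], full_adder(*after)[out])
        if transition == wanted:
            pairs.append((tuple(before), tuple(after)))
    return pairs
```

For example, `single_input_vectors("A", "r", "Sum", "r")` yields the two candidate stimuli `(0, 0, 0) -> (1, 0, 0)` and `(0, 1, 1) -> (1, 1, 1)`. Because Cout is a majority function, a rising input never causes a falling Cout, so a query such as `single_input_vectors("A", "r", "Cout", "f")` returns an empty list; this is consistent with Table 1 pairing rising inputs only with Coutr and falling inputs only with Coutf.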



Figure 6: An example to show how we measure the delay of input signal "A" rise to output signal "Sum" rise and input signal "A" fall to output signal "Sum" fall
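The measurement in Figure 6 can be approximated in post-processing by locating the 50% crossings of the input and output waveforms. The Python sketch below assumes sampled transient data, a threshold of 50% of the supply voltage, and linear interpolation between samples; the paper does not state its threshold or supply voltage, so `vdd` is a placeholder and the function names are hypothetical. In HSPICE itself, the same quantity is usually obtained with a `.MEASURE TRAN` statement using `TRIG` and `TARG`.

```python
from typing import Optional, Sequence


def crossing_time(t: Sequence[float], v: Sequence[float], level: float,
                  rising: bool, t_start: float = 0.0) -> Optional[float]:
    """First time at or after t_start where v crosses `level` in the given
    direction, using linear interpolation between samples."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        crosses = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crosses:
            tc = t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
            if tc >= t_start:
                return tc
    return None


def propagation_delay(t, v_in, v_out, vdd: float,
                      in_rising: bool, out_rising: bool) -> Optional[float]:
    """Delay from the input's 50% crossing to the next 50% output crossing."""
    half = 0.5 * vdd
    t_in = crossing_time(t, v_in, half, in_rising)
    if t_in is None:
        return None
    t_out = crossing_time(t, v_out, half, out_rising, t_start=t_in)
    return None if t_out is None else t_out - t_in
```

The output crossing is searched only after the input crossing, so a delay such as Ar2Sumf is measured from the triggering edge rather than from an earlier, unrelated output transition. The time unit of the result is whatever unit `t` uses (for example ns).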

| Delay path | 3-2 compressor I | 3-2 compressor II |
|------------|------------------|-------------------|
| Ar2Sumr    | 0.32807          | 0.27349           |
| Ar2Sumf    | 0.40614          | 0.33903           |
| Ar2Coutr   | 0.36864          | 0.29813           |
| Af2Sumr    | 0.40947          | 0.31921           |
| Af2Sumf    | 0.42800          | 0.35410           |
| Af2Coutf   | 0.44482          | 0.39079           |
| Br2Sumr    | 0.33396          | 0.28400           |
| Br2Sumf    | 0.13140          | 0.14148           |
| Br2Coutr   | 0.32067          | 0.28124           |
| Bf2Sumr    | 0.19858          | 0.20984           |
| Bf2Sumf    | 0.35582          | 0.24719           |
| Bf2Coutf   | 0.34879          | 0.25178           |
| Cinr2Sumr  | 0.31832          | 0.31444           |
| Cinr2Sumf  | 0.40463          | 0.36233           |
| Cinr2Coutr | 0.37468          | 0.34597           |
| Cinf2Sumr  | 0.38056          | 0.41332           |
| Cinf2Sumf  | 0.38806          | 0.45531           |
| Cinf2Coutf | 0.40995          | 0.48952           |

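Table 1: Delay from each input to each output (unit: ns). In each path name, "r" denotes a rising edge and "f" a falling edge; for example, Ar2Sumr is the delay from a rising "A" to a rising "Sum".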
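To read Table 1 at a glance, the Python sketch below (our own post-processing, not from the paper) stores the tabulated delays and reports the slowest path of each compressor and the worst-case delay seen from each input. The values are copied from Table 1; the helper names are illustrative.

```python
# Delays in ns from Table 1 as (3-2 compressor I, 3-2 compressor II).
TABLE1 = {
    "Ar2Sumr": (0.32807, 0.27349), "Ar2Sumf": (0.40614, 0.33903),
    "Ar2Coutr": (0.36864, 0.29813), "Af2Sumr": (0.40947, 0.31921),
    "Af2Sumf": (0.42800, 0.35410), "Af2Coutf": (0.44482, 0.39079),
    "Br2Sumr": (0.33396, 0.28400), "Br2Sumf": (0.13140, 0.14148),
    "Br2Coutr": (0.32067, 0.28124), "Bf2Sumr": (0.19858, 0.20984),
    "Bf2Sumf": (0.35582, 0.24719), "Bf2Coutf": (0.34879, 0.25178),
    "Cinr2Sumr": (0.31832, 0.31444), "Cinr2Sumf": (0.40463, 0.36233),
    "Cinr2Coutr": (0.37468, 0.34597), "Cinf2Sumr": (0.38056, 0.41332),
    "Cinf2Sumf": (0.38806, 0.45531), "Cinf2Coutf": (0.40995, 0.48952),
}


def worst_path(col: int) -> tuple[str, float]:
    """Slowest input-to-output path for compressor I (col=0) or II (col=1)."""
    name = max(TABLE1, key=lambda k: TABLE1[k][col])
    return name, TABLE1[name][col]


def worst_by_input(col: int) -> dict[str, float]:
    """Worst-case delay of the paths starting at each input."""
    worst: dict[str, float] = {}
    for name, delays in TABLE1.items():
        inp = name.split("2")[0][:-1]  # "Cinr2Sumf" -> "Cin"
        worst[inp] = max(worst.get(inp, 0.0), delays[col])
    return worst
```

`worst_path(0)` returns `("Af2Coutf", 0.44482)` and `worst_path(1)` returns `("Cinf2Coutf", 0.48952)`. For both compressors, the worst path starting at B (0.35582 ns and 0.28400 ns) is shorter than the worst path starting at A or Cin.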

### 2.4. Increase the fan-out

Our simulations in Figure 4 and Figure 5 confirm that Zhang's original 3-2 compressors have poor fan-out. To overcome this shortcoming, we re-tune the output buffers to enhance the fan-out capability. The resulting sizes are given in Table 2, where the rows "invsum" and "invcout" list the re-tuned output inverters.
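As a rough intuition for why re-tuning the output inverters helps with a 0.1 pF load, the sketch below uses a first-order switch-RC model. This model is our assumption and not the paper's method (the sizes in Table 2 come from HSPICE optimization). The on-resistance is modeled as `r_square_kohm * L / W`, where `r_square_kohm` is a hypothetical process constant; because it cancels in ratios, only relative comparisons are meaningful.

```python
import math


def rc_delay_ns(w: float, l: float, c_load_pf: float,
                r_square_kohm: float) -> float:
    """First-order step-response delay t = ln(2) * R_on * C_load.

    R_on = r_square_kohm * (L / W) is a simplified, hypothetical model;
    kOhm * pF gives ns.
    """
    r_on_kohm = r_square_kohm * (l / w)
    return math.log(2) * r_on_kohm * c_load_pf


def speedup_from_widening(w_old: float, w_new: float, l: float,
                          c_load_pf: float = 0.1) -> float:
    """Delay ratio old/new; the unknown r_square_kohm cancels out."""
    r = 1.0  # placeholder; any positive value gives the same ratio
    return rc_delay_ns(w_old, l, c_load_pf, r) / rc_delay_ns(w_new, l, c_load_pf, r)
```

For instance, widening an output NMOS from a hypothetical 10/1.3 to 30/1.3 gives `speedup_from_widening(10.0, 30.0, 1.3)` of about 3 under this model. The model ignores the extra input capacitance that a wider inverter presents to the pass-transistor network, which is one reason the buffer sizes must be tuned rather than simply maximized.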

### 2.5. Reduce transistor count

Figure 2 and Figure 3 both indicate that the complementary output pins "Sum'" and "Cout'" are not required. We therefore remove these two pins and the transistors associated with them. As a result, each new 3-2 compressor uses 8 fewer transistors than the corresponding prior compressor, which reduces its area.

### 2.6. Final result

We use the TSMC (Taiwan Semiconductor Manufacturing Company) $0.6\,\mu\mathrm{m}$ 1P3M (one-poly, three-metal) technology to re-design the 3-2 compressors. The schematic diagrams of the improved 3-2 compressors are shown in Figure 7 and Figure 8, and Table 2 lists the transistor sizes of the two compressors.



Figure 7: Re-designed 3-2 compressor type I



Figure 8: Re-designed 3-2 compressor type II

Figure 9 and Figure 10 show the simulation results of our new designs under a 0.1 pF load. Compared with Figure 4 and Figure 5, these results show that the fan-out is strengthened and the delay is reduced.

| Transistor name | 3-2 compressor I (W/L) | 3-2 compressor II (W/L) |
|-----------------|------------------------|-------------------------|
| inva PMOS    | 60/1.3           | 56.7/1.3      |
| inva NMOS    | 32.8/1.3         | 49.7/1.3      |
| invb PMOS    | 36.5/1.3         | 57.5/1.3      |
| invb NMOS    | 40.9/1.3         | 50/1.3        |
| invc PMOS    | 80/1.3           | 80/1.3        |
| invc NMOS    | 80/1.3           | 73.7/1.3      |
| invsum PMOS  | 23.7/1.3         | 17.8/1.3      |
| invsum NMOS  | 30/1.3           | 25.8/1.3      |
| invcout PMOS | 23.4/1.3         | 17.7/1.3      |
| invcout NMOS | 30/1.3           | 26.8/1.3      |
| p1           | 29.5/1.3         | 28.6/1.3      |
| p2           | 29.5/1.3         | 20/1.3        |
| p3           |                  | 1.8/1.3       |
| p4           |                  | 21.6/1.3      |
| n1           | 34/1.3           | 14.1/1.3      |
| n2           | 34/1.3           | 1.8/1.3       |
| n3           | 28.3/1.3         | 7.5/1.3       |
| n4           | 28.3/1.3         | 4.5/1.3       |
| n5           | 23.6/1.3         | 1.8/1.3       |
| n6           | 23.6/1.3         | 1.8/1.3       |
| n7           | 22/1.3           | 16.1/1.3      |
| n8           | 22/1.3           | 16.1/1.3      |

Table 2: Transistor sizes (W/L) of the improved 3-2 compressors I and II; a blank entry means the device does not exist in that compressor
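The transistor counts and a rough area proxy can be tallied directly from Table 2. The Python sketch below assumes that Table 2 lists every transistor of each compressor and that a blank entry means the device is absent; since every device has L = 1.3, the sum of channel widths serves as a simple proxy for active area. These are our assumptions, not figures reported in the paper, and the names are illustrative.

```python
from typing import Optional

# Channel widths W from Table 2 as (compressor I, compressor II).
# All listed devices have L = 1.3; None marks a blank table entry.
TABLE2_W: list[tuple[Optional[float], Optional[float]]] = [
    (60, 56.7), (32.8, 49.7),    # inva PMOS, NMOS
    (36.5, 57.5), (40.9, 50),    # invb PMOS, NMOS
    (80, 80), (80, 73.7),        # third input inverter PMOS, NMOS
    (23.7, 17.8), (30, 25.8),    # invsum PMOS, NMOS
    (23.4, 17.7), (30, 26.8),    # invcout PMOS, NMOS
    (29.5, 28.6), (29.5, 20),    # p1, p2
    (None, 1.8), (None, 21.6),   # p3, p4 (compressor II only)
    (34, 14.1), (34, 1.8),       # n1, n2
    (28.3, 7.5), (28.3, 4.5),    # n3, n4
    (23.6, 1.8), (23.6, 1.8),    # n5, n6
    (22, 16.1), (22, 16.1),      # n7, n8
]


def transistor_count(col: int) -> int:
    """Number of devices listed for compressor I (col=0) or II (col=1)."""
    return sum(1 for row in TABLE2_W if row[col] is not None)


def total_width(col: int) -> float:
    """Sum of channel widths, a rough area proxy at fixed L."""
    return sum(row[col] for row in TABLE2_W if row[col] is not None)
```

Under these assumptions, `transistor_count(0)` is 20 and `transistor_count(1)` is 22, with total widths of about 712.1 and 591.4, so compressor II uses two more devices but less total width.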



Figure 9: Simulations of improved 3-2 compressor type I (load: 0.1 pF)



Figure 10: Simulations of improved 3-2 compressor type II (load: 0.1 pF)

## 3. CONCLUSION

In this paper, improved designs of the 3-2 compressor are presented to overcome several problems in Zhang's work [9]. Our simulation results show that the improved 3-2 compressors are capable of driving large loads and use fewer transistors. The improved 3-2 compressor is therefore a solid building cell for constructing an inner product processor [1].

## 4. REFERENCES

- [1] C.-C. Wang, C.-J. Huang, and P.-M. Lee, "A comparison of two alternative architectures of digital ratioed compressor design for inner product processing," *IEEE International Symposium on Circuits and Systems*, vol. I, pp. 161-164, June 1999.
- [2] R. H. Krambeck, C. M. Lee, and H.-S. Law, "High-speed compact circuits with CMOS," *IEEE J. Solid-State Circuits*, vol. 17, pp. 614-619, June 1982.
- [3] N. F. Goncalves and H. J. De Man, "NORA: A race-free dynamic CMOS technology for pipelined logic structures," *IEEE J. Solid-State Circuits*, vol. 18, pp. 261-266, June 1983.
- [4] R. X. Gu and M. I. Elmasry, "All-N-logic high-speed true-single-phase dynamic CMOS logic," *IEEE J. Solid-State Circuits*, vol. 31, no. 2, pp. 221-229, Feb. 1996.
- [5] M. Afghahi, "A robust single phase clocking for low power high-speed VLSI application," *IEEE J. Solid-State Circuits*, vol. 31, no. 2, pp. 247-253, Feb. 1996.
- [6] J. Yuan and C. Svensson, "High-speed CMOS circuit technique," *IEEE J. Solid-State Circuits*, vol. 24, pp. 62-70, Feb. 1989.
- [7] C. M. Lee and E. W. Szeto, "Zipper CMOS," *IEEE Circuits Devices Mag.*, pp. 10-16, May 1986.
- [8] K. Yano, T. Yamanaka, T. Nishida, M. Saito, K. Shimohigashi, and A. Shimizu, "A 3.8-ns CMOS 16x16-b multiplier using complementary pass-transistor logic," *IEEE J. Solid-State Circuits*, vol. 25, no. 2, pp. 388-395, Feb. 1990.
- [9] D. Zhang and M. I. Elmasry, "VLSI compressor design with applications to digital neural networks," *IEEE Trans. VLSI Systems*, vol. 5, no. 2, pp. 230-233, June 1997.