

# Design of an Inter-plane Circuit for Clocked PLAs

CHUA-CHIN WANG\*, YA-HSIN HSUEH, YU-TSUN CHIEN and YING-PEI CHEN

Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung 80424, Taiwan

(Received 1 May 2000; Revised 16 March 2001)

Since Programmable Logic Arrays (PLAs) can implement almost any Boolean function, they have become popular devices for realizing both combinational and sequential circuits. We present a power-saving, fast, half-swing CMOS circuit for NOR–NOR PLA implementation. An additional 1/2 VDD voltage source and buffering transmission gates are inserted between the NOR planes to eliminate the racing problem and to shorten both the rise delay and the fall delay of the output response, so that the speed is enhanced and the dynamic power is reduced. Detailed simulation results reveal appropriate L/W guidelines. An analysis of the effects of the 1/2 VDD source on power and speed is also provided.

Keywords: NOR-NOR PLA; Power-saving; Half-swing CMOS; High speed; Dynamic logic; VLSI

## INTRODUCTION

Prior work on improving PLA circuits has mainly focused on speed and power, using alternatives such as eliminating the ground switch, NAND gate buffering, or reducing static current [2–7]. An important fact that has long been ignored is that one of the largest state transitions in a PLA is the switching of the load between the first plane and the second plane. Owing to the large wire load on this inter-plane connection, various problems occur in the different design styles of Programmable Logic Arrays (PLAs). All of these problems stem from the same cause: the slow state transition at the output of the first plane and the inter-plane wire load. We introduce a novel design composed of an extra 1/2 VDD source with a PMOS transistor, a transmission gate, and an NMOS transistor to resolve this difficulty.

## CLA DESIGN USING HALF-SWING FAST PLA

In this section, the proposed half-swing fast PLA design methodology is described in detail, followed by its application to the design of an 8-bit CLA.

## General Prior PLA Circuits

Referring to Fig. 1, which shows a general architecture of prior PLAs, the slow response at the output of the first plane is the major reason why the entire PLA either operates slowly or functions incorrectly. Prior works on speed enhancement and power saving, e.g. [1–5], all focus on using different gates between the two planes in the hope that the state transition of the inter-plane wire load can be accelerated. Other considerations for improving the performance of PLAs include adding low-overhead IDDQ testability [8,9]. Nevertheless, all of these improved circuits unavoidably incur either a full-swing charging time or a full-swing discharging time.

## Half-swing Inter-plane Circuit

A simple way to reduce the state transition time at the inter-plane wire load is to precharge the output of the first plane to 1/2 VDD in the precharge (or pre-discharge) duration. To a first approximation, this reduces the subsequent rise delay or fall delay in the evaluation duration to about one half of the original delay. As shown in Fig. 2, an extra 1/2 VDD power source is introduced together with two cascaded inverters (i.e. the delay buffer) and one transmission gate.
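The halving argument can be illustrated with a first-order estimate. The sketch below assumes a constant-current model, t = C·ΔV/I, which is a simplification and not the simulation setup used in this paper. The 0.5 pF load matches the first-plane load assumed in the simulations, while the drive current and the function name `swing_delay` are hypothetical placeholders.

```python
def swing_delay(c_load, delta_v, i_drive):
    """First-order delay estimate: time for a constant current i_drive
    to move a capacitive node c_load through a voltage swing delta_v."""
    return c_load * delta_v / i_drive


VDD = 5.0          # supply voltage (V)
C_WIRE = 0.5e-12   # inter-plane wire load (F)
I_DRIVE = 100e-6   # hypothetical average drive current (A)

full_swing = swing_delay(C_WIRE, VDD, I_DRIVE)        # GND <-> VDD
half_swing = swing_delay(C_WIRE, VDD / 2.0, I_DRIVE)  # 1/2 VDD -> VDD or GND

print(f"full swing: {full_swing * 1e9:.1f} ns")
print(f"half swing: {half_swing * 1e9:.1f} ns")
print(f"ratio: {half_swing / full_swing:.2f}")
```

Under this model the ratio is exactly one half for any load and drive current; real transistors do not deliver a constant current, so the measured improvement will differ.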

FIGURE 1 Prior NOR-NOR PLA.

<sup>\*</sup>Corresponding author. Tel.: +886-7-5252000. Ext. 4144. Fax: +886-7-5254199. E-mail: ccwang@ee.nsysu.edu.tw

The entire operation of the proposed circuit is described as follows.

- a. When clk = 0, nodes s, t, p and q are charged to VDD, GND, VDD, and 1/2 VDD, respectively. Note that the voltage at node q is kept at 1/2 VDD in the precharge duration owing to the OFF state of the transmission gate composed of N1 and P1, which is controlled by $\overline{clk}$. Hence, in the precharge (or pre-discharge) phase, the transmission gate is OFF such that p and q are isolated.

- b. When clk turns high, N-block-1 carries out its evaluation while the N1-P1 transmission gate is turned ON. Regardless of the outcome at the output of the first plane, the voltage at node q is either pulled up from 1/2 VDD to full VDD or pulled down from 1/2 VDD to GND. Consequently, the output response of the first plane is much faster than that of any full-swing dynamic logic.
- c. Although the proposed half-swing circuit enhances the speed, a power dissipation problem must be resolved at the same time. If the proposed circuit is applied to a design style in which the second plane does not have a clock-controlled transistor, e.g. pseudo-NMOS logic, the precharged voltage at node q might create a DC path in the second plane through P3 and N-block-2. Hence, we need to add a clock-controlled NMOS between N-block-2 and GND so that no DC current path is created in the precharge duration.
- d. If INV1 and INV2 are ordinary inverters, the circuit still functions properly. However, the two evaluation outcomes then have unequal response times. If the evaluation result of N-block-1 is "stop," the voltages of s, t, and p stay the same, and q is pulled up to full VDD. This response is much faster than a pull-down of q, because if the evaluation result of N-block-1 is "pass," s must first be pulled down, then t pulled up, and then p pulled down before q can be pulled down. The pull-down of the output of the first plane is therefore the longer process. To fix this problem, the sizes of INV1 and INV2 should be adjusted. The proposed sizing is that INV1 has a large pull-up PMOS and a small pull-down NMOS while INV2, on the contrary, has a small pull-up PMOS and a large pull-down NMOS, as shown in Fig. 3.

FIGURE 2 Inter-plane half-swing circuit.

FIGURE 3 Detailed schematic of the proposed circuit.
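The operation described in items a, b and d can be summarized with a small behavioral model. This is an idealized sketch that only reports the settled node voltages; it ignores delays, charge sharing and transistor sizing, and the function name `node_voltages` and its arguments are hypothetical.

```python
VDD = 5.0  # supply voltage (V)


def node_voltages(clk, nblock1_conducts):
    """Idealized settled voltages at nodes s, t, p and q.

    clk: 0 for precharge, 1 for evaluation.
    nblock1_conducts: True if N-block-1 evaluates to "pass" (discharges s),
    False for "stop". Ignored during precharge.
    """
    if clk == 0:
        # Precharge: the N1-P1 transmission gate is OFF, so q is isolated
        # from p and held at 1/2 VDD.
        return {"s": VDD, "t": 0.0, "p": VDD, "q": VDD / 2}
    # Evaluation: s follows N-block-1, INV1 and INV2 propagate it to p,
    # and the transmission gate (now ON) passes p to q.
    s = 0.0 if nblock1_conducts else VDD
    t = VDD - s  # INV1 output
    p = VDD - t  # INV2 output
    q = p        # transmission gate ON
    return {"s": s, "t": t, "p": p, "q": q}


for clk, conducts in [(0, False), (1, False), (1, True)]:
    print(clk, conducts, node_voltages(clk, conducts))
```

In the "stop" case only q moves (from 1/2 VDD to VDD), whereas in the "pass" case s, t, p and q all have to switch in sequence, which is why the pull-down path is the slower one.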

## Analysis of Speed and Power

## Speed

The speed of the dynamic-style PLA depends on the discharging speed of nodes q and r. The inter-plane half-swing circuit pumps node q to 1/2 VDD during the precharge duration. This, in turn, reduces both the charging time (rise delay) and the discharging time (fall delay). Meanwhile, there is no racing problem, so the delayed clock can be eliminated to improve the speed.

## Power

First, because of the insertion of the N2 transistor, there is no DC path from VDD to ground. Second, the dynamic power dissipation of our circuit is less than that of the prior PLAs, based on the following derivation. The switching probability of a NOR-NOR-like PLA with *n* inputs is given in Table I.

Notably, the dynamic power dissipation is estimated by the following equation.

$$P_{\rm d} = \left(\sum P_i C_i (\Delta V_i^2)\right) f \tag{1}$$

where $P_i$, $C_i$, and $\Delta V_i$ denote the switching probability, capacitance, and voltage difference at node *i*, respectively, and *f* is the frequency.

TABLE I Switching activity comparison of the NOR-NOR like PLA

| I/Ps  | Probability             |
|-------|-------------------------|
| All 0 | $\frac{1}{2^{n}}$       |
| Any 1 | $\frac{2^{n}-1}{2^{n}}$ |

Hence, the dynamic power consumption of the prior NOR-NOR PLA is

$$\begin{aligned}
P_{d\_nor-nor} &= \frac{2^{n} - 1}{2^{n}} (C_{s} + C_{t} + C_{p} + C_{q})\, \mathrm{VDD}^{2} f + \frac{1}{2^{n}} (C_{s} + C_{t} + C_{p} + C_{q}) \cdot 0 \cdot f \\
&= \frac{2^{n} - 1}{2^{n}} (C_{s} + C_{t} + C_{p} + C_{q})\, \mathrm{VDD}^{2} f
\end{aligned} \tag{2}$$

Similarly, the dynamic power consumption of the proposed PLA is

$$\begin{aligned}
P_{d\_ourpla} &= \frac{2^{n} - 1}{2^{n}} (C_{s} + C_{t} + C_{p})\, \mathrm{VDD}^{2} f + \frac{2^{n} - 1}{2^{n}} C_{q} \left(\frac{1}{2} \mathrm{VDD}\right)^{2} f \\
&\quad + \frac{1}{2^{n}} (C_{s} + C_{t} + C_{p}) \cdot 0 \cdot f + \frac{1}{2^{n}} C_{q} \left(\frac{1}{2} \mathrm{VDD}\right)^{2} f \\
&= \frac{2^{n} - 1}{2^{n}} (C_{s} + C_{t} + C_{p})\, \mathrm{VDD}^{2} f + \frac{1}{4} C_{q}\, \mathrm{VDD}^{2} f
\end{aligned} \tag{3}$$

Note that the switching activity of the other PLAs, except the domino PLA, is the same as that of the NOR-NOR PLA. Thus, we compare ourpla with the NOR-NOR PLA according to the definition of dynamic power cost.

$$\lim_{n \to \infty} \frac{P_{d\_nor-nor}}{P_{d\_ourpla}} \approx 1 + \frac{\frac{3}{4}C_q}{C_s + C_t + C_p + \frac{1}{4}C_q} \tag{4}$$
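The step from Eqs. (2) and (3) to Eq. (4) can be made explicit. The only fact used beyond those two equations is that $\frac{2^{n}-1}{2^{n}} \to 1$ as $n \to \infty$; the common factor $\mathrm{VDD}^{2} f$ then cancels.

$$\begin{aligned}
\lim_{n \to \infty} \frac{P_{d\_nor-nor}}{P_{d\_ourpla}}
&= \frac{(C_s + C_t + C_p + C_q)\, \mathrm{VDD}^{2} f}{(C_s + C_t + C_p)\, \mathrm{VDD}^{2} f + \frac{1}{4} C_q\, \mathrm{VDD}^{2} f} \\
&= \frac{\left(C_s + C_t + C_p + \frac{1}{4} C_q\right) + \frac{3}{4} C_q}{C_s + C_t + C_p + \frac{1}{4} C_q} \\
&= 1 + \frac{\frac{3}{4} C_q}{C_s + C_t + C_p + \frac{1}{4} C_q}
\end{aligned}$$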

TABLE II The average delay of the first plane output of different PLA designs (N2 does not exist in this series of simulations; unit = ns)

| Name     | Pseudo-N | NOR-NOR | Domino | Dhong's |
|----------|----------|---------|--------|---------|
| Original | 140.3    | 28.0    | 25.0   | 28.0    |
| Ours     | 97.0     | 8.0     | 13.0   | 8.0     |



FIGURE 4 Pseudo-N + ours PLA.





FIGURE 5 Dynamic NOR-NOR + ours PLA.





FIGURE 6 Domino + ours PLA.





FIGURE 7 Dhong's + ours PLA.



FIGURE 8 Output waveforms of Pseudo and Pseudo + ours.



FIGURE 9 Output waveforms of Dynamic and Dynamic + ours.

TABLE III The average delay of the second plane output of different PLA designs

|                 | Delay (ns) | Vout, max |
|-----------------|------------|-----------|
| Pseudo-N        | 82.4       | 5.0       |
| Pseudo-N + ours | 51.1       | 5.0       |
| NOR-NOR         | 45.4       | 5.0       |
| NOR-NOR + ours  | 25.1       | 4.93      |
| Domino          | 28.0       | 5.0       |
| Domino + ours   | 16.0       | 4.14      |
| Dhong's         | 30.0       | 2.50      |
| Dhong's + ours  | 8.0        | 2.44      |

TABLE IV The average delay of the second plane output of different PLA designs without delayed clock

|                | Delay (ns) | Vout, max |
|----------------|------------|-----------|
| NOR-NOR        | 23.4       | 3.93      |
| NOR-NOR + ours | 25.1       | 4.93      |
| Dhong's        | 10.0       | 1.79      |
| Dhong's + ours | 8.0        | 2.44      |



FIGURE 10 Output waveforms of Domino and Domino + ours.



FIGURE 11 Output waveforms of Dhong's and Dhong's + ours.

TABLE V The power dissipation of different PLA designs (N2 is added in this series of simulations)

|                 | Power (mW) | Delay (ns) | Vout, max |
|-----------------|------------|------------|-----------|
| Pseudo-N        | 0.5511     | 82.4       | 5.0       |
| Pseudo-N + ours | 0.4628     | 46.0       | 5.0       |
| Domino          | 0.2088     | 28.0       | 5.0       |
| Domino + ours   | 0.1762     | 22.0       | 4.7       |
| NOR-NOR         | 0.2603     | 45.4       | 5.0       |
| NOR-NOR + ours  | 0.1557     | 25.1       | 4.93      |
| Dhong's         | 0.1732     | 30.0       | 2.50      |
| Dhong's + ours  | 0.1406     | 8.0        | 2.44      |

Since  $C_q$  is the wire load of the first plane output,  $C_q \gg C_s + C_t + C_p$ . This fact makes Eq. (4) approach the following result.

$$\lim_{n \to \infty} \frac{P_{d\_nor-nor}}{P_{d\_ourpla}} \approx 1 + 3 \gg 1. \tag{5}$$

The above conclusion predicts that the power cost of our PLA decreases as the number of inputs increases. The proposed half-swing inter-plane circuit is not only faster but also consumes less power.
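The trend with *n* can be checked numerically by evaluating Eqs. (2) and (3) directly. The following is a minimal sketch: the values of VDD, *f* and $C_q$ are borrowed from the simulation setup of this paper, whereas the 10 fF values for $C_s$, $C_t$ and $C_p$ and all function names are hypothetical and chosen only so that $C_q$ dominates.

```python
def p_nor_nor(n, c_s, c_t, c_p, c_q, vdd, f):
    """Eq. (2): dynamic power of the prior NOR-NOR PLA with n inputs."""
    p_switch = (2**n - 1) / 2**n
    return p_switch * (c_s + c_t + c_p + c_q) * vdd**2 * f


def p_ourpla(n, c_s, c_t, c_p, c_q, vdd, f):
    """Eq. (3): dynamic power with the half-swing inter-plane circuit."""
    p_switch = (2**n - 1) / 2**n
    return p_switch * (c_s + c_t + c_p) * vdd**2 * f + 0.25 * c_q * vdd**2 * f


VDD, F = 5.0, 2.5e6        # supply (V) and frequency (Hz)
C_S = C_T = C_P = 10e-15   # hypothetical internal node capacitances (F)
C_Q = 0.5e-12              # inter-plane wire load (F)


def ratio(n):
    """Power ratio of the prior NOR-NOR PLA to the proposed PLA."""
    args = (n, C_S, C_T, C_P, C_Q, VDD, F)
    return p_nor_nor(*args) / p_ourpla(*args)


# Closed-form limit from Eq. (4)
limit = 1 + 0.75 * C_Q / (C_S + C_T + C_P + 0.25 * C_Q)

for n in (2, 4, 8, 16):
    print(f"n = {n:2d}: ratio = {ratio(n):.3f}")
print(f"limit (Eq. 4): {limit:.3f}")
```

With these assumed capacitances the ratio grows with *n* and settles at the Eq. (4) limit, which stays below the idealized value of 4 in Eq. (5) because $C_s + C_t + C_p$ is small but not zero.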

## SIMULATION AND ANALYSIS

## Speed (Delay) Simulations

In order to verify the proposed low-power high-speed PLA configuration, we conduct a series of simulations on the different PLA designs shown in Figs. 4–7 for comparison.

The different PLA designs are implemented in TSMC $0.6\,\mu\text{m}$ 1P3M technology with PMOS (w/l = 2.25/0.6) and NMOS (w/l = 0.9/0.6) transistors, except that the PMOS load used in the pseudo-NMOS PLA is ratioed to w/l = 0.9/1.2. Notably, P1 is w/l = 7.2/0.6 while N1 is w/l = 3.6/0.6. As for INV1 and INV2, the pull-up PMOS of INV1 is w/l = 6.0/0.6 and its pull-down NMOS is 0.9/0.6, while the pull-down NMOS of INV2 is w/l = 3.6/0.6 and its pull-up PMOS is 6.0/2.3. Figures 8–11 show the timing responses of these PLAs, where the first plane's load is assumed to be 0.5 pF, that of the ground switch 1.0 pF, the load of the buffers 2.0 pF, and the output load of these PLAs 1.0 pF. The waveforms in Figs. 8–11 are simulated with CADENCE and HSPICE tools at $VDD = 5.0\,\mathrm{V}$. The average delays of the first plane output of these PLAs are tabulated in Table II. Because of the proposed half-swing mode, the delay is measured from 90% of the input voltage change to 90% of the output voltage change.
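A threshold-to-threshold delay of this kind can be extracted from sampled waveforms in a post-processing step. The sketch below is a generic illustration with hypothetical function names and made-up sample data; it is not the HSPICE measurement used for Table II. It interpolates linearly between samples and takes the threshold as a fraction of each signal's own transition, so it works for a half-swing output that starts at 1/2 VDD.

```python
def crossing_time(times, volts, level):
    """First time the sampled waveform reaches `level`, using linear
    interpolation between adjacent samples."""
    for i in range(len(times) - 1):
        v0, v1 = volts[i], volts[i + 1]
        if v0 == level:
            return times[i]
        if (v0 - level) * (v1 - level) < 0:
            t0, t1 = times[i], times[i + 1]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    if volts[-1] == level:
        return times[-1]
    raise ValueError("waveform never reaches the requested level")


def delay(times, v_in, v_out, fraction):
    """Delay between input and output reaching `fraction` of their own
    transition, measured from each signal's initial to final value."""
    lvl_in = v_in[0] + fraction * (v_in[-1] - v_in[0])
    lvl_out = v_out[0] + fraction * (v_out[-1] - v_out[0])
    return crossing_time(times, v_out, lvl_out) - crossing_time(times, v_in, lvl_in)


# Made-up example: the input rises 0 -> 5 V, the output falls 2.5 -> 0 V.
t_ns = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
v_in = [0.0, 5.0, 5.0, 5.0, 5.0, 5.0]
v_out = [2.5, 2.5, 2.5, 1.25, 0.0, 0.0]

print(f"90% delay: {delay(t_ns, v_in, v_out, 0.9):.2f} ns")
```

Passing `fraction=0.5` instead gives the conventional 50% to 50% delay that applies to a full-swing output.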

Our proposed inter-plane half-swing circuit indeed speeds up the response for all of the prior PLA design approaches. Next, we compare the delay of the response at the output of the second plane. Note that the second plane should provide a full-swing output. Hence, the delay is measured from 50% of the input voltage to 50% of the output voltage. In addition, the dynamic NOR-NOR and Dhong's PLAs require a delayed clock.

After several simulations, the minimal delay of such a clock is found to be 22 ns. We thus add such a delayed clock in the following simulation; the speed performance of the second plane is given in Table III.

If there is no delayed clock for the dynamic NOR-NOR and Dhong's PLAs, their respective simulation results are given in Table IV. The original dynamic NOR-NOR and Dhong's PLAs provide incorrect outputs, but our circuit does not. Notably, Dhong's design is a normally-low operation, which differs from the other designs. During the precharge period, the output of Dhong's design is low. Thus, the critical delay of Dhong's design is the rising-edge delay instead of the falling-edge delay.

## Power Dissipation Simulations

As for the power consumption comparison, we also conduct a series of simulations that employ the Monte Carlo method of HSPICE. The number of sweeps is 1000, and the signal frequency is 2.50 MHz (clock period = 400 ns). The power dissipation results are tabulated in Table V. The proposed inter-plane half-swing circuit consumes less power regardless of the type of PLA. These results correspond to what we expect regarding dynamic power consumption when *n* increases.
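The relative savings implied by Table V can be worked out with a few lines of arithmetic. The sketch below only restates the tabulated power and delay figures as percentage reductions; the dictionary layout and the helper name `reduction` are illustrative choices.

```python
# (power in mW, delay in ns) for each original design and its "+ ours"
# counterpart, copied from Table V.
table_v = {
    "Pseudo-N": ((0.5511, 82.4), (0.4628, 46.0)),
    "Domino":   ((0.2088, 28.0), (0.1762, 22.0)),
    "NOR-NOR":  ((0.2603, 45.4), (0.1557, 25.1)),
    "Dhong's":  ((0.1732, 30.0), (0.1406, 8.0)),
}


def reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


for name, ((p0, d0), (p1, d1)) in table_v.items():
    print(f"{name:9s} power -{reduction(p0, p1):.1f}%  "
          f"delay -{reduction(d0, d1):.1f}%")
```

The dynamic NOR-NOR design shows the largest relative power reduction of the four, at roughly 40%.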

## CONCLUSION

The proposed inter-plane half-swing circuit configuration, which uses one transmission gate and an extra 1/2 VDD source between the product line and the output line instead of a buffer or an inverter, can eliminate the ground switch, increase the response speed, and reduce power consumption. It also keeps the inputs of the second plane at a "stop" status before the evaluation phase, which prevents the racing problem and avoids the use of delayed clocks.

## Acknowledgements

This research was partially supported by the National Science Council under grants NSC 87-2215-E-110-010 and NSC 88-2219-E-110-001.

## References

- [1] Afghahi, M. (1996) "A robust single phase clocking for low power, high-speed VLSI applications", *IEEE J. Solid-State Circuits* 31(2), 247–253.
- [2] Blair, G.M. (1992) "PLA design for single-clock CMOS", *IEEE J. Solid-State Circuits* 27(8).
- [3] Dhong, Y.B. and Tsang, C.P. (1992) "High speed CMOS POS PLA using predischarged OR array and charge sharing AND array", *IEEE Trans. Circuits Syst.-II: Analog Digital Signal Process.* 39(8), 557–564.
- [4] Wang, C.-C., Wu, C.-F., Hwang, R.-T. and Kao, C.-H. (1997) "A low-power and high-speed dynamic PLA circuit configuration for single-clock CMOS", 1997 Natl Comput. Symp. (NCS'97) 2, C-57-C-62.
- [5] Goncalves, N.F. and De Man, H.J. (1983) "NORA: a race-free dynamic CMOS technology for pipelined logic structures", *IEEE J. Solid-State Circuits* 18, 261–266.
- [6] Mai, K.W., Mori, T., Amrutur, B.S., Ho, R., Wilburn, B., Horowitz, M.A., Fukushi, I., Izawa, T. and Mitarai, S. (1998) "Low-power SRAM design using half-swing pulse-mode techniques", *IEEE J. Solid-State Circuits* 33(11), 1659–1671.
- [7] Weste, N.H.E. and Eshraghian, K. (1993) Principles of CMOS VLSI Design: A Systems Perspective, 2nd Ed. (Addison-Wesley, Reading, MA).
- [8] Wu, C.-F., Wang, C.-C., Hwang, R.-T. and Kao, C.-H. (1999) "Dynamic NOR-NOR PLA design with IDDQ testability", *Int. J. Electronics* 86(1), 78–85.
- [9] Wu, C.-F., Wang, C.-C., Hwang, R.-T. and Kao, C.-H. (1997) "IDDQ testable configuration for PLAs by transformation into inverters", 7th Int. Symp. IC Technol. Systems Appl. (ISIC-97), 398–401.

## Authors' Biographies

Chua-Chin Wang was born in Taiwan, in 1962. He received the BS degree in Electrical Engineering from National Taiwan University, Taiwan, in 1984 and the MS and PhD degrees in Electrical Engineering from State University of New York, Stony Brook, in 1988 and 1992, respectively. Currently he is a Professor in the Department of Electrical Engineering, National Sun Yat-Sen University, Taiwan. His research interests include low-power logic and circuit design, VLSI design, and neural networks implementations.

Ya-Hsin Hsueh was born in Taiwan, in 1976. She received the BS and MS degrees in Electrical Engineering from National Sun Yat-Sen University, Taiwan, in 1998 and 2000, respectively. She is currently working toward the PhD degree in Electrical Engineering at National Sun Yat-Sen University. Her current research interests are VLSI design and interfacing I/O circuits.

Yu-Tsun Chien was born in Taiwan, in 1975. He received the BS and MS degrees in Electrical Engineering from National Sun Yat-Sen University, Taiwan, in 1998 and 2000, respectively. He is currently working at the Electronics Research and Service Organization of the Industrial Technology Research Institute. His current research interests are VLSI design and analog circuits.

Ying-Pei Chen was born in Taiwan, in 1974. She received her BS degree (1998) and MS degree (2000) in Computer Science and Information Engineering from Tamkang University, Taipei, and from National Sun Yat-Sen University, Kaohsiung, Taiwan, respectively. She is currently working at VIA Technologies, Taipei, Taiwan. Her major research interests include analog circuit design and VLSI design.