# A Self-Disabled Sensing Technique for Content-Addressable Memories

Chua-Chin Wang, Senior Member, IEEE, Chia-Hao Hsu, Student Member, IEEE, Chi-Chun Huang, Student Member, IEEE, and Jun-Han Wu

Abstract—A low-power content-addressable memory (CAM) using a differential match line (ML) sense amplifier is proposed in this work. The proposed self-disabled sensing technique chokes the charge current fed into the ML right after the comparison result is generated. Instead of using typical NOR/NAND-type CAM cells with a single-ended ML, the proposed NAND CAM cell with the differential ML design boosts the speed of comparison without sacrificing power consumption. In addition, the 9-T CAM cell with a decoupled read-out circuit provides complete write, read, and comparison functions to refresh the data and verify its correctness before searching. The CAM with the proposed technique is implemented on silicon using a standard 0.13-$\mu$m complementary metal–oxide–semiconductor process to verify its performance. The energy consumption of the searching process is 1.872 fJ/bit/search.

*Index Terms*—Content-addressable memories (CAMs), match line sense amplifier (MLSA), NAND-type CAM cell, self-disabled.

## I. INTRODUCTION

Due to the rapid expansion of networking demands, high-speed data search capability is an important issue for many applications, e.g., lookup tables, databases, associative computing, and data compression. However, high-speed data searching dissipates much energy to compare the input data, i.e., the key, with the stored data [1]. Many methods to compare the key with the stored data have been proposed, including [2]–[4]. The content-addressable memory (CAM) approach has been considered a better solution than others in terms of cost and speed.

In the literature, fully parallel comparison architectures of CAM might cause huge power consumption. One of the major sources of power dissipation in this kind of architecture is the charging of the match line (ML). Many techniques based on the NOR-type CAM have therefore been developed to reduce the power consumption while meeting the high-speed demand. For instance, [5] proposed limiting the voltage swing of the MLs to reduce their power consumption. In [6], a precomputation-based CAM technique was employed to reduce the number of activated MLs. Arsovski and Sheikholeslami [7] proposed a mismatch-dependent technique that allocates less power to the mismatched MLs to reduce the static power. As a result, the power dissipation during the searching process is reduced while the searching speed is maintained.

The authors are with the Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan 80424 (e-mail: ccwang@ee.nsysu.edu.tw).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSII.2009.2037995

Fig. 1. Typical NAND-type CAM.

Not only was the NOR-type CAM adopted to construct the CAM cell but a NAND-type CAM was also used in a serial ML switch structure in [8]. However, the searching speed of the NAND-type CAM was found to be slower than that of the NOR-type CAM. Nevertheless, the performance of these prior techniques encounters a limitation. That is, the power and speed will be dominated by the "match" or "mismatch" of a dummy word circuit. The consequence is unavoidable power wastage during the search process after comparison.

Two typical prior CAM designs, i.e., the NAND and NOR types, are shown in Figs. 1 and 2, respectively. Block M is a memory cell that stores the data bit and can be any type of memory cell, for instance, two cross-coupled inverters as in an SRAM cell. Regardless of the state of the MLs (ML\_A<sub>i</sub> and ML\_O<sub>i</sub>), the ML sense amplifiers (MLSAs) sense the voltage change on the MLs to resolve the comparison result of each word. The NAND-type CAM cell consumes less power but needs a longer searching time, because the ML is discharged through a long transistor chain. By contrast, the discharge paths in the NOR-type CAM cell are all parallel, so the NOR type is faster at the expense of higher power consumption.

Two typical MLSAs are used in the prior NAND- and NOR-type CAMs, as shown in Figs. 1 and 2.

1) *Precharging-Type MLSA:* The precharging operation is executed before comparison. An MLSA with the precharging circuit charges the ML (e.g., ML\_A<sub>i</sub> and ML\_O<sub>i</sub>, respectively, in Figs. 1 and 2) or a capacitor before comparison and then senses the voltage drop on the ML to generate the comparison result. This kind of MLSA can easily reduce the power consumption by limiting the precharge voltage. However, the precharging design suffers from the charge-sharing effect among CAM cells.

2) *Charging-Type MLSA:* The other type of MLSA charges the ML during the searching process. This type of MLSA can speed up the searching process by increasing the charging current. However, the penalty is that the charging current consumes additional power until the end of the searching process, even if the match result has already been determined.

Manuscript received June 30, 2009; revised October 6, 2009. First published January 8, 2010; current version published January 15, 2010. This work was supported in part by the National Science Council under Grant NSC 96-2628-E-110-19-MY3 and in part by the National Health Research Institutes under Grant NHRI-EX99-9732EI. This paper was recommended by Associate Editor T. Zhang.

Fig. 2. Typical NOR-type CAM.

This brief proposes a self-disabled sensing technique that reduces power consumption by choking the charging current. In addition to the current-choking design, a read-out circuit is added to the CAM to verify the correctness of the stored data before searching. The power consumption of the proposed CAM is significantly lower than that of prior designs. The minimum power reduction, compared with that of several prior works, is 15.827%.

## II. CAM USING SELF-DISABLED SENSING

Fig. 3 shows the proposed CAM architecture, where block C and block DMLSA denote the CAM cell and the differential MLSA (DMLSA), respectively. The prototype CAM is 128 words $\times$ 32 bits. The Search Word Register loads the search key and feeds it into all the CAM cells. Each DMLSA charges its ML and senses the voltage variation to generate the match signal, which is sent to the Address Encoder. In general, at most one word matches the search key, so the Address Encoder generates either the corresponding address code or a no-match signal after the searching process. The details of these blocks are described in the following subsections.
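The data flow of Fig. 3 can be illustrated with a purely behavioral Python sketch. It models only the logical function (key broadcast, per-word match signal, address encoding), not the circuit. The function names are hypothetical, and the choice of reporting the first matching address is an assumption, since the brief does not describe the encoder's priority scheme.

```python
from typing import List, Optional

WORD_BITS = 32   # word width of the prototype CAM
NUM_WORDS = 128  # number of words in the prototype CAM


def search(cam: List[int], key: int) -> List[bool]:
    """Compare the key with every stored word.

    In hardware all words are compared in parallel; here the
    comparison is expressed as a list comprehension.
    """
    mask = (1 << WORD_BITS) - 1
    return [((word ^ key) & mask) == 0 for word in cam]


def encode_address(match_signals: List[bool]) -> Optional[int]:
    """Return the address of the first matching word, or None for no match.

    Assumption: a first-match priority is used for illustration only.
    """
    for address, matched in enumerate(match_signals):
        if matched:
            return address
    return None


# Hypothetical contents: two non-zero words, all others zero.
cam = [0] * NUM_WORDS
cam[5] = 0xDEADBEEF
cam[77] = 0x12345678

hit = encode_address(search(cam, 0x12345678))
miss = encode_address(search(cam, 0xCAFEF00D))
print(hit, miss)
```

Under these assumptions, the first key resolves to address 77 and the second key produces the no-match result.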

### A. 9-T CAM Cell With Decoupled Read-Out Circuit

Fig. 4 shows the proposed 9-T CAM cell, which is composed of a typical 6-T SRAM cell and a compare circuit. Notably, every 9-T CAM cell is accompanied by a decoupled read-out circuit. The compare circuit is a metal–oxide–semiconductor switch in the current path between ML$\langle i \rangle$ and SML$\langle i \rangle$, which will be discussed in the following subsection. The decoupled read-out circuit is triggered to verify the correctness of the data bit QB stored in the CAM cell before the searching process. In order to reduce power dissipation, the supply voltage of the proposed CAM cell is lowered to 1 V. (The normal supply voltage of a typical 0.13-$\mu$m process is 1.2 V.) However, a low supply voltage causes many problems, e.g., poor static noise margin (SNM), poor writing capability, a limited number of cells per bit-line, and poor bit-line sensing margin of the memory cell. Thus, the decoupled read-out circuit is needed to resolve the aforementioned problems and ensure data readability. The operation of the proposed CAM cell is described here.

1) Before the read operation: RWL$\langle i \rangle$ is logic 0. MRP is turned on to precharge the node RL. MR1 and MR3 are off to keep RBL$\langle i \rangle$ stable. In other words, the current path via MR2 and MR3 is shut off, and RBL$\langle i \rangle$ is isolated from QB.

2) During the read operation: The "read" operation proceeds when RWL$\langle i \rangle$ is pulled up to logic 1. MRP is turned off. MR3 is turned on, and the state of RL is the same as node Q. For instance, when QB = 1, RL will be discharged by MR2 and MR3 such that RL = logic 0 will appear at RBL$\langle i \rangle$ through MR1. If QB = 0, MR2 is turned off, and RBL$\langle i \rangle$ = RL = 1 through MR1.
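The two read-out phases above can be summarized as a small truth-table model. This is a minimal logic-level sketch, assuming ideal switches and full-swing nodes; the function name `read_out` and the use of `None` to mark an isolated read bit line are illustrative choices, not part of the design.

```python
from typing import Optional, Tuple


def read_out(rwl: int, qb: int) -> Tuple[int, Optional[int]]:
    """Return (RL, RBL) for a given read word line RWL and stored node QB.

    RBL is None when the read bit line is isolated from the cell.
    """
    if rwl == 0:
        # Before the read: MRP precharges RL; MR1 and MR3 are off,
        # so RBL is isolated from QB.
        return 1, None
    # During the read: MRP is off and MR3 is on.
    # RL is discharged through MR2 and MR3 only when QB = 1.
    rl = 0 if qb == 1 else 1
    # MR1 passes the state of RL to RBL.
    return rl, rl


for qb in (0, 1):
    rl, rbl = read_out(1, qb)
    # During a read, RL follows node Q, i.e., the complement of QB.
    print(f"QB={qb}: RL={rl}, RBL={rbl}, Q={1 - qb}")
```

In this model the stored nodes never drive RBL directly, which is the decoupling property the text relies on.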

Thus, the SNM during the read operation is retained because the cell node is decoupled from RBL$\langle i \rangle$. Since the SNM, i.e., the bit-line sensing margin of the read-out function and the DMLSA, is maintained, the supply voltage of the proposed CAM can be reduced by about 17% (from 1.2 V to 1.0 V) to save power. The energy per search (EfS) and area of the decoupled read-out circuit and the proposed 9-T CAM cell are found to be 0.688 fJ/bit/search and 2.90 × 6.97 $\mu$m<sup>2</sup>, respectively, by all process, voltage, and temperature (PVT) corner simulations.

### B. Differential NAND CAM

To resolve the design dilemma of the prior NOR-type and NAND-type CAMs described in Section I, we propose a differential NAND-type CAM, as shown in Fig. 5. Notably, $\text{MS}_i$ is turned on only if BL$\langle i \rangle$ and Q are logically opposite. That is, SML$\langle i \rangle$ will be charged by ML$\langle i \rangle$ when a bit of the search key is opposite to the corresponding bit of the word. The voltage difference between ML$\langle i \rangle$ and SML$\langle i \rangle$ is sensed by the DMLSA. The comparison is sped up by the parallel charging paths. Most important of all, the dc grounding path is removed to reduce the static power consumption.

### C. DMLSA

Fig. 3. Architecture of the proposed CAM.

Fig. 4. 9-T CAM cell and the associated decoupled read-out circuit.

Fig. 5. Proposed differential NAND CAM cell.

The detailed schematic of the proposed DMLSA for our differential NAND-type CAM is shown in Fig. 6. The DMLSA senses the voltages on ML$\langle i \rangle$ and SML$\langle i \rangle$ to tell whether the word is a "match" or a "mismatch," and then automatically disables the charge path to save power. Notably, a reset signal $\overline{\text{SEARCH\_EN}}$ sets the DMLSA into an initial state, where ML$\langle i \rangle$ = SML$\langle i \rangle$ = 0 and SP = 0, before the searching process. The detailed operation of the DMLSA in the searching process is described here.

1) "*Mismatch*": Before the searching process, SP = 0. SEARCH = SEARCH\_EN is pulled high at the beginning of the searching process. Then, MN1 is turned on to charge ML$\langle i \rangle$ such that KP will be discharged but not totally pulled down to 0. If there is any "mismatch" CAM cell, $\text{MS}_i$ is turned on to make a current path between ML$\langle i \rangle$ and SML$\langle i \rangle$ such that SML$\langle i \rangle$ will be charged by ML$\langle i \rangle$. When the voltage of SML$\langle i \rangle$ is high enough to turn off MP3, the voltage of KP is pulled down such that MATCHB is equal to logic 1, indicating that the comparison result of the word is a "mismatch." Through two feedback paths, MATCHB turns MN3 on and MP1 off, respectively, such that the current path of MP1 is shut off to choke the charge current of ML$\langle i \rangle$, and SP is discharged via MN3 to turn off MN1. The former constitutes a positive feedback loop from MATCHB to KP through MN3 and MN2, which pulls down KP more quickly. Therefore, the power consumption is reduced after the searching process. The simulation of this "mismatch" scenario is shown in Fig. 7.

Fig. 7. Postlayout simulation results of DMLSA, given a "mismatch."



Fig. 8. Postlayout simulation results of DMLSA, given all "match."



Fig. 9. Die photo of the proposed CAM.

2) "*Match*": If all of the CAM cells "match," ML$\langle i \rangle$ and SML$\langle i \rangle$ are isolated without any current path. The voltage difference between ML$\langle i \rangle$ and SML$\langle i \rangle$ creates an output current of the differential pair (MP2 and MP3) to charge KP and SP. As soon as KP is charged high, MATCHB becomes logic 0, indicating that the comparison is a "match." After SP is raised high, SEARCH becomes logic 0 and turns off MN1 to choke the charge current to ML$\langle i \rangle$. The simulation of this "match" scenario is shown in Fig. 8.

In short, the charge current to ML$\langle i \rangle$ is choked once the result of the comparison has been decided, regardless of what the result is.
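The self-disabling behavior described in the two cases can be captured in a logic-level sketch. The model below abstracts away all analog detail (node voltages on KP and SP, transistor thresholds, timing) and only records, per phase, whether the ML charge current is on and what MATCHB reports. The function name `sense_word` and the three phase labels are hypothetical.

```python
from typing import List, Optional, Tuple

# (phase name, charge current on?, MATCHB or None while undecided)
Phase = Tuple[str, bool, Optional[int]]


def sense_word(key_bits: List[int], stored_bits: List[int]) -> List[Phase]:
    """Logic-level trace of one DMLSA search on a single word."""
    # MS_i conducts only when the search bit and the stored bit Q differ.
    ms_on = [k != q for k, q in zip(key_bits, stored_bits)]
    # Any conducting MS_i lets ML charge SML, which resolves to a mismatch.
    matchb = 1 if any(ms_on) else 0
    return [
        ("reset", False, None),      # ML = SML = 0 and SP = 0
        ("charging", True, None),    # MN1 on, ML is being charged
        ("decided", False, matchb),  # charge path choked in either case
    ]


stored = [1, 0, 1, 1]
match_trace = sense_word([1, 0, 1, 1], stored)
mismatch_trace = sense_word([1, 1, 1, 1], stored)
print(match_trace[-1])
print(mismatch_trace[-1])
```

The point of the sketch is the last phase: the charge current is off for both outcomes, whereas a conventional charging-type MLSA would keep it on until the end of the search period.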

## III. IMPLEMENTATION AND MEASUREMENT

Taiwan Semiconductor Manufacturing Company standard 0.13-$\mu$m 1P6M CMOS technology is used to implement the proposed differential NAND-type CAM. Fig. 9 shows the die photo of the proposed CAM. On-chip measurements show that the searching time and power consumption gradually decrease as the mismatch count increases, as shown in Figs. 10 and 11, respectively.

Fig. 12 shows the measurement waveforms obtained with the Agilent 93000 SOC Test System to verify the correctness of our design. The stored data have previously been written into the CAM array. Work\_mode\_0\_ and Work\_mode\_1\_ are the control signals that select the search or the write/read operation. If Work\_mode\_0\_ = 0 and Work\_mode\_1\_ = 0, the search key is input to Data\_in. When Work\_mode\_0\_ and Work\_mode\_1\_ are both high, the key is compared to the CAM data bits. If the key matches any word, Match is equal to 1. Otherwise, Match becomes 0. The power consumption of the prototype chip is 6.806 mW at a clock frequency of 250 MHz.

Fig. 10. Searching time versus mismatch bit number.

Fig. 11. Power versus mismatch bit number.

Fig. 12. Measurement result of the proposed CAM.

The performance comparison of the proposed CAM design with several prior designs is shown in Table I. The proposed CAM has the best normalized EfS and the second-best figure of merit. Notably, the proposed CAM can be operated at 1.0 V, which is lower than the standard supply voltage (1.2 V) of the typical 0.13-$\mu$m CMOS process. It also meets the recent demand for low-voltage design.

|                       | proposed | [11]     | [12]     | [13]     | [7]      | [10]     |
|-----------------------|----------|----------|----------|----------|----------|----------|
| Process (µm)          | 0.13     | 0.35     | 0.13     | 0.13     | 0.25     | 0.25     |
| Supply voltage (V)    | 1.0      | 3.3      | 1.2      | 1.2      | 2.5      | 2.5      |
| Frequency (MHz)       | 250      | 100      | 400      | 200      | 260      | 300      |
| Search time (ns)      | 0.9      | 3.9      | 0.337    | 0.298    | 3.8      | 2.1      |
| Energy per search (EfS) (fJ/bit/search) | 1.872 @ 250 MHz | 93 @ 100 MHz | 49.62 @ 400 MHz | 7.50 @ 200 MHz | 17.12 @ 200 MHz | 13.9 @ 300 MHz |
| Normalized EfS        | 1.872    | 8.540    | 34.458   | 5.208    | 2.752    | 2.224    |
| Figure of Merit (FOM) | 1.685    | 33.306   | 11.612   | 1.552    | 10.458   | 4.670    |
| Year                  | 2009     | 2008     | 2007     | 2007     | 2005     | 2004     |

TABLE I Comparison With Prior Work

\* Normalized EfS = EfS / Supply voltage<sup>2</sup>

FOM = Search time  $\times$  Normalized EfS
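The normalization in Table I can be checked with a few lines of Python. The sketch below is only an illustration: the dictionary layout and function names are assumptions, and the inputs are the rounded values listed in the table. Under that reading, the 15.827% quoted in Section I appears to correspond to the gap in normalized EfS between the proposed design and the best prior entry, [10]. Recomputing from the rounded entries reproduces the listed normalized EfS and FOM values except for [7], where the listed EfS of 17.12 yields about 2.739 rather than 2.752.

```python
# Supply voltage (V), search time (ns), and EfS (fJ/bit/search) from Table I.
designs = {
    "proposed": {"vdd": 1.0, "search_ns": 0.9, "efs": 1.872},
    "[11]": {"vdd": 3.3, "search_ns": 3.9, "efs": 93.0},
    "[12]": {"vdd": 1.2, "search_ns": 0.337, "efs": 49.62},
    "[13]": {"vdd": 1.2, "search_ns": 0.298, "efs": 7.50},
    "[7]": {"vdd": 2.5, "search_ns": 3.8, "efs": 17.12},
    "[10]": {"vdd": 2.5, "search_ns": 2.1, "efs": 13.9},
}


def normalized_efs(d: dict) -> float:
    """Normalized EfS = EfS / (supply voltage)^2."""
    return d["efs"] / d["vdd"] ** 2


def fom(d: dict) -> float:
    """FOM = search time x normalized EfS (lower is better)."""
    return d["search_ns"] * normalized_efs(d)


for name, d in designs.items():
    print(f"{name:>8}: normalized EfS = {normalized_efs(d):.3f}, FOM = {fom(d):.3f}")

# Smallest normalized EfS among the prior designs.
best_prior = min(
    normalized_efs(d) for name, d in designs.items() if name != "proposed"
)
reduction = 100 * (best_prior - normalized_efs(designs["proposed"])) / best_prior
print(f"minimum reduction in normalized EfS: {reduction:.3f}%")
```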

## IV. CONCLUSION

A low-power self-disabled sensing technique for CAMs has been proposed in this brief. By using the proposed current-choking method to cut off unnecessary dc currents, the power consumption is significantly reduced. Moreover, the comparison process is accelerated by a positive feedback loop, which further reduces unwanted power dissipation. In addition, the decoupled read-out circuit verifies the data before searching, even when the supply voltage is reduced for the sake of power saving.

## ACKNOWLEDGMENT

The authors would like to thank CIC of NSC for their thoughtful chip fabrication service.

## REFERENCES

- [1] K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: A tutorial and survey," *IEEE J. Solid-State Circuits*, vol. 41, no. 3, pp. 712–727, Mar. 2006.
- [2] T. Oliver, L. Y. Yeow, and B. Schmidt, "High performance database searching with HMMer on FPGAs," in *Proc. IEEE Int. Parallel Distrib. Process. Symp.*, Mar. 2007, pp. 1–7.
- [3] I. R. Draganov, A. A. Popova, and L. L. Ivanov, "Multilingual names database searching enhancement," in *Proc. IEEE Int. Symp. Signal Process. Inf. Technol.*, Dec. 2008, pp. 474–479.

- [4] M. El Baraji, V. Javerliac, and G. Prenat, "Towards an ultra-low power, high density and non-volatile ternary CAM," in *Proc. 9th Annu. Non-Volatile Memory Technol. Symp.*, Dec. 2008, pp. 1–7.
- [5] H. Miyatake, M. Tanaka, and Y. Mori, "A design for high-speed low-power CMOS fully parallel content-addressable memory macros," *IEEE J. Solid-State Circuits*, vol. 36, no. 5, pp. 956–968, Dec. 2001.
- [6] C.-S. Lin, J.-C. Chang, and B.-D. Liu, "A low power precomputation-based fully parallel content-addressable memory," *IEEE J. Solid-State Circuits*, vol. 38, no. 4, pp. 654–662, Apr. 2003.
- [7] I. Arsovski and A. Sheikholeslami, "A mismatch-dependent power allocation technique for match-line sensing in content-addressable memories," *IEEE J. Solid-State Circuits*, vol. 38, no. 11, pp. 1958–1966, Nov. 2003.
- [8] B. D. Yang and L. S. Kim, "A low-power CAM using pulsed NAND-NOR match-line and charge-recycling search-line driver," *IEEE J. Solid-State Circuits*, vol. 40, no. 8, pp. 1736–1744, Aug. 2005.
- [9] T. H. Kim, J. Liu, J. Keane, and C. H. Kim, "A 0.2 V, 480 kb subthreshold SRAM with 1 k cells per bitline for ultra-low-voltage computing," *IEEE J. Solid-State Circuits*, vol. 43, no. 2, pp. 518–529, Feb. 2008.
- [10] K.-H. Cheng, C.-H. Wei, and S.-Y. Jiang, "Static divided word matching line for low-power content addressable memory design," in *Proc. IEEE Int. Symp. Circuits Syst.*, May 2004, vol. 2, pp. 629–632.
- [11] S. J. Ruan, C. Y. Wu, and J. Y. Hsieh, "Low power design of precomputation-based content-addressable memory," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 16, no. 3, pp. 331–335, Mar. 2008.
- [12] X. Yang, S. Sezer, J. McCanny, and D. Burns, "A versatile content addressable memory architecture," in *Proc. IEEE Int. SOC Conf.*, Mar. 2007, pp. 215–218.
- [13] X. Yang, S. Sezer, J. McCanny, and D. Burns, "Novel content addressable memory architecture for adaptive systems," in *Proc. 2nd NASA/ESA Conf. Adapt. Hardw. Syst.*, Aug. 2007, pp. 633–640.