# Ultra Low Power Single-ended 6T SRAM Using 40 nm CMOS Technology\*

Chua-Chin Wang

Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan 80424. Email: ccwang@ee.nsysu.edu.tw

I-Ting Tseng

Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan 80424. Email: tzengitim@vlsi.ee.nsysu.edu.tw

*Abstract*—An ultra low power SRAM cell design is proposed in this investigation. The supply voltage of the SRAM cells is gated by the wordline (WL) enable signal, which selects between two supply levels. If the WL of a cell is not asserted, a lower voltage is selected that still retains the stored bit, so that the standby power is reduced. By contrast, as soon as the WL is asserted to execute a read or write (R/W), the normal supply voltage is applied to carry out the R/W operation. Theoretical derivation as well as all-PVT-corner simulations are provided to verify the functional correctness and performance. A 1 kb SRAM based on the proposed cell, with BIST and a PDP (power-delay product) reduction circuit, is demonstrated. Its energy per access is as low as 0.034 pJ, which is by far the lowest among the compared designs.

*Index Terms*—SRAM, single-ended, supply voltage selection, PDP reduction, read voltage boost

## I. INTRODUCTION

Statistically, the operating time of memory devices ranks second only to that of CPUs or microprocessors in electronic products. Reducing the power dissipation of memory devices therefore extends the operating time of the entire product, particularly for battery-operated ones [1]. SRAM is widely used as cache in CPUs, where it is accessed very frequently by CPU operations such as load and store, which makes it critically power sensitive. Many SRAM designs have been reported in the last two decades. A 4-T loadless SRAM was proposed for low-power applications [2], where low-$V_{th}$ transistors serve as the bitline drivers and high-$V_{th}$ transistors form the data latch. Namely, it is a P-latch N-drive 4-T SRAM cell. Although a built-in self-refreshing data retention path is preserved to secure the stored bit state, read/write disturbance casts a shadow on the functional correctness because the cell lacks a bitline isolation mechanism. The degradation of the SNM (static noise margin) pointed out in [3] confirms this potential hazard. By the definition of SNM, an SRAM becomes more vulnerable to noise as the supply voltage drops [3]. This phenomenon has driven the development of non-symmetrical R/W auxiliary circuitry that isolates the cell from bitline disturbance [4]. Disturb isolation or protection becomes especially important if the SRAM is fabricated in an advanced nanometer CMOS technology or is designed to operate in the near-subthreshold range. The SRAM with a write-assist loop reported in [6] is an example that demonstrates the disturb-free feature. However, it is not applicable to single-ended SRAM cells because of its symmetrical R/W design.

According to the ITRS forecast, memory will soon occupy over 90% of the area of an SOC (system on chip). The performance of SRAM will therefore have a strong impact on the overall efficiency of SOCs, particularly on power dissipation. This investigation proposes to gate the supply voltage of every column of the SRAM, where one of two supply voltages is selected by WL to reduce the standby power dissipation. Meanwhile, the gate drive voltage of the selected SRAM cells is boosted to a higher level, so that both the read speed and the output slew rate are improved. Detailed post-layout simulation results validate the performance of the proposed design.

## II. ULTRA LOW POWER SRAM DESIGN

The proposed SRAM is illustrated in Fig. 1. It consists of one SRAM array, column/row decoders, a control circuit, a column selector, BIST (built-in self-test), and a PDP reduction circuit. Notably, the PDP reduction circuit is composed of an AVD (Adaptive Voltage Detector) and a PVB (Pass-Transistor Gate Voltage Boosting) circuit. The supply voltage of the proposed SRAM cells is selected by the VDD selector. The major signals in the SRAM are listed as follows.

- WR\_EN : write/read (1/0) enable
- WORD\_Addr[4:0], Bit\_addr[4:0] : wordline/bitline address
- Data\_in, Data\_out : data input, data out
- VMS, BS : voltage mode select, boost select
- BIST\_EN, BIST\_Pass : BIST enable, BIST pass or not
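As a quick check on the address widths listed above, the following Python sketch flattens a (word, bit) address pair into a single cell index. The 32 × 32 organization is inferred from the two 5-bit address buses and the stated 1 kb capacity. The function name `cell_index` and the row-major ordering are illustrative assumptions, not part of the design.

```python
WORD_ADDR_BITS = 5  # width of WORD_Addr[4:0]
BIT_ADDR_BITS = 5   # width of Bit_addr[4:0]

NUM_WORDLINES = 1 << WORD_ADDR_BITS           # 32 rows
NUM_BITLINES = 1 << BIT_ADDR_BITS             # 32 columns
CAPACITY_BITS = NUM_WORDLINES * NUM_BITLINES  # 1024 bits = 1 kb


def cell_index(word_addr: int, bit_addr: int) -> int:
    """Map a (wordline, bitline) pair to a flat index (row-major order is assumed)."""
    if not 0 <= word_addr < NUM_WORDLINES:
        raise ValueError("word_addr out of range")
    if not 0 <= bit_addr < NUM_BITLINES:
        raise ValueError("bit_addr out of range")
    return word_addr * NUM_BITLINES + bit_addr


print(CAPACITY_BITS, cell_index(31, 31))
```

Two 5-bit addresses thus cover exactly the 1024 cells of the 1 kb array.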

### A. SRAM cell design and analysis

The 5T single-ended cell reported in [6] is shown in Fig. 2, where two high-Vth PMOS transistors form a latch and two low-Vth NMOS transistors driven by WAB and WB serve as the data access switches. Another low-Vth NMOS, namely M205, is the switch to the bitline (BLB). Although M205 is employed to isolate the cell from noise on the bitline, the data state "0" at node Q can be compromised by leakage in the write-assist loop. That is, a retention fault might occur in such a design.

<sup>\*</sup>This investigation was partially supported by Ministry of Science and Technology, Taiwan, under grant MOST 107-2218-E-110-004- and 107-2218-E-110-016-.

Fig. 1. System diagram of 1 kb SRAM using the proposed cell

Fig. 2. 5T SRAM cell [6]

To resolve this retention problem, a 6T single-ended SRAM cell was proposed [8]. A high-Vth NMOS is added at the foot of the latch. When Qb is high (Q is low), M406 is on to drain the leakage such that the "0" state at Q is preserved. This extra transistor resolves the retention fault at the cost of area.

To further reduce the standby power when the cell is not accessed, a new cell-column structure is proposed in Fig. 4. The design to reduce the standby power is as follows.

- 1). If any cell in the column is accessed, WL is high and WLB is low such that M701 is on to provide VDD to the cells.
- 2). If none of the cells in the column is accessed, WL is low and WLB is high. A reduced supply voltage, VDD - Vthp, is coupled to the cells. Thus, the supply voltage is dropped by Vthp to save power.

Fig. 3. 6T SRAM cell

Fig. 4. Ultra low power 6T SRAM cell

Notably, the proposed design is a power-gating mechanism applied to a column of cells. The key question is how wide the gating transistors should be to maintain correct operation and low power at the same time. This power-gated design is likely to run short of supply current in the FS (fast NMOS, slow PMOS) corner, because Qb is hard to keep high (when Q = 0). To overcome this current shortage, an analytic sizing condition is derived. Assume that a total of $n$ cells share the same column, $I_{act}$ is the current required by the accessed cell, and $I_{idl}$ denotes the current needed by each idle cell. The drain current of the power PMOS must then satisfy $I_D \ge I_{act} + (n-1) \times I_{idl}$. For instance, if $n = 32$ and a typical 40 nm CMOS technology is used to fabricate the SRAM, the total current required for a column with one accessed cell is $I_D = 39.5\,\mu\text{A} = 31.5\,\mu\text{A} + 31 \times 245\,\text{nA}$. By the saturation current equation, the width of the power PMOS is 750 nm.
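The current budget above can be reproduced with a few lines of Python. This is a minimal sketch of the inequality only. The function name `required_drain_current` is hypothetical, and the sizing step via the saturation current equation is not modeled because the text does not give the device parameters.

```python
def required_drain_current(n_cells: int, i_act: float, i_idl: float) -> float:
    """Lower bound on the power-PMOS drain current: I_act + (n - 1) * I_idl, in amperes."""
    if n_cells < 1:
        raise ValueError("a column needs at least one cell")
    return i_act + (n_cells - 1) * i_idl


# Values quoted in the text for n = 32 in 40 nm CMOS.
I_ACT = 31.5e-6  # current of the accessed cell (A)
I_IDL = 245e-9   # current of each idle cell (A)

i_d_min = required_drain_current(32, I_ACT, I_IDL)
print(f"I_D >= {i_d_min * 1e6:.3f} uA")
```

With the quoted values the bound evaluates to roughly 39.1 µA, slightly below the 39.5 µA stated in the text. The difference may reflect rounding or a design margin, which the source does not specify.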

### B. Read/Write cycles

The R/W operation of the proposed SRAM is tabulated in Table I. As soon as the row and column addresses are ready, the corresponding decoders select the cell. WL and WA are then asserted high to turn on M605 and M604 (M603). Notably, Pre-discharge grounds BLB before the R/W operation to protect the state "0" from noise and leakage. Regardless of whether a 1 or a 0 is read, WAB is low to shut off M603. The state at Qb is coupled to BLB through M604 and M605. The entire read operation is illustrated in Fig. 5.

By contrast, the write operation is shown in Fig. 6.

- write 1 : WA is pulled high to turn on M604, and WAB is low to turn off M603 at the same time. Pre-discharge pulls Qb down such that Q is pulled high.
- write 0 : WA is low and WAB is high to turn off M604 and turn on M603, respectively. Node Q is pulled down to ground by Pre-discharge.
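The read and write cycles described above can be summarized as a small lookup table. The Python sketch below encodes only what the prose states about WL, WA, and WAB. The `idle` row, the key names, and the function `switch_states` are illustrative assumptions, and the mapping of each signal to one transistor is a reading of the text rather than a netlist.

```python
# Control levels per operation (1 = high, 0 = low), taken from the prose description.
# The WA/WAB levels in the "idle" row are an assumption; the text only states WL low.
CONTROL_TABLE = {
    "read":    {"WL": 1, "WA": 1, "WAB": 0},
    "write_1": {"WL": 1, "WA": 1, "WAB": 0},
    "write_0": {"WL": 1, "WA": 0, "WAB": 1},
    "idle":    {"WL": 0, "WA": 0, "WAB": 0},
}


def switch_states(op: str) -> dict:
    """Return which access transistors conduct for the given operation."""
    levels = CONTROL_TABLE[op]
    return {
        "M603": bool(levels["WAB"]),  # turned on by WAB
        "M604": bool(levels["WA"]),   # turned on by WA
        "M605": bool(levels["WL"]),   # turned on by WL
    }


for op in CONTROL_TABLE:
    print(op, switch_states(op))
```

In this encoding a read and a write of 1 use the same three levels. They differ in the direction of data flow on BLB, which this table does not capture.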

### C. PDP reduction circuit

Besides the power gating design, another approach to further reduce the energy consumption is to reduce the power-delay product (PDP), which is a measure of the energy, of every R/W operation. A refined PDP reduction circuit, improved from the compensation circuit disclosed in [5], is shown in Fig. 7. It mainly consists of an AVD (Adaptive Voltage Detector) and a PVB (Pass-Transistor Gate Voltage Boosting) circuit. The entire circuit is activated by the assertion of BS. When BS is pulled high, AVD generates Boost\_EN for PVB to raise the supply voltage of the cells to be accessed from VDD to VDD' (a voltage higher than VDD). Details of the circuit designs are given in the following text.

Fig. 5. Read cycle timing

1) *AVD:* With reference to Fig. 8, when BS is high, M802 grounds the foot of M801 and the AVD is activated. A predefined voltage VP0 is compared with the switching voltage of the inverter composed of M804 and M805. The result of this comparison is latched at VP1 by the feedback loop consisting of inv1, inv2, and a transmission gate. Finally, Boost\_EN is generated from VP2 (the inverse of VP1) and BSb (the inverse of BS).

Fig. 6. Write cycle timing

Fig. 7. PDP reduction circuit

Fig. 8. AVD circuit

Fig. 9. PVB circuit

2) *PVB:* Referring to Fig. 9, after BS is pulled up to logic 1, AVD is activated to detect the system voltage as described above. As long as AVD has not completed the system voltage detection (VP0 vs. the switching voltage of the inverter), its output, Boost\_EN, is kept at logic 0. That is, PVB is in waiting mode: the top plate of C901 is pulled down to ground by inv903 and the bottom plate is pulled up to VDD via M901. If the system voltage is higher than the switching voltage, AVD pulls Boost\_EN high and PVB enters standby mode. Once WR\_ENb is pulled high, which means that one of the SRAM cells starts a write or read, PVB enters boosting mode: M901 is turned off and the top plate of C901 is pulled higher than the original VDD, to VDD' = VDD + $\Delta$V. The timing of the PVB is shown in Fig. 10.
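The three PVB modes described above behave like a small state decoder driven by BS, Boost\_EN, and WR\_ENb. The following Python sketch captures that behavior at the logic level only. The `INACTIVE` label for BS low, the function names, and the numeric value used for $\Delta$V are assumptions for illustration, since the text does not quantify the boost.

```python
from enum import Enum


class PvbMode(Enum):
    INACTIVE = "inactive"  # BS low: circuit not activated (label assumed here)
    WAITING = "waiting"    # AVD still detecting, Boost_EN = 0
    STANDBY = "standby"    # Boost_EN = 1, no access in progress
    BOOSTING = "boosting"  # Boost_EN = 1 and WR_ENb = 1


def pvb_mode(bs: bool, boost_en: bool, wr_enb: bool) -> PvbMode:
    """Decode the PVB operating mode from its three control inputs."""
    if not bs:
        return PvbMode.INACTIVE
    if not boost_en:
        return PvbMode.WAITING
    if not wr_enb:
        return PvbMode.STANDBY
    return PvbMode.BOOSTING


def gate_drive_voltage(vdd: float, delta_v: float, mode: PvbMode) -> float:
    """Return VDD' = VDD + delta_v in boosting mode, otherwise VDD."""
    return vdd + delta_v if mode is PvbMode.BOOSTING else vdd


# Example with a hypothetical boost of 0.2 V on the 0.8 V supply.
mode = pvb_mode(bs=True, boost_en=True, wr_enb=True)
print(mode.value, gate_drive_voltage(0.8, 0.2, mode))
```

Only the boosting mode changes the returned voltage. The waiting and standby modes differ in whether the AVD has finished its detection.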

## III. SIMULATION AND VERIFICATION

The proposed design is realized in the TSMC 40 nm CMOS process. The layout of the entire design is shown in Fig. 13, where the chip area is $525 \times 525 \ \mu m^2$ and the core area is $215 \times 144 \ \mu m^2$. All-PVT-corner post-layout simulation of the SRAM cell is carried out first. Fig. 11 shows the static noise margin (SNM), where the worst case is 412.3 mV. Notably, since the SRAM cell in this study is single-ended, the SNM plot does not have the usual butterfly shape of conventional SRAM cells. The dynamic noise margin (DNM) is shown in Fig. 12, which indicates that the VDD of the proposed SRAM cell can be as low as 0.3 V.

Fig. 10. Timing diagram of gate drive boosting

Fig. 11. SNM simulations

Fig. 12. DNM simulations

Table II compares the proposed design with several previous SRAM designs using 40 nm or 65 nm CMOS technology. The proposed SRAM attains the second best SNM, the lowest read PDP, and the lowest energy per access at a 0.8 V supply voltage. Besides, the proposed SRAM can operate at up to 100 MHz, which matches the highest frequency listed in Table II.
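The energy-per-bit row of Table II appears to be the energy per access divided by the word length. This definition is inferred from the tabulated numbers and is not stated in the text. The Python sketch below applies it to the Table II entries, with `energy_per_bit` as a hypothetical helper name.

```python
# (energy per access in pJ, word length in bits) as listed in Table II.
TABLE_II = {
    "[6]": (11.8, 16),
    "[7]": (1.91, 16),
    "[8]": (0.9411, 5),
    "[9]": (2.2, 4),
    "This work": (0.034, 32),
}


def energy_per_bit(energy_per_access_pj: float, word_length: int) -> float:
    """Energy per bit in pJ, assuming energy per access is shared equally by all bits."""
    if word_length <= 0:
        raise ValueError("word_length must be positive")
    return energy_per_access_pj / word_length


for design, (energy_pj, bits) in TABLE_II.items():
    print(f"{design}: {energy_per_bit(energy_pj, bits):.4f} pJ/bit")
```

Under this assumed definition the entries for [7], [8], [9], and the proposed design agree with the tabulated values to the printed precision. For [6] the quotient is about 0.74 pJ against a tabulated 0.699 pJ, so that entry may have been obtained differently.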

Fig. 13. Layout of the proposed SRAM design

TABLE II Performance comparison of SRAM designs

|                    | [6]   | [7]   | [8]     | [9]  | This work |
|--------------------|-------|-------|---------|------|-----------|
| Year               | 2012  | 2014  | 2015    | 2017 |           |
| CMOS Tech. (nm)    | 40    | 40    | 40      | 65   | 40        |
| Cell               | 8T    | 12T   | 5T      | 6T   | 6T        |
| Supply Volt. (V)   | 0.6   | 0.35  | 0.6     | 1.2  | 0.8       |
| SNM (mV)           | 86    | N/A   | N/A     | N/A  | 412.3     |
| Read PDP (fJ)      | N/A   | N/A   | N/A     | N/A  | 2.0592    |
| Capacity (kb)      | 256   | 4     | 4+1     | 1    | 1         |
| Word Length        | 16    | 16    | 5       | 4    | 32        |
| Frequency (MHz)    | 10    | 11.5  | 54      | 100  | 100       |
| Energy/access (pJ) | 11.8  | 1.91  | 0.9411  | 2.2  | 0.034     |
| Energy/bit (pJ)    | 0.699 | 0.119 | 0.18822 | 0.55 | 0.001     |

## IV. CONCLUSION

A very low power SRAM cell design featuring dynamic supply voltage gating is proposed in this investigation. Namely, the supply voltage that holds the state of the unaccessed SRAM cells is reduced by Vthp such that the standby power is drastically reduced. Besides the supply voltage gating, a PDP reduction circuit composed of AVD and PVB is added to further reduce the power dissipation by shortening the state transitions. Post-layout simulations at all PVT corners verify the ultra low power performance.

## ACKNOWLEDGMENT

The authors would like to express their deepest appreciation to CIC (Chip Implementation Center) in NARL (National Applied Research Laboratories), Taiwan, for the assistance of EDA tool support.

## REFERENCES

- [1] E. Morifuji, T. Yoshida, M. Kanda, S. Matsuda, S. Yamada, and F. Matsuoka, Supply and threshold-voltage trends for scaled logic and SRAM MOSFETs, *IEEE Trans. on Electron Devices*, vol. 53, no. 6, pp. 1427-1432, June 2006.
- [2] C.-C. Wang, Y.-L. Tseng, H.-Y. Leo, and R. Hu, A 4-Kb 500-MHz 4-T CMOS SRAM using low-VTHN bitline drivers and high-VTHP latches, *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 12, no. 9, pp. 901-909, Sep. 2004.
- [3] C.-C. Wang, C.-L. Lee, and W.-J. Lin, A 4-Kb low power SRAM design with negative word-line scheme, *IEEE Trans. on Circuits & Systems I : Regular Papers*, vol. 54, no. 5, pp. 1069-1076, May 2007.
- [4] S.-Y. Chen and C.-C. Wang, Single-ended disturb-free 5T loadless SRAM cell using 90 nm CMOS process, *IEEE Inter. Conf. on IC Design and Technology (ICICDT)*, pp. 1-4, May 2012.
- [5] D.-S. Wang, Y.-H. Su, and C.-C. Wang, A readout circuit with cell output slew rate compensation for 5T single-ended 28 nm CMOS SRAM, *Microelectronics Journal*, vol. 70, pp. 107-116, Nov. 2017.
- [6] M. Terada, S. Yoshimoto, S. Okumura, T. Suzuki, S. Miyano, H. Kawaguchi, and M. Yoshimoto, A 40-nm 256-kb 0.6-V operation half-select resilient 8T SRAM with sequential writing technique enabling 367-mV VDDmin reduction, *2012 13th Inter. Symposium on Quality Electronic Design (ISQED)*, pp. 489-492, Mar. 2012.
- [7] Y.-W. Chiu, Y.-H. Hu, M.-H. Tu, J.-K. Zhao, Y.-H. Chu, S.-J. Jou, and C.-T. Chuang, 40 nm bit-interleaving 12T subthreshold SRAM with data-aware write-assist, *IEEE Trans. on Circuits & Systems I : Regular Papers*, vol. 61, no. 9, pp. 2578-2585, Oct. 2014.
- [8] C.-C. Wang, D.-S. Wang, C.-H. Liao, and S.-Y. Chen, A leakage compensation design for low supply voltage SRAM, *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 24, no. 5, pp. 1761-1769, Oct. 2015.
- [9] J. Lee, D. Shin, Y. Kim, and H.-J. Yoo, A 17.5 fJ/bit energy-efficient analog SRAM for mixed-signal processing, *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 25, no. 10, pp. 2714-2723, Feb. 2017.