# A 500-MHz 32-bit DETFF-based Shift Register Utilizing 40-nm CMOS Technology

Jyoshnavi Akiri<sup>\*</sup>, Lean Karlo Santos Tolentino<sup>\*†</sup>, Lung-Jieh Yang<sup>‡</sup>, Balasubramanian Esakki<sup>§</sup>, Sivaperumal Sampath<sup>¶</sup>, and Chua-Chin Wang<sup>\*||1</sup>

<sup>\*</sup>Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan

<sup>†</sup>Department of Electronics Engineering, Technological University of the Philippines, Manila, Philippines

<sup>‡</sup>Department of Mechanical and Electromechanical Engineering, Tamkang University, Tamsui, Taiwan

<sup>§</sup>Department of Mechanical Engineering, Vel Tech University, Chennai, Tamil Nadu, India

<sup>¶</sup>Department of Electronics and Communication Engineering, Presidency University, Bengaluru, Karnataka, India

<sup>||</sup>Institute of Undersea Technology, National Sun Yat-Sen University, Kaohsiung, Taiwan

Email: ccwang@ee.nsysu.edu.tw

*Abstract*: Unlike a single-edge triggered flip-flop (SETFF), a double-edge triggered flip-flop (DETFF) can latch data at either the rising or the falling edge of the clock. The proposed DETFF uses parallel dual paths that need neither keepers to boost the input signal nor an inverted clock for opposite-phase operation. Schmitt triggers replace the conventional feedback inverters with keepers, since the feedback loop may cause metastability that corrupts the output at unexpected times. To verify its functionality, the DETFF was implemented in a 32-bit shift register in TSMC 40-nm CMOS. Post-layout simulations at a 60-pF load show that the proposed shift register reduces power consumption by 31.71% and T<sub>c-q</sub> delay by 43.57% compared with a counterpart built from a prior DETFF. It also achieves the best normalized energy per bit, normalized power, and normalized delay per bit compared with prior works.

*Index Terms*—DETFF, low power, Schmitt trigger, shift register, PDP.

## I. INTRODUCTION

Most digital systems use flip-flops or registers to hold the system state, since sequential logic circuits depend on both prior outputs and current inputs [1], [2]. A shift register built from flip-flops can be used to add, shift, and multiply data; shift registers therefore store and convey digital data [3]–[5]. A central challenge for digital IC designers is to build chips that operate at maximum throughput while consuming as little power as possible, thereby improving energy efficiency.

Unlike a single-edge triggered flip-flop (SETFF), a double-edge triggered flip-flop (DETFF) can latch data at either the rising or the falling edge of the clock, as shown in Fig. 1. It can therefore sustain the same data rate as an SETFF at half the clock frequency, which significantly reduces its power consumption [3], [6], [7]. This makes the DETFF an attractive building block for high-speed, low-power shift registers. A DETFF can be implemented in several ways. One common approach combines an XOR gate with a delay circuit that generates an internal pulse at each clock edge [1]–[3]. Another duplicates the path of the data input (D), so that the flip-flop storing the data bit samples it at every clock edge [8].

Fig. 1. Comparison between (a) DETFF and (b) SETFF waveforms [7].
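The contrast in Fig. 1 can be made concrete with a purely behavioral sketch that samples a discretized data waveform at clock edges. The waveforms, the helper names `edges` and `sample`, and the convention that D is read at the sample where the clock changes are illustrative assumptions, not part of the paper. Over the same two clock periods, the ideal SETFF captures two values while the ideal DETFF captures four.

```python
from typing import Iterator, List, Sequence, Tuple


def edges(clk: Sequence[int]) -> Iterator[Tuple[int, str]]:
    """Yield (index, kind) for every transition in a sampled clock waveform."""
    for i in range(1, len(clk)):
        if clk[i - 1] == 0 and clk[i] == 1:
            yield i, "rise"
        elif clk[i - 1] == 1 and clk[i] == 0:
            yield i, "fall"


def sample(clk: Sequence[int], d: Sequence[int], double_edge: bool) -> List[int]:
    """Values of D captured by an ideal SETFF (rising edges only) or DETFF (both edges)."""
    return [d[i] for i, kind in edges(clk) if double_edge or kind == "rise"]


# Two clock periods, four samples per period (hypothetical waveforms).
clk = [0, 0, 1, 1, 0, 0, 1, 1, 0]
d = [1, 1, 1, 0, 0, 1, 1, 1, 0]
print(sample(clk, d, double_edge=False))  # [1, 1]
print(sample(clk, d, double_edge=True))   # [1, 0, 1, 0]
```

Doubling the number of captured samples per clock period is what allows a DETFF to run at half the clock frequency of an SETFF for the same data rate.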

An 8-bit reversible linear feedback shift register (LFSR) based on DETFFs was also developed; it reduced the power consumption of a typical LFSR by 10% [9]. In [10], positive feedback source-coupled logic was used to reduce the power consumed by an LFSR. However, because that circuit relies on single-edge triggering, it does not achieve the needed reduction in latency. To improve overall performance, an 8-bit LFSR equipped with a weighted random test pattern generator was proposed [11]; it reduces delay, but at the cost of higher power consumption. To address these issues, this paper presents a 32-bit shift register built from DETFFs that use Schmitt triggers and eliminate keepers, lowering both delay and power consumption.

## II. DESIGN METHODOLOGY OF THE PROPOSED DETFF

### A. Prior DETFF

A conventional DETFF, shown in Fig. 2, consists of 4 pass transistors, 3 inverters with 3 feedback-connected keepers, an inverter generating the inverted clock, and an output keeper circuit [12]. The keepers, or regenerative transistors, strengthen the signals received by inverters I21, I22, and I23. Inverter I24 generates the inverted clock CKB, which alternates the sampling of data (D) and interchanges the upper and lower paths, through nodes N21 and N23 and nodes N22 and N24, respectively, in each opposite clock (CK) phase. These keepers, including MP21, MP22, and MP24, are a major source of DC power loss. They also form internal loops that slow down operation. Moreover, their sizes must be chosen carefully, i.e., the design is ratioed.

<sup>1</sup>The corresponding author is Prof. C.-C. Wang.



Fig. 2. Conventional DETFF cell [12].

### B. Proposed DETFF

Fig. 3 shows the proposed DETFF, in which dual paths sample data (D) at every clock (Clk) edge. Unlike the prior DETFF, the proposed DETFF eliminates the keepers and the inverted clock. Moreover, it replaces the typical complementary CMOS inverters with two degenerated Schmitt triggers, one on each path (MP3, MN3, MP4, MN4, MP5, MN5 on one path and MP6, MN6, MP7, MN7, MP8, MN8 on the other), in which the two switching points of a conventional Schmitt trigger are merged into one.

Referring to Fig. 3, the operation is described as follows:

- Clk is rising: MN1 and MN2 turn on, while MP1 and MP2 turn off. MN1 passes input D, which is inverted by the Schmitt trigger. The inverted signal is stored at node 4 until the falling edge of Clk arrives.
- Clk is falling: MP1 and MP2 turn on, while MN1 and MN2 turn off. Through MP2, Q outputs a signal according to the state stored at node 4. Meanwhile, MP1 passes D and the rising-edge procedure is repeated on the other path: the Schmitt trigger inverts the signal, which is stored at node 2.
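The alternating operation above can be summarized with an idealized behavioral model of a dual-path DETFF: one storage path follows D while Clk = 1 and holds while Clk = 0, the other does the opposite, and Q is taken from whichever path is currently holding. This is a generic sketch, not a transistor-level model of Fig. 3; the class and attribute names are hypothetical, and the "upper" path only loosely corresponds to the MN1/node 4 path and the "lower" path to the MP1/node 2 path.

```python
from typing import Iterable, List


class DualPathDETFF:
    """Idealized dual-path DETFF: Q changes only at clock edges."""

    def __init__(self) -> None:
        self.upper = 0  # hypothetical: path that follows D while Clk = 1
        self.lower = 0  # hypothetical: path that follows D while Clk = 0
        self.q = 0

    def step(self, clk: int, d: int) -> int:
        if clk:
            self.upper = d        # upper path tracks D
            self.q = self.lower   # output shows the value frozen at the rising edge
        else:
            self.lower = d        # lower path tracks D
            self.q = self.upper   # output shows the value frozen at the falling edge
        return self.q


def simulate(clk: Iterable[int], d: Iterable[int]) -> List[int]:
    """Apply one (Clk, D) sample per step and record Q."""
    ff = DualPathDETFF()
    return [ff.step(c, x) for c, x in zip(clk, d)]


clk = [0, 1, 1, 0, 0, 1, 1, 0]
d = [1, 1, 1, 0, 0, 0, 1, 1]
print(simulate(clk, d))  # [0, 1, 1, 1, 1, 0, 0, 1]
```

In the trace, Q updates at every rising and falling edge and stays constant between edges, even when D changes mid-phase.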

### C. Transistor Sizing of the Proposed DETFF

Referring again to Fig. 3, all transistors use the minimum channel length. The clock-driven pass transistors MP1, MN1, MP2, and MN2 are given a large width (1 $\mu$m), since they must pass a high-frequency, varying D input. Moreover, the voltage at node 1 or 3 equals $D - V_{th}$, so a lower $V_{th}$ is needed. It has been shown through simulations that increasing the width of a short-channel transistor decreases its threshold voltage as a result of threshold roll-off [13]. MP5, MN5, MP8, and MN8 use minimum geometry. The sizes of MP4 and MP7 are then calculated using Eqn. (1), and those of MN4 and MN7 using Eqn. (2) [14], where $V_{SPL}$ and $V_{SPH}$ are the lower and higher switching voltages, respectively. MP3 and MP6 are sized so that MP4 and MP7 act as active resistors in series with MP3 and MP6, respectively, and MP5 and MP8 turn on at the low switching point. Likewise, MN3 and MN6 are in series with MN4 and MN7, respectively, so that MN5 and MN8 turn on at the high switching point.

Fig. 3. Proposed DETFF cell.

$$\frac{W_{MP4} \times L_{MP5}}{L_{MP4} \times W_{MP5}} = \frac{W_{MP7} \times L_{MP8}}{L_{MP7} \times W_{MP8}} = \left(\frac{V_{SPL}}{V_{DD} - V_{SPL} - V_{THP}}\right)^2 \tag{1}$$

$$\frac{W_{MN4} \times L_{MN5}}{L_{MN4} \times W_{MN5}} = \frac{W_{MN7} \times L_{MN8}}{L_{MN7} \times W_{MN8}} = \left(\frac{V_{DD} - V_{SPH}}{V_{SPH} - V_{THN}}\right)^2 \tag{2}$$
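Eqs. (1) and (2) can be evaluated directly to size MP4/MP7 and MN4/MN7 relative to the minimum-size MP5/MP8 and MN5/MN8. The sketch below assumes equal (minimum) channel lengths, so the W/L ratio reduces to a width ratio, and treats $V_{THP}$ as the magnitude of the PMOS threshold voltage. The switching points (0.4 V and 0.5 V), the 0.3-V thresholds, and the 0.12-µm reference width are hypothetical values chosen for illustration, not the paper's design numbers; only VDD = 0.9 V comes from this work.

```python
def pmos_ratio(vdd: float, v_spl: float, v_thp: float) -> float:
    """Eq. (1): (W/L of MP4 or MP7) / (W/L of MP5 or MP8) for low switching point V_SPL.

    v_thp is the magnitude of the PMOS threshold voltage.
    """
    headroom = vdd - v_spl - v_thp
    if headroom <= 0:
        raise ValueError("V_SPL too high for the given VDD and |V_THP|")
    return (v_spl / headroom) ** 2


def nmos_ratio(vdd: float, v_sph: float, v_thn: float) -> float:
    """Eq. (2): (W/L of MN4 or MN7) / (W/L of MN5 or MN8) for high switching point V_SPH."""
    overdrive = v_sph - v_thn
    if overdrive <= 0:
        raise ValueError("V_SPH must exceed V_THN")
    return ((vdd - v_sph) / overdrive) ** 2


def width_from_ratio(ratio: float, w_ref_um: float) -> float:
    """With equal (minimum) channel lengths, the W/L ratio reduces to a width ratio."""
    return ratio * w_ref_um


VDD = 0.9          # supply voltage used in this work
W_MIN_UM = 0.12    # hypothetical minimum width of MP5/MN5/MP8/MN8

r_p = pmos_ratio(VDD, v_spl=0.4, v_thp=0.3)   # hypothetical V_SPL and |V_THP|
r_n = nmos_ratio(VDD, v_sph=0.5, v_thn=0.3)   # hypothetical V_SPH and V_THN
print(r_p, r_n)                               # both approximately 4
print(width_from_ratio(r_p, W_MIN_UM))        # approximately 0.48 um for MP4/MP7
```

Raising $V_{SPH}$ toward VDD shrinks the required MN4/MN7 ratio, while raising $V_{SPL}$ grows the MP4/MP7 ratio, which is the trade-off the designer tunes when placing the switching points.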

Besides offering better noise immunity than a typical CMOS inverter, the Schmitt-trigger-based DETFF cell uses smaller transistors than the inverter-with-keeper-based DETFF cell of the prior work [12], even though the latter has fewer transistors. In fact, the proposed DETFF cell shown in Fig. 4 ($3.66 \times 7.46 \ \mu\text{m}^2$) is smaller than the prior cell ($70.13/4 \ \mu\text{m} \times 20.95/2 \ \mu\text{m} = 17.53 \times 10.48 \ \mu\text{m}^2$). Normalized to the square of the technology node, the area of our DETFF is $\frac{3.66 \times 7.46 \ \mu\text{m}^2}{(40 \ \text{nm})^2} \approx 1.71 \times 10^4$, which is smaller than that of the prior work, $\frac{17.53 \times 10.48 \ \mu\text{m}^2}{(90 \ \text{nm})^2} \approx 2.27 \times 10^4$.
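The normalized-area comparison is easy to get wrong by dividing µm² by nm² without converting units. A minimal sketch, assuming "normalized area" means cell area divided by the square of the technology node with both expressed in the same unit, is shown below for the two cells compared above; the function name is hypothetical.

```python
def normalized_area(width_um: float, height_um: float, node_nm: float) -> float:
    """Cell area divided by the square of the technology node (dimensionless)."""
    node_um = node_nm / 1000.0          # convert nm to um so the units cancel
    return (width_um * height_um) / node_um ** 2


proposed = normalized_area(3.66, 7.46, 40)             # Fig. 4 cell, 40-nm node
prior = normalized_area(70.13 / 4, 20.95 / 2, 90)      # cell of [12], 90-nm node
print(f"{proposed:.3g} vs {prior:.3g}")                # 1.71e+04 vs 2.27e+04
```

The proposed cell remains smaller after normalization, so the conclusion does not depend on the units chosen, only the absolute values do.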



Fig. 4. Layout of the proposed DETFF cell.

### D. DETFF-Based 32-Bit Shift Register

To verify the functionality of the proposed DETFF, two 32-bit shift registers were constructed, as shown in Fig. 5: one built from the proposed DETFF and one built from the conventional (prior) DETFF cell [12], against which the power consumption and delay are compared. Both shift registers are fully integrated in the chip architecture of Fig. 5, which operates as follows:

- When S0 = 0, the proposed shift register is enabled and the prior shift register is disabled; the opposite holds when S0 = 1.
- Through 2:1 MUXs, S0 also selects which shift register's outputs are read out, which reduces the chip's output pin count.
- In test mode, S1 and S2 route D to one of the four 8-bit shift register blocks in either 32-bit register.
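The enable and readout behavior listed above can be captured in a small behavioral model. This is a hypothetical sketch: the class and method names are illustrative, each register is idealized as a list of bits that advances one stage per active clock edge, and the S1/S2 test-mode routing is omitted because its encoding is not specified. The helper `load_time_ns` also shows why double-edge clocking matters: at 500 MHz, a DETFF-based register shifts in 32 bits in 16 clock cycles (32 ns), whereas a single-edge register would need 32 cycles (64 ns).

```python
from typing import List


class ShiftRegister:
    """Idealized n-bit shift register; each active clock edge moves every bit one stage."""

    def __init__(self, n: int = 32) -> None:
        self.q: List[int] = [0] * n  # q[0] is Q0, the first stage

    def edge(self, d: int) -> None:
        # D enters Q0; every later stage takes its predecessor's old value.
        self.q = [d] + self.q[:-1]


class ChipModel:
    """Hypothetical model of Fig. 5: S0 enables one register and selects its outputs."""

    def __init__(self, n: int = 32) -> None:
        self.proposed = ShiftRegister(n)
        self.prior = ShiftRegister(n)

    def _selected(self, s0: int) -> ShiftRegister:
        return self.proposed if s0 == 0 else self.prior

    def edge(self, s0: int, d: int) -> None:
        # Only the enabled register shifts; the disabled one holds its contents.
        self._selected(s0).edge(d)

    def outputs(self, s0: int) -> List[int]:
        # The 2:1 MUXs route the selected register's outputs to the shared pins.
        return list(self._selected(s0).q)


def load_time_ns(n_bits: int, f_clk_hz: float, edges_per_cycle: int) -> float:
    """Time to shift n_bits into a register clocked at f_clk_hz."""
    cycles = n_bits / edges_per_cycle
    return cycles / f_clk_hz * 1e9


chip = ChipModel()
for bit in [1, 0, 1, 1]:
    chip.edge(s0=0, d=bit)
print(chip.outputs(0)[:4])                          # [1, 1, 0, 1]: newest bit in Q0
print(load_time_ns(32, 500e6, edges_per_cycle=2))   # DETFF
print(load_time_ns(32, 500e6, edges_per_cycle=1))   # SETFF
```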



Fig. 5. Block diagram of the proposed 32-bit shift register and the counterpart.

## III. SIMULATION RESULTS

The two comparable DETFF-based shift registers in Fig. 5 were implemented in the TSMC 40-nm CMOS process. Their layout and floorplan are shown in Fig. 6. The full chip measures $808 \times 805 \ \mu\text{m}^2$, while its core measures $307 \times 130.6 \ \mu\text{m}^2$. All-PVT-corner simulations were run in HSPICE across process corners (FS, SF, TT, FF, and SS), supply voltages (0.81, 0.9, and 0.99 V), and temperatures (0, 25, and 75 °C). Fig. 7 shows the post-layout simulation of the first 8 bits (Q0-Q7) of the proposed 32-bit DETFF-based shift register at the worst-case corner (SF, 0.98 V, 75 °C). This confirms that the proposed shift register functions correctly at its highest clock frequency of 500 MHz.
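The full PVT sweep described above is the Cartesian product of the three corner lists. The sketch below enumerates it under the assumption that every combination was simulated (the individual runs are not listed); the label format is purely illustrative and is not an HSPICE construct.

```python
from itertools import product

# Corner values listed in the text; exhaustive combination is an assumption.
PROCESS_CORNERS = ["FS", "SF", "TT", "FF", "SS"]
SUPPLY_V = [0.81, 0.9, 0.99]
TEMPERATURE_C = [0, 25, 75]


def pvt_corners():
    """Return every (process, VDD, temperature) combination as a list of tuples."""
    return list(product(PROCESS_CORNERS, SUPPLY_V, TEMPERATURE_C))


corners = pvt_corners()
print(len(corners))  # 45 = 5 process x 3 supply x 3 temperature corners
for proc, vdd, temp in corners[:3]:
    print(f"{proc}_{vdd:.2f}V_{temp}C")  # hypothetical run label
```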



Fig. 6. Layout and floorplan of the 2 DETFF-based 32-bit shift registers.



Fig. 7. Waveforms of the first 8 bits (Q0-Q7) of the proposed DETFF-based 32-bit shift register at the worst-case corner (SF, 0.98 V, 75 °C).

Table I compares the performance of the proposed shift register with earlier DETFF-based shift registers. Because the chip must drive large loads, including the pad, wire-bond, and probe capacitances, a 60-pF load is used during measurement. Normalized delay per bit, normalized power, and normalized energy per bit are computed using Eqns. (3), (4), and (5), respectively, where Process is the technology node, $C_{\text{load}}$ is the load capacitance, f is the clock frequency, and n is the bit length.

$$\text{Normalized Delay per Bit} = \frac{\text{Delay}}{\text{Process}^2 \times C_{\text{load}} \times f \times n} \tag{3}$$

$$\text{Normalized Power} = \frac{\text{Power}}{V_{DD}^2 \times C_{\text{load}} \times f \times n} \tag{4}$$

$$\text{Normalized Energy per Bit} = \frac{\text{Power}}{f \times n} \tag{5}$$
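Eqs. (3)-(5) can be checked against Table I with a few lines of code. The sketch below reproduces the entries for this work and for [12], to within rounding, when delay is in ns, process in nm, $C_{\text{load}}$ in pF, f in MHz, power in mW, and VDD in V. These units are inferred from that agreement rather than stated explicitly, and the function names are hypothetical.

```python
def norm_delay_per_bit(delay_ns, process_nm, c_load_pf, f_mhz, n_bits):
    """Eq. (3)."""
    return delay_ns / (process_nm ** 2 * c_load_pf * f_mhz * n_bits)


def norm_power(power_mw, vdd_v, c_load_pf, f_mhz, n_bits):
    """Eq. (4)."""
    return power_mw / (vdd_v ** 2 * c_load_pf * f_mhz * n_bits)


def energy_per_bit_pj(power_mw, f_mhz, n_bits):
    """Eq. (5); mW / MHz = nJ, so scale by 1e3 to report pJ per bit."""
    return power_mw / (f_mhz * n_bits) * 1e3


# Table I entries: (delay ns, power mW, process nm, VDD V, C_load pF, f MHz, n bits)
works = {
    "This work": (0.869, 16.8, 40, 0.9, 60, 500, 32),
    "ISCAS [12]": (1.5, 9.14, 90, 1.0, 20, 100, 8),
}
for name, (t, p, node, vdd, c, f, n) in works.items():
    print(
        name,
        round(norm_delay_per_bit(t, node, c, f, n) * 1e9, 3),  # in units of 10^-9
        round(norm_power(p, vdd, c, f, n) * 1e6, 2),           # in units of 10^-6
        round(energy_per_bit_pj(p, f, n), 3),                  # pJ/bit
    )
```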

Referring to Table I, the proposed shift register has the best normalized delay per bit, normalized power, and normalized energy per bit among all the listed works. Notably, it reduces power consumption by 31.71% and $T_{c-q}$ delay by 43.57% compared with the counterpart shift register built from the DETFF cell of the most recent work [12]. Lastly, Fig. 8 and Fig. 9 show technology roadmaps for DETFF-based shift registers based on two figures of merit (FOMs): normalized energy per bit and normalized delay per bit. In both roadmaps, the values achieved by this work lie close to the FOMs predicted for 2022 by the dashed trend line.

TABLE I

Performance Comparison of the Proposed DETFF-Based Shift Register with Prior Shift Registers

| Work                                     | ICCTICT [10] | ICISC [9]  | WCSE [3]    | IJCSIT [8] | ICISS [11] | Thesis [15] | ISCAS [12]  | This work   |
|------------------------------------------|--------------|------------|-------------|------------|------------|-------------|-------------|-------------|
| Year                                     | 2016         | 2017       | 2018        | 2018       | 2019       | 2020        | 2021        | 2022        |
| Process (nm)                             | 180          | 180        | 180         | 180        | 180        | 40          | 90          | 40          |
| Simulation result                        | Pre-layout   | Pre-layout | Post-layout | Pre-layout | Pre-layout | Post-layout | Post-layout | Post-layout |
| VDD (V)                                  | 1.8          | -          | 1.8         | 1.8        | -          | 0.9         | 1           | 0.9         |
| T<sub>c-q</sub> delay (ns)               | 0.573        | 2.32       | 2.2         | 0.251      | 0.775      | 5.69        | 1.5         | 0.869       |
| Power (mW)                               | 2.015        | 37.2       | 35.25       | 0.028      | 71         | 3.1         | 9.14        | 16.8        |
| PDP (pJ)                                 | 1.154        | 86.4       | 78.18       | 0.0072     | 55         | 17.63       | 13.71       | 14.59       |
| Clock frequency (MHz)                    | 100          | 100        | 125         | 500        | 10         | 100         | 100         | 500         |
| Bit length (bits)                        | 4            | 8          | 8           | 1          | 8          | 8           | 8           | 32          |
| Load capacitance (pF)                    | 0.05         | -          | 20          | 0.025      | -          | 20          | 20          | 60          |
| Norm. delay per bit ($\times 10^{-9}$)   | 884.2        | -          | 3.395       | 619.6      | -          | 222.27      | 11.574      | 0.565       |
| Norm. power ($\times 10^{-6}$)           | 310.95       | -          | 543.98      | 691.35     | -          | 23          | 571.25      | 21.6        |
| Norm. energy per bit (pJ/bit)            | 5.0375       | 46.5       | 35.25       | 56         | 88.75      | 3.875       | 11.425      | 1.05        |
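The method used to draw the dashed trend lines in Fig. 8 and Fig. 9 is not stated. One common choice for a technology roadmap, assumed here purely for illustration, is a least-squares line through log10(FOM) versus year, which corresponds to a constant yearly improvement factor. The sketch below implements that assumption and applies it to the normalized energy per bit of the prior works as listed in Table I; it is not necessarily how the figures were produced.

```python
import math
from typing import List, Tuple


def fit_log_trend(points: List[Tuple[int, float]]) -> Tuple[float, float]:
    """Least-squares fit of log10(FOM) = a + b * year; returns (a, b)."""
    xs = [year for year, _ in points]
    ys = [math.log10(fom) for _, fom in points]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b


def predict(a: float, b: float, year: int) -> float:
    """Evaluate the fitted trend line back in linear units."""
    return 10 ** (a + b * year)


# Normalized energy per bit (pJ/bit) of the prior works, as listed in Table I.
prior_energy = [(2016, 5.0375), (2017, 46.5), (2018, 35.25), (2018, 56.0),
                (2019, 88.75), (2020, 3.875), (2021, 11.425)]
a, b = fit_log_trend(prior_energy)
print(f"trend value for 2022: {predict(a, b, 2022):.3g} pJ/bit")
```

A negative slope b means the FOM improves (decreases) each year; comparing this work's value with the trend value for 2022 is what the roadmap figures express graphically.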



Fig. 8. Normalized energy per bit roadmap of DETFF-based shift registers.

Fig. 9. Normalized delay per bit roadmap of DETFF-based shift registers.

## IV. CONCLUSION

A 32-bit shift register was designed in 40-nm CMOS for a 500-MHz clock frequency and a 60-pF load. Its DETFFs eliminate the keepers and the inverting clock, and are enhanced by degenerated Schmitt triggers. Compared with the counterpart built from the prior DETFF [12], it reduces power consumption by 31.71% and $T_{c-q}$ delay by 43.57%. It also achieves the best normalized power, normalized delay per bit, and normalized energy per bit among all compared works.

## ACKNOWLEDGEMENT

Taiwan Semiconductor Research Institute (TSRI) is gratefully acknowledged for providing the EDA tools. The National Science and Technology Council (NSTC), Taiwan, supported this study through grants NSTC 110-2623-E-110-001-, 109-2221-E-032-001-MY3, and 110-2221-E-110-063-MY2.

## REFERENCES

- [1] C.-C. Wang, G.-N. Sung, M.-K. Chang, and Y.-Y. Shen, "Energy-efficient double-edge triggered flip-flop design," *Journal of Signal Processing Systems*, vol. 61, no. 3, pp. 347-352, Dec. 2010.
- [2] C.-C. Wang, G.-N. Sung, M.-K. Chang, and Y.-Y. Shen, "Energy-efficient double-edge triggered flip-flop design," in *Proc. IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)*, pp. 1792-1795, Dec. 2006.
- [3] A. Baskoro, O. Setyawati, P. Siwindarto, and C.-C. Wang, "Low-power high-speed 8-bit shift register using double-edge triggered flip-flops," in *Proc. 8th International Workshop on Computer Science and Engineering* (WCSE), pp. 151-155, Jun. 2018.
- [4] L. K. S. Tolentino, I. C. Valenzuela, and R. O. Serfa Juan, "Overhead interspersing of redundancy bits reduction algorithm by enhanced error detection correction code," *Journal of Engineering Science & Technology Review*, vol. 12, no. 2, pp. 34-39, Mar. 2019.
- [5] L. K. S. Tolentino, M. V. C. Padilla, and R. O. Serfa Juan, "FPGA-based redundancy bits reduction algorithm using the enhanced error detection correction code," *International Journal of Engineering & Technology*, vol. 7, no. 3, pp. 1008-1013, Mar. 2018.
- [6] K. Y. Yun, P. A. Beerel, and J. Arceo, "High-performance two-phase micropipeline building blocks: double edge-triggered latches and burst-mode select and toggle circuits," *IEE Proceedings-Circuits, Devices and Systems*, vol. 143, no. 5, pp. 282-288, Oct. 1996.
- [7] C.-C. Wang, L. K. S. Tolentino, U. K. N. Ekkurthi, P.-Y. Lou, and S. Sampath, "A 100-MHz 3.352-mW 8-bit shift register using low-power DETFF using 90-nm CMOS process," *International Journal of Electronics Letters*, early-access online, pp. 1-16, Jun. 2022.
- [8] C.-C. Yu and C.-C. Tsai, "Dual edge-triggered D-type flip-flop with low power consumption," *International Journal of Computer Science and Information Technology (IJCSIT)*, vol. 10, no. 5, pp. 1-12, Oct. 2018.
- [9] Y. P. Kumar, B. S. Kariyappa, and M. Z. Kurian, "Implementation of power efficient 8-bit reversible linear feedback shift register for BIST," in *Proc. International Conference on Inventive Systems and Control (ICISC)*, pp. 1-5, Jan. 2017.
- [10] A. Tyagi, N. Pandey, and K. Gupta, "PFSCL based linear feedback shift register," in *Proc. International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT)*, pp. 580-585, Mar. 2016.
- [11] A. Bagalkoti, S. B. Shirol, S. Rama Krishna, P. Kumar, and B. S. Rajashekar, "Design and implementation of 8-bit LFSR, bit-swapping LFSR and weighted random test pattern generator: A performance improvement," in *Proc. International Conference on Intelligent Sustainable Systems (ICISS)*, pp. 82-86, Feb. 2019.
- [12] U. K. N. Ekkurthi, V. Dasari, J. Akiri, and C.-C. Wang, "A 100 MHz 9.14-mW 8-Bit shift register using double-edge triggered flip-flop," in *Proc. 2021 IEEE International Symposium on Circuits and Systems* (ISCAS), pp. 1-4, May 2021.
- [13] K. A. Gupta, V. Venkateswarlu, D. Anvekar, and S. Basu, "The impact of channel-width on threshold voltage for short channel devices," in *Proc. TENCON 2011 - 2011 IEEE Region 10 Conference*, pp. 715-719, Nov. 2011.
- [14] R. J. Baker, *CMOS: Circuit Design, Layout, and Simulation*, 4th ed. Hoboken, NJ: Wiley, 2019.
- [15] C.-H. Chu, "A broken line detection and aging protection circuit for multi-cell Li-ion battery pack and low power 8-bit shift register using double-edge triggered flip-flops," M.S. thesis, Dept. of Elect. Eng., Natl. Sun Yat-Sen Univ., Kaohsiung, 2020.