




**Microelectronics Journal** 

journal homepage: www.elsevier.com/locate/mejo

## A 1–6.5 Gbps dual-loop CDR design with Coarse-fine Tuning VCO and modified DQFD

Chua-Chin Wang<sup>\*</sup>, L S S Pavan Kumar Chodisetti, Bo-Hao Liao, Pradyumna Vellanki, Tzung-Je Lee

Department of Electrical Engineering, National Sun Yat-Sen University, No. 70, Lian-Hai Road, Gushan District, Kaohsiung, 80424, Taiwan

#### Abstract

A dual-loop CDR (Clock and Data Recovery) circuit is presented to recover digital data from 1 to 6.5 Gbps. The presented frequency acquisition technique is based on a full-rate clock architecture. By utilizing a modified Digital Quadricorrelator Frequency Detector (DQFD) and a Frequency Increment/Decrement Control circuit, the lock-in range is improved. Furthermore, the issue of state loss during wide frequency range detection is mitigated. The inclusion of two control wires in the Coarse-fine Tuning VCO enables the use of separate loop filters in the dual loops, resulting in a more effective reduction of noise and jitter. The presented CDR design has been implemented in a 40-nm CMOS process. The post-layout simulation results at 6.5 Gbps show that the P2P and root-mean-square jitter values are 17.1 ps and 5.79 ps, respectively, for the retimed data.

#### 1. Introduction

Within a communication system, the clock signal needs to be acquired from the data stream at the receiver side. The acquisition of the clock is greatly facilitated by the CDR (Clock and Data Recovery) circuit [1-5]. To fulfil the diverse data rate requirements, CDR circuits must have a broad lock-in range. Additionally, achieving high-quality communication necessitates low jitter. Dual-loop control is widely adopted in CDR circuits due to its simplicity [1-4,6-10]. A conventional PLL (Phase-locked loop) based dual-loop CDR mechanism is depicted in Fig. 1. The frequency acquisition loop employs the Frequency Detector to carry out frequency comparison, while the PLL utilizes the Phase Detector to achieve phase locking. The Frequency Detector compares the frequency of the input data with that of the output clock of the voltage-controlled oscillator (VCO), removing the need for an external reference frequency. At CDR startup or upon loss of phase lock, the Frequency Detector is triggered to generate a control voltage using the Charge Pump and the Loop Filter, thereby shifting the VCO oscillation frequency towards the input data rate. Once the frequency difference falls within the capture range of the phase tracking loop, the Phase Detector takes over and locks the phase of the VCO output clock to the input data phase.

The challenge linked to the CDR architecture of Fig. 1 is the possible interference between the phase lock loop and the frequency acquisition loop during the transition of control from the Frequency Detector to the Phase Detector. This interference could lead to a failure to lock into the target phase [5]. When the two control signals from the dual-loop system are merged into a unified signal for the VCO through a common loop filter, there is a risk of interference and unwanted glitches in the control signal when the system is locked. This causes an increase in jitter and lock-in time [2–4,6,7]. The Digital Quadricorrelator Frequency Detector (DQFD) offers a solution for dual-loop CDR control by avoiding the generation of control pulses in the locked state [4,6,7,11]. Nevertheless, the operational state may be lost when there is a significant frequency difference, resulting in a restricted lock-in range [12].

Many high-speed CDRs have been reported, e.g., a reference-less CDR with counter-based unrestricted frequency acquisition [3], a reference-less CDR using an UP pulse selector [4], a reference-less CDR with a modified coarse frequency detector [7], a PLL-based CDR [13], a multiplying delay-locked-loop-based CDR [14], a CDR combining a DQFD with unbounded frequency detection [15], a reference-less CDR whose frequency acquisition scheme uses multi-phase oversampling [16], a CDR based on a background loop gain controller [17], and a reference-less CDR [9]. All of these designs suffer from a very limited lock-in range.

High P2P jitter is the drawback of several recent reference-less CDR circuits [3,8–10,17,18]. Other reports addressed CDRs specialized for LCD panels [13] and for the embedded DisplayPort [8].

In an effort to resolve the issues highlighted above, this investigation proposes two novel contributions, the Modified DQFD and the Frequency Increment/Decrement Control circuit, to extend the lock-in range. Moreover, the Coarse-Fine Tuning VCO is employed to prevent interference between the dual loops and to minimize jitter. The simulated lock-in range spans from 1 to 6.5 Gbps, with a P2P jitter of 17.1 ps and a root-mean-square jitter of 5.79 ps. The presented reference-less dual-loop CDR architecture, along with the circuit description, is given in Section 2. The all-PVT-corner (Process, Voltage, Temperature) results are shown in Section 3, and the conclusion is presented in Section 4.

\* Corresponding author. E-mail address: ccwang@ee.nsysu.edu.tw (C.-C. Wang).

https://doi.org/10.1016/j.mejo.2024.106355

Received 26 March 2024; Received in revised form 17 June 2024; Accepted 24 July 2024; Available online 5 August 2024. 1879-2391/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

Fig. 1. The conventional PLL based dual-loop CDR mechanism [1].

#### 2. Proposed architecture

In Fig. 2, the proposed CDR architecture is illustrated. This circuit is designed to have a low jitter and wide lock-in range. It consists of various components, including a Phase Detector [5], a Hysteresis Lock Detector [19], two Charge Pump circuits, a Coarse-Fine Tuning VCO, LF-1 and LF-2 as two off-chip Loop Filters, and a Buffer. Our novel contributions include a Frequency Increment/Decrement Control circuit, and a Modified DQFD.

The phase lock loop and the frequency acquisition loop constitute the proposed reference-less CDR circuit. For dual-loop detection, two digital quadrature signals, CKQ and CKI, are generated by the VCO. When there is a significant initial frequency deviation between the input DATA and CKI, the Hysteresis Lock Detector's output (LOCK) is logic 0, which disables the phase lock loop and triggers the frequency acquisition loop. Then, the Frequency Increment/Decrement Control Circuit detects the frequency difference between DATA and CKI and produces the signal U/D, which tells the VCO in which direction to vary its frequency. During this state, the comparison result of DATA and CKI directly produces the two control signals, FD<sub>DOWN</sub> and FD<sub>UP</sub>.

Once the VCO frequency closely matches the DATA frequency, the LOCK signal is set to logic 1, allowing the CDR to operate simultaneously with the Modified DQFD and the Phase Detector. In this situation, the coarse control signal for the VCO,  $V_{COARSE}$ , is produced by the Charge Pump 2 and the Modified DQFD.  $V_{FINE}$ , the fine control signal, is generated by the Phase Detector and Charge Pump 1. The VCO receives coarse and fine tuning control signals separately to achieve accurate frequency as well as phase for the in-phase output (CKI) and quadrature output (CKQ). Finally, the output buffer will be driven by the CKI, to improve driving capability for heavy loads of 20 pF.

#### 2.1. Phase detector

Fig. 3 presents the Phase Detector in Fig. 2, which is based on the Alexander phase detector. The circuit uses three consecutive clock edges to sample DATA three times to determine whether a data transition exists and whether the CKI generated by the VCO is leading or lagging.

 $\rm DFF_{21}$  and  $\rm DFF_{22}$  sample their inputs on the positive edge of CKI to generate  $S_{III}$  and  $S_{I}$ , respectively.  $\rm DFF_{23}$  samples its input on the negative edge of CKI, and  $\rm DFF_{24}$  delays the result by half a cycle to generate  $S_{II}$ . If the input data has no transition, the three samples  $S_{I}$ ,  $S_{II}$ , and  $S_{III}$  are equal and no action is taken. If CKI is leading (Lead), the first sample  $S_{I}$  is not equal to  $S_{II}$  and  $S_{III}$ . If CKI is lagging (Lag),  $S_{I}$  and  $S_{II}$  are equal but not equal to  $S_{III}$ , as shown in Fig. 4.

In summary, the sampling points are determined by DFF<sub>21</sub> and DFF<sub>23</sub>, while DFF<sub>22</sub> and DFF<sub>24</sub> are only used as delay elements. Their purpose is to delay the sampled values of the current cycle to the output of the next cycle so that S<sub>I</sub>, S<sub>II</sub>, and S<sub>III</sub> remain constant throughout each cycle and the XOR gates produce valid outputs at the same time. When the data has no transition, a DC zero output is produced so that the control of the oscillator is not disturbed.
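The lead/lag decision described above can be summarized in a short behavioral sketch. The following Python model is only an illustration of the three-sample rule, not the gate-level circuit of Fig. 3; the function name `alexander_pd` and the returned labels are assumptions introduced here. The label `invalid` covers the case in which only the middle sample differs, which the description above does not discuss.

```python
def alexander_pd(s1: int, s2: int, s3: int) -> str:
    """Classify three consecutive samples S_I, S_II, S_III of DATA.

    Behavioral sketch only; labels are illustrative, not signal names
    from the paper.
    """
    first_differs = s1 ^ s2   # XOR of S_I and S_II
    last_differs = s2 ^ s3    # XOR of S_II and S_III

    if not first_differs and not last_differs:
        return "hold"         # no data transition: no action
    if first_differs and not last_differs:
        return "lead"         # S_I differs from S_II = S_III: CKI leads
    if not first_differs and last_differs:
        return "lag"          # S_I = S_II differs from S_III: CKI lags
    return "invalid"          # only S_II differs: not a single clean transition


for samples in [(0, 1, 1), (0, 0, 1), (1, 1, 1)]:
    print(samples, alexander_pd(*samples))
```

In a real Alexander detector the two XOR outputs would drive the charge pump directly; the string labels are used here only to make the three cases explicit.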

In addition, to avoid interference from Phase Detector during frequency acquisition, the oscillation frequency of the VCO will first be made closer to the data frequency. When the two frequencies are close, the hysteresis lock detector's output LOCK will be logic 1 to complete the phase detection.

#### 2.2. Hysteresis lock detector

A lock detector plays an important role in a dual-loop CDR circuit, where it is mainly used for switching between the phase lock loop and the frequency acquisition loop. Generally, it is desirable to hand over from the frequency acquisition loop with a smaller frequency error, i.e., the frequency acquisition loop should bring the VCO frequency closer to the operating frequency to help shorten the overall lock time. The phase lock loop, in contrast, should tolerate a larger frequency error to prevent the CDR from switching back to the frequency acquisition loop during the recovery time. Hence, a lock detector with hysteresis requires a smaller frequency error when switching from the frequency acquisition loop to the phase lock loop (In-lock Condition), and allows a larger frequency error before switching from the phase lock loop back to the frequency acquisition loop (Out-of-lock Condition). Compared to conventional lock detectors, this Hysteresis Lock Detector has two frequency error ranges, one for each condition.

Fig. 5 presents the Hysteresis Lock Detector in Fig. 2. The Hysteresis Lock Detector circuit is composed of counters, D flip-flops (DFFs), and basic logic gates. The counters keep track of the cycle counts of DATA and CKI, respectively. The decision logic, depicted at the bottom of Fig. 5, determines the lock status of the frequency.

In the In-lock condition, the two M-bit counters are triggered by DATA and CKI to start counting. If either counter reaches  $2^{M-1} + 1$  while the other is still less than  $2^{M-1}$, the intermediate signal, C, takes the logic 1 value, indicating that the VCO frequency is still outside the error range. The reset signal  $R_b$  then resets the counters and generates a pulse that triggers the DFF to produce a low-level LOCK signal. The procedure persists until the other counter reaches a value greater than or equal to  $2^{M-1}$. The timing diagrams of the In-lock condition are shown in Fig. 6. During the Out-of-lock condition, when LOCK = 1, the pass transistor logic increases the number of cycles to be compared from  $2^{M-1} + 1$  to  $2^{M-1} + 2^{K}$, where K is an integer. In the Out-of-lock condition, the increased number of cycles enlarges the tolerated frequency error. The timing diagrams of the Out-of-lock condition are shown in Fig. 7.

During the In-lock condition, the logic circuit compares the cycle numbers  $2^{M-1} + 1$  and  $2^{M-1}$. For the LOCK signal to become logic 1, the oscillation frequency of the VCO must meet the following conditions:

$$\begin{aligned}
(2^{M-1}+1) \cdot T_{DATA} &\ge 2^{M-1} \cdot T_{CLK_1} \quad \text{and} \\
(2^{M-1}+1) \cdot T_{CLK_1} &\ge 2^{M-1} \cdot T_{DATA}
\end{aligned} \tag{1}$$



Fig. 2. The proposed dual-loop CDR mechanism.



Fig. 3. Alexander phase detector circuit diagram [5].



Fig. 4. Timing diagram of CKI sampling DATA.



Fig. 5. Hysteresis Lock Detector [19].

After re-organizing the above Eq. (1), the range of frequency for the VCO during In-lock condition can be obtained as follows:

$$\frac{2^{M-1}}{2^{M-1}+1} \cdot f_{DATA} \le f_{VCO} \le \frac{2^{M-1}+1}{2^{M-1}} \cdot f_{DATA} \tag{2}$$

To change from the In-lock condition to the Out-of-lock condition, the compared cycle number becomes  $2^{M-1} + 2^{K}$. Therefore, the frequency range becomes larger and is derived as follows:

$$\frac{2^{M-1}}{2^{M-1}+2^{K}} \cdot f_{DATA} \le f_{VCO} \le \frac{2^{M-1}+2^{K}}{2^{M-1}} \cdot f_{DATA} \tag{3}$$
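To make the hysteresis concrete, Eqs. (2) and (3) can be evaluated numerically. The sketch below is an illustration only: the function name `lock_window` is introduced here, and the values M = 5 and K = 2 are hypothetical, since the counter width and K are not stated in this section. Note that Eq. (2) is the special case K = 0 of Eq. (3), because $2^{0} = 1$.

```python
def lock_window(f_data_hz: float, m: int, k: int = 0) -> tuple:
    """Return (f_min, f_max) of the allowed VCO frequency per Eq. (3).

    With k = 0 the extra cycle count is 2**0 = 1, which reproduces Eq. (2).
    """
    n = 2 ** (m - 1)        # reference cycle count 2^(M-1)
    n_max = n + 2 ** k      # compared cycle count 2^(M-1) + 2^K
    return n / n_max * f_data_hz, n_max / n * f_data_hz


# Hypothetical values for illustration only: M = 5, K = 2, 6.5 GHz data.
in_lock = lock_window(6.5e9, m=5)            # window used to assert LOCK
out_of_lock = lock_window(6.5e9, m=5, k=2)   # wider window used to release LOCK
print("in-lock window (Hz):    ", in_lock)
print("out-of-lock window (Hz):", out_of_lock)
```

Under these assumed values the out-of-lock window strictly contains the in-lock window, which is the hysteresis the detector relies on.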

#### 2.3. Frequency increment/decrement control circuit

The Frequency Increment/Decrement Control Circuit in Fig. 2 is shown in Fig. 8. The CKI/2 signal is obtained by dividing the frequency of the CKI signal by 2. The two 5-bit counters rely on CKI and DATA as their clock sources. The difference in frequency between the CKI and DATA signals leads to distinct outputs from the counters. Hence, counter<sub>1</sub>'s output, Q3, and counter<sub>2</sub>'s output, A3, are used to detect which frequency is faster. As seen in Fig. 9(a), A3 leads Q3 when DATA is faster than CKI, causing U/D to equal logic 1. This raises the output frequency of the VCO. As seen in Fig. 9(b), Q3 leads A3 when DATA is slower than CKI, causing U/D to equal logic 0. This reduces the output frequency of the VCO. Q4 is included in the comparison to address the initial large frequency difference between CKI and DATA. The RESET signal for the two counters, U/D\_rst, is generated by  $DFF_{52}$.

#### 2.4. Modified DQFD

Fig. 6. In-lock condition timing diagram.

The circuit diagram of the Modified DQFD in Fig. 2 is depicted in Fig. 10. The traditional DQFD [11] is shown at the bottom of Fig. 10. Using a delay cell and a logical XOR gate, the edges of DATA are detected. The quadrature clock signal (CKQ) and CKI are compared with the falling edge of DATA. This comparison enables the identification of the DQFD's four operating states: state 1, 2, 3, and 4. These states correspond to the values of CKI and CKQ, which are 00, 01, 11, and 10, respectively. The states change as  $1 \rightarrow 2 \rightarrow 3 \rightarrow 4$  if DATA is ahead of the VCO, as shown in Fig. 11. If the VCO is ahead of DATA, the states change as  $4 \rightarrow 3 \rightarrow 2 \rightarrow 1$. As the input DATA and the VCO outputs CKI and CKQ change in frequency, the states change accordingly. Using the output values of the Shift Register, namely A, B, C, and D, together with their complements, the two control signals FD<sub>UP</sub> and FD<sub>DOWN</sub> are generated. Nevertheless, the operational state may be lost in scenarios where the frequency difference is high. To overcome this issue, an assistant circuit becomes necessary. LOCK at logic 0 denotes a high frequency difference between CKI (CKQ) and DATA. In this scenario, the control signal is taken directly from the DIFF. LOCK at logic 1 denotes a low frequency difference, in which case the mentioned state loss issue does not take place. As a result, the output control signals,  $FD_{UP}$  and  $FD_{DOWN}$, can be assigned to  $UP_1$  and  $DN_1$, respectively.
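The state rotation of the DQFD, and the way state information can be lost at large frequency differences, can be illustrated with a simplified behavioral model. The sketch below is not the shift-register logic of Fig. 10: the names `STATE` and `rotation_steps` are introduced here, and labelling the $1 \rightarrow 2 \rightarrow 3 \rightarrow 4$ rotation as `up` assumes that "DATA ahead of the VCO" calls for a frequency increase.

```python
# (CKI, CKQ) values sampled at a DATA edge mapped to the four DQFD states.
STATE = {(0, 0): 1, (0, 1): 2, (1, 1): 3, (1, 0): 4}

# Step between consecutive states, modulo 4, mapped to a direction label.
DIRECTION = {0: "hold", 1: "up", 3: "down", 2: "ambiguous"}


def rotation_steps(samples):
    """Classify each transition between consecutive (CKI, CKQ) samples.

    'up'        : rotation 1 -> 2 -> 3 -> 4 (assumed: DATA ahead of VCO)
    'down'      : rotation 4 -> 3 -> 2 -> 1 (assumed: VCO ahead of DATA)
    'ambiguous' : a state was skipped, so the direction cannot be told
    """
    states = [STATE[tuple(s)] for s in samples]
    return [DIRECTION[(cur - prev) % 4] for prev, cur in zip(states, states[1:])]


print(rotation_steps([(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]))  # forward rotation
print(rotation_steps([(1, 0), (1, 1), (0, 1), (0, 0)]))          # reverse rotation
print(rotation_steps([(0, 0), (1, 1)]))                          # skipped state
```

The `ambiguous` case corresponds to the situation described above in which the frequency difference is so large that a state is skipped between two DATA edges, which is why the LOCK signal is used to select a different control source in that regime.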

Since this circuit operates at high frequency, a current-mode logic (CML) DFF is used in this design [20], as shown in Fig. 12. This CML DFF adopts the folded-cascode architecture. Compared with the traditional architecture, the number of stacked transistor layers is smaller, which makes it more suitable for operation in low-voltage environments. This CML DFF consists of two current-mode D-type latches. The differential pair  $M_{\rm P901}$-$M_{\rm P902}$  copies the current  $I_{\rm SS}$  to the current mirrors  $M_{\rm N909}$-$M_{\rm N910}$  and  $M_{\rm N911}$-$M_{\rm N912}$  to control each latch in the sampling or storage mode.  $M_{\rm N910b}$  and  $M_{\rm N912b}$  use the above two current mirrors to realize the additional output of the clock current-steering differential pair. When CLK is at high potential,  $I_{\rm SS}$  is mirrored to the pairs  $M_{\rm N901}$-$M_{\rm N902}$  and  $M_{\rm N905}$-$M_{\rm N906}$. At this time, the potential of the Y point is equal to  $V_{DD} - I_{\rm SS} \times R_1$, and  $M_{\rm N903}$-$M_{\rm N904}$  are turned off. When CLK is at low level, the latch switches to the storage mode. The input data differential pair  $M_{\rm N901}$-$M_{\rm N902}$  loses its function because it carries no current, and the cross-coupled pair  $M_{\rm N903}$-$M_{\rm N904}$  starts to operate so that the data is stored.  $M_{\rm bias1}$  and  $M_{\rm bias2}$  are mainly used to equalize the  $V_{\rm ds}$  of  $M_{\rm N909}$,  $M_{\rm N910}$,  $M_{\rm N911}$,  $M_{\rm N912}$,  $M_{\rm N910b}$, and  $M_{\rm N912b}$, which reduces the influence of the channel-length modulation effect and improves the accuracy of the current mirrors.

#### 2.5. Charge pump and loop filter

Neither a phase detector nor a frequency detector can provide an accurate voltage signal that is proportional to the phase difference (or frequency difference) of the input signal. The charge pump converts the digital detector output into a current signal proportional to the input signal's phase difference (or frequency difference). The loop filter converts the current signal into a voltage signal and filters out the high-frequency noise through the design of the loop bandwidth.

#### Table 1

Input and output signals of Charge Pump 1 and Charge Pump 2.

| Circuit       | Input signal CP<sub>UP</sub> | Input signal CP<sub>DOWN</sub> | Output signal CP<sub>OUT</sub> |
|---------------|------------------------------|--------------------------------|--------------------------------|
| Charge Pump 1 | PD<sub>UP</sub>              | PD<sub>DOWN</sub>              | V<sub>FINE</sub>               |
| Charge Pump 2 | FD<sub>UP</sub>              | FD<sub>DOWN</sub>              | V<sub>COARSE</sub>             |

Fig. 13 presents the circuit diagram of Charge Pump 1 and Charge Pump 2 in Fig. 2. The input and output signals of the Charge Pump 1 and Charge Pump 2 circuits are tabulated in Table 1. Due to the fixed bias current, less power noise is generated. A differential architecture is used to improve layout matching. Due to the difference in characteristics between PMOS and NMOS, the switching times of the UP and DN signals differ. To compensate for this difference, a single-ended to double-ended circuit is added to equalize the delay time, as depicted in Fig. 14.

The Charge Pump 1 will charge and discharge the control voltage  $V_{\text{FINE}}$ , but the control voltage has a great impact on the oscillation frequency of the voltage controlled oscillator such that a loop filter is needed to filter out high-frequency noise and stabilize the control voltage.

Fig. 8. Frequency Increment/Decrement control circuit.

Fig. 15 represents the phase lock loop and the corresponding transfer functions [21]. In the design,  $C_2$  is a large capacitor, while  $R_1$, used to monitor voltage changes, and  $C_1$, a small capacitor, are used to filter out surges generated after charging and discharging. After considering the stability and calculating the transfer function, the following formulas are obtained [22]:

$$C_1 = \frac{K_{PD} \times K_{VCO}}{\omega_c^2} \times \sqrt{\frac{1 + (\omega_c \tau_z)^2}{1 + (\omega_c \tau_p)^2}} \times \frac{\tau_p}{\tau_z} \tag{4}$$

$$C_2 = C_1 \cdot \left(\frac{\tau_z}{\tau_p} - 1\right) \tag{5}$$

$$R_1 = \frac{\tau_z}{C_2} \tag{6}$$

 $K_{\rm PD}$  is the gain of the phase detector and charge pump,  $K_{\rm VCO}$  is the gain of the VCO, and  $\omega_{\rm c}$  is the loop bandwidth, which is usually designed to be 1/20 ~ 1/40 of  $\omega_{\rm REF}$.  $\tau_z = 1/\omega_z$  and  $\tau_p = 1/\omega_p$  are the time constants of the zero and pole of the loop filter, which can be derived from the phase margin, usually designed to be 60°. This results in C<sub>1</sub> = 375 fF, C<sub>2</sub> = 4.84 pF, and R<sub>1</sub> = 3.85 kΩ.
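Eqs. (4)–(6) can be evaluated with a few lines of code. The sketch below is a hedged illustration: the expressions used for $\tau_p$ and $\tau_z$ as functions of the phase margin are the commonly used textbook relations for this filter topology and are an assumption here, since the text does not state them explicitly. The numerical values of `k_pd`, `k_vco`, and `omega_c` are hypothetical placeholders, not values from this design.

```python
import math


def loop_filter(k_pd: float, k_vco: float, omega_c: float,
                phase_margin_deg: float = 60.0) -> tuple:
    """Return (C1, C2, R1) from Eqs. (4)-(6).

    Assumed relations (not given in the text):
      tau_p = (sec(phi) - tan(phi)) / omega_c
      tau_z = 1 / (omega_c**2 * tau_p)
    """
    phi = math.radians(phase_margin_deg)
    tau_p = (1.0 / math.cos(phi) - math.tan(phi)) / omega_c
    tau_z = 1.0 / (omega_c ** 2 * tau_p)

    c1 = (k_pd * k_vco / omega_c ** 2) * math.sqrt(
        (1 + (omega_c * tau_z) ** 2) / (1 + (omega_c * tau_p) ** 2)
    ) * (tau_p / tau_z)                      # Eq. (4)
    c2 = c1 * (tau_z / tau_p - 1)            # Eq. (5)
    r1 = tau_z / c2                          # Eq. (6)
    return c1, c2, r1


# Hypothetical gains and bandwidth, for illustration only.
c1, c2, r1 = loop_filter(k_pd=10e-6 / (2 * math.pi),   # A/rad
                         k_vco=2 * math.pi * 1e9,      # rad/s/V
                         omega_c=2e8)                  # rad/s
print(f"C1 = {c1:.3e} F, C2 = {c2:.3e} F, R1 = {r1:.3e} ohm, C2/C1 = {c2 / c1:.2f}")
```

Under these assumed relations, a 60° phase margin fixes the ratio $C_2/C_1$ at about 12.9 regardless of the gains, which is close to the ratio of the reported values, 4.84 pF / 375 fF ≈ 12.9. The absolute component values depend on $K_{\rm PD}$, $K_{\rm VCO}$, and $\omega_c$, which this sketch does not attempt to reproduce.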

#### 2.6. Coarse-fine tuning VCO

Fig. 9. Illustrated signals of Frequency Increment/Decrement Control circuit.

Fig. 10. Modified DQFD.

Fig. 16 presents the circuit diagram of the Coarse-Fine Tuning VCO in Fig. 2. It is composed of four Differential Input Differential Output (DIDO) delay cell blocks (D1–D4) [23]. To generate the quadrature clock signals, CKI and CKQ, two D to S Converters (Differential to Single-ended) are employed. The voltage-controlled Coarse-Fine Tuning VCO contains two frequency modulation methods. One is a coarse-tuning mechanism, which has a large frequency modulation range and is driven by the frequency acquisition loop, where the oscillation frequency is controlled by varying the gate voltage of  $M_{\rm N1207}$. The other is a fine-tuning mechanism, which has a smaller frequency modulation range and is driven by the phase lock loop, where the oscillation frequency is controlled by varying the gate voltage of  $M_{\rm N1206}$. The DIDO delay cell is driven by  $M_{\rm N1205}$, which provides the tail current necessary for the desired free-running frequency. When  $V_{\rm FN}$  increases,  $I_{\rm D,MN1206}$  increases, and the small-signal resistance  $-2/g_m$  of the cross-coupled pair  $M_{\rm N1203}$-$M_{\rm N1204}$  becomes larger, thus reducing the oscillation frequency. To ensure that the current flowing through  $M_{\rm P1201}$-$M_{\rm P1204}$  is balanced, the variation of  $I_{\rm D}$  in  $M_{\rm N1206}$  is opposite to the variation of  $I_{\rm D}$  in  $M_{\rm N1207}$. Consequently, the Phase Bias circuit is employed to produce the finely adjusted voltage,  $V_{\rm FN}$, with an inverse amplitude relative to  $V_{\rm FINE}$, as depicted in Fig. 17.

Since the output voltage of this oscillator is not at full swing, a D to S Converter circuit needs to be added at the output to convert it into a full-swing digital signal so that the digital circuits can operate normally. The circuit is shown in Fig. 18. This circuit uses two sets of differential amplifiers to amplify the two inputs individually, and then amplifies them with a common-source amplifier composed of transistors  $M_{\rm P1405}$  and  $M_{\rm P1406}$. It can produce a periodic oscillation signal with a duty cycle of nearly 50%.

Fig. 11. Timing diagram of frequency detector.

Fig. 12. Current mode D-type flip-flop (CML DFF) [20].

Fig. 13. Circuit diagram of Charge Pump.

#### 3. Implementation and verification

The proposed CDR design is implemented in a 40 nm CMOS process. The chip layout is shown in Fig. 19. The area of the core circuit is 90.04  $\times$  174.2  $\mu m^2$  and the area of the complete design is 537.42  $\times$  537.585  $\mu m^2$. With the 20 pF load capacitance, the power consumption is 54.51 mW at 6.5 Gb/s from a 0.9 V supply. The Output Buffer consumes 43.7 mW, i.e., 80.16% of the total power consumption. The Modified DQFD consumes 3.62 mW (6.64%), the Frequency Increment/Decrement Control Circuit consumes 0.592 mW (1.08%), and the Hysteresis Lock Detector consumes 0.515 mW (0.94%).



Fig. 14. Single-ended to double-ended circuit.



Fig. 15. Phase lock loop and the corresponding transfer functions.



Fig. 16. Circuit diagram of the Coarse-Fine Tuning VCO.

#### 3.1. Functional simulations

Fig. 17. Circuit diagram of the Phase Bias.

Fig. 18. D to S Converter.

The Hysteresis Lock Detector, Modified DQFD, and Frequency Increment/Decrement Control Circuit are simulated first to verify the wide locking range and fast locking function of this circuit, and the results are compared with the traditional DQFD. For instance, a DATA signal with frequency 3.57 GHz (period = 280 ps) is compared against clock frequencies in the range 2.63 ~ 5.56 GHz (period difference =  $\pm 100$  ps) at the corner [TT, 25 °C, 0.9 V], as shown in Fig. 20. The horizontal axis is the normalized frequency difference, and the vertical axis is the average difference between the frequency-up (FD<sub>UP</sub>) pulses and the frequency-down (FD<sub>DOWN</sub>) pulses, which represents the charging and discharging of the Charge Pump. Inside the red line, the Hysteresis Lock Detector has reached "lock" (LOCK = 1). In this case, the linearity of the frequency detector is still maintained. Outside the red line, the Hysteresis Lock Detector is out of "lock" (LOCK = 0). Unlike the traditional DQFD, the proposed design has a higher gain and no dead zone in this region. Due to the high gain, this design not only achieves fast locking, but also has a wider locking range than the conventional DQFD.
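The frequency range quoted for this test follows directly from the period values. The short check below is only an arithmetic illustration; the helper name `ghz_from_period_ps` is introduced here.

```python
def ghz_from_period_ps(period_ps: float) -> float:
    """Convert a period in picoseconds to a frequency in GHz."""
    return 1e3 / period_ps


center = ghz_from_period_ps(280)        # DATA period of 280 ps
low = ghz_from_period_ps(280 + 100)     # period 100 ps longer than DATA
high = ghz_from_period_ps(280 - 100)    # period 100 ps shorter than DATA
print(round(center, 2), round(low, 2), round(high, 2))
```

A period difference of ±100 ps around 280 ps therefore corresponds to an asymmetric frequency range around 3.57 GHz, which is why the lower and upper bounds are not equidistant from the DATA frequency.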

#### 3.2. Performance analysis and comparison

Fig. 21 shows the results of the presented full-rate CDR circuit simulated at 2 GHz with [SS, SF, TT, FS, FF] × [–10% of VDD, VDD, +10% of VDD] × [0 °C, 25 °C, 75 °C] corners, and confirms that the circuit can be locked at all corners in 180 ns. The V<sub>COARSE</sub> is varying from 0.27 V to 0.55 V and V<sub>FINE</sub> is varying from 0.4 V to 0.52 V for all PVT corners.
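The corner set used in Fig. 21 is the Cartesian product of the process, voltage, and temperature lists given above. The following sketch simply enumerates it, assuming the nominal VDD is the 0.9 V supply stated in Section 3; the variable names are introduced here for illustration.

```python
from itertools import product

VDD_NOMINAL = 0.9  # V, nominal supply stated in Section 3

process = ["SS", "SF", "TT", "FS", "FF"]
voltage = [round(VDD_NOMINAL * scale, 2) for scale in (0.9, 1.0, 1.1)]  # -10%, nominal, +10%
temperature = [0, 25, 75]  # degrees Celsius

corners = list(product(process, voltage, temperature))
print(len(corners), "corners, e.g.", corners[0])
```

This gives 5 × 3 × 3 = 45 combinations, each of which must reach lock within the stated 180 ns.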

Fig. 22 shows the result of input signal  $F_{DATA} = 6.5$  GHz locked at the corner [TT, 25 °C, 0.9 V]. The frequency is locked by the CDR, when LOCK signal becomes logic 1 and the signal  $V_{COARSE}$  becomes stable. The phase lock loop starts to work and the phase is adjusted by the control signals PD<sub>UP</sub>, PD<sub>DOWN</sub>, and  $V_{FINE}$ .

Fig. 23 shows the simulation results of input signal  $F_{DATA}$  at 1 GHz, 1.67 GHz, 3 GHz, 3.5 GHz, 4 GHz, 5 GHz at the corner [TT, 25 °C, 0.9 V].

A PRBS7 pattern is applied to the circuit to complete the eye diagram simulation. Fig. 24 shows the simulation result of  $F_{DATA} = 6.5$  Gbps at [TT, 25 °C, 0.9 V], Fig. 25(a) shows the eye diagram of the restored clock at [TT, 25 °C, 0.9 V], with  $F_{DATA} = 6.5$  Gbps, and Fig. 25(b) shows the eye diagram of the restored data at [TT, 25 °C, 0.9 V], with  $F_{DATA} = 6.5$  Gbps.

A comparison of the proposed design with various recent CDR works is presented in Table 2, where the FOM is defined in Eq. (7). The comparison shows that our design matches or exceeds all listed CDR works from 2016 to 2023 in lock-in range and achieves the highest Figure of Merit (FOM). Moreover, at 0.111 UI, the P2P jitter of our CDR design is among the lowest in the comparison. Furthermore, Fig. 26 illustrates the technology roadmap of CDR circuits in recent years [3,8–10,13–18,24].

$$\text{FOM (Gb/s)} = \frac{\text{Lock-in range (Gb/s)}}{\text{P2P jitter (UI)}} \tag{7}$$
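As a worked check of Eq. (7), the figures reported for the proposed design (17.1 ps P2P jitter at 6.5 Gbps, 5.5 Gb/s lock-in range, 54.51 mW total power, 43.7 mW in the Output Buffer) can be plugged in directly. The function names below are introduced here for illustration only.

```python
def p2p_jitter_ui(jitter_ps: float, rate_gbps: float) -> float:
    """P2P jitter in unit intervals: jitter divided by the bit period."""
    return jitter_ps * 1e-12 * rate_gbps * 1e9


def fom(lock_in_range_gbps: float, jitter_ps: float, rate_gbps: float) -> float:
    """Figure of Merit per Eq. (7): lock-in range / P2P jitter (UI)."""
    return lock_in_range_gbps / p2p_jitter_ui(jitter_ps, rate_gbps)


ours_ui = p2p_jitter_ui(17.1, 6.5)        # about 0.111 UI
ours_fom = fom(5.5, 17.1, 6.5)            # about 49.48 Gb/s
eff_total = 54.51 / 6.5                   # about 8.39 mW/Gb/s
eff_core = (54.51 - 43.7) / 6.5           # about 1.66 mW/Gb/s without Output Buffer
print(round(ours_ui, 3), round(ours_fom, 2), round(eff_total, 2), round(eff_core, 2))
```

These values agree with the entries listed for the proposed design in Table 2.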

#### 4. Conclusion

The reference-less dual-loop 1 Gbps to 6.5 Gbps CDR circuit presented in this investigation is designed and implemented using a 40 nm CMOS process. A modified DQFD, a Frequency Increment/Decrement Control circuit, and a Coarse-fine Tuning VCO are included in the proposed CDR. According to the all-PVT-corner post-layout simulation results, the proposed scheme achieves a wide capture range and a low RMS jitter of 5.79 ps for the retimed data.

Fig. 19. Layout of the proposed CDR.

Fig. 20. Frequency Detector Conversion Curve.

#### Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

#### Data availability

Data will be made available on request.

#### Acknowledgments

The National Science and Technology Council, Taiwan has provided partial funding for this work through the grant NSTC 112-2221-E-110-063-MY3. The authors extend their gratitude to Taiwan Semiconductor Research Institute, Taiwan for their assistance in chip fabrication.



Fig. 21. All-PVT-corner Simulation Results.



Fig. 22. Simulation results for  $F_{\text{DATA}}$  = 6.5 GHz at [TT, 25 °C, 0.9 V].



Fig. 23. Other frequency simulation results at [TT, 25 °C, 0.9 V].







Fig. 25. Eye diagram simulation for  $F_{DATA} = 6.5$  Gbps, (a) Restored Clock Eye Diagram; (b) Restored Data Eye Diagram.



Fig. 26. Technology Roadmap of the CDR circuits.

#### Table 2

Performance comparison with several prior works.

| Parameter                         | [14]               | [15]              | [16]               | [3]                  | [18]               | [10]              | Ours                        |
|-----------------------------------|--------------------|-------------------|--------------------|----------------------|--------------------|-------------------|-----------------------------|
| Year                              | 2016               | 2017              | 2018               | 2020                 | 2021               | 2022              | 2024                        |
| Publication                       | NEWCAS             | TCAS-II           | JSSC               | TCAS-II              | TCAS-I             | TCAS-I            | MEJ                         |
| Verification                      | Meas.              | Simul.            | Meas.              | Meas.                | Meas.              | Meas.             | Simul.                      |
| Process (nm)                      | 40                 | 180               | 65                 | 180                  | 40                 | 40                | 40                          |
| Supply voltage (V)                | 0.9                | 1.9               | 1.3                | 1.8                  | 1.2                | 0.9               | 0.9                         |
| Data rate (Gb/s)                  | 2.6 ~ 6.4          | 0.3 ~ 4           | 6.7 ~ 11.2         | 0.42 ~ 3.45          | 10.432 ~ 16        | 6.15 ~ 10.9       | 1 ~ 6.5                     |
| Lock-in range<br>(Gb/s)           | 3.8                | 3.7               | 4.5                | 3.03                 | 5.5                | 4.55              | 5.5                         |
| P2P jitter (ps)                   | 13.25<br>at 6 Gbps | 83.6<br>at 4 Gbps | 13.7<br>at 10 Gbps | 29.8<br>at 3.45 Gbps | 61<br>at 16 Gbps   | 24<br>at 10 Gbps  | 17.1<br>at 6.5 Gbps         |
| P2P jitter (UI)                   | 0.0795             | 0.334             | 0.137              | 0.102                | 0.976              | 0.24              | 0.111                       |
| RMS jitter (ps)                   | 1.6<br>at 6 Gbps   | 12.4<br>at 4 Gbps | 13.7<br>at 10 Gbps | 4.33<br>at 3.45 Gbps | 7.1<br>at 16 Gbps  | 4.7<br>at 10 Gbps | 5.79<br>at 6.5 Gbps         |
| Area (mm <sup>2</sup> )           | N.A.               | 0.162             | 0.99               | 0.442                | N.A.               | 0.347             | 0.289                       |
| Power (mW)                        | 1.8<br>at 6 Gbps   | 71<br>at 4 Gbps   | 22.5<br>at 10 Gbps | 20.3<br>at 3.45 Gbps | 39.9<br>at 16 Gbps | 5.8<br>at 10 Gbps | 54.51<br>at 6.5 Gbps        |
| Power efficiency [3]<br>(mW/Gb/s) | 0.3                | 17.75             | 2.25               | 5.88                 | 2.49               | 0.58              | 8.39<br>(1.66) <sup>a</sup> |
| FOM (Gb/s)                        | 47.79              | 11.06             | 32.84              | 29.47                | 5.63               | 18.95             | 49.48                       |

<sup>a</sup> By excluding Output Buffer power consumption.

#### References

- [1] A. Pottbacker, U. Langmann, H.-U. Schreiber, A Si bipolar phase and frequency detector IC for clock extraction up to 8 Gb/s, IEEE J. Solid-State Circuits 27 (12) (1992) 1747–1751.
- [2] C. Gimeno, D. Flandre, D. Bol, Low-power half-rate dual-loop clock-recovery system in 28-nm FDSOI, in: Proc. 2018 IEEE 9th Latin American Symposium on Circuits & Systems, LASCAS, 2018, pp. 1–4.
- [3] K.-S. Son, T.-J. An, Y.-H. Moon, J.-K. Kang, A 0.42–3.45 Gb/s referenceless clock and data recovery circuit with counter-based unrestricted frequency acquisition, IEEE Trans. Circuits Syst. II 67 (6) (2020) 974–978.
- [4] P.M. Ha, N. Huu Tho, N. Thanh, Q. Nguyen-The, An improved wide-band referenceless CDR with UP pulse selector for frequency acquisition, in: Proc. 2020 International Conference on Advanced Technologies for Communications, ATC, 2020, pp. 56–60.
- [5] B. Razavi, Challenges in the design of high-speed clock and data recovery circuits, IEEE Commun. Mag. 40 (8) (2002) 94–101.
- [6] K.-J. Hsiao, M.-H. Lee, T.-C. Lee, A clock and data recovery circuit with wide linear range frequency detector, in: Proc. 2008 IEEE International Symposium on VLSI Design, Automation and Test, VLSI-DAT, 2008, pp. 121–124.
- [7] P.M. Ha, N.H. Tho, H.H. Hanh, Q. Nguyen-The, A wide-band reference-less bidirectional continuous-rate frequency detector, in: Proc. 2019 3rd International Conference on Recent Advances in Signal Processing, Telecommunications & Computing, SigTelCom, 2019, pp. 25–29.
- [8] Y.-H. Moon, J.-W. Yoo, Y.-S. Ryu, S.-H. Kim, K.-S. Son, J.-K. Kang, A 2.41-pJ/bit 5.4-Gb/s dual-loop reference-less CDR with fully digital quarter-rate linear phase detector for embedded DisplayPort, IEEE Trans. Circuits Syst. I. Regul. Pap. 66 (8) (2019) 2907–2920.
- [9] N.H. Tho, H.-J. Lee, T.-J. An, J.-K. Kang, A 0.32–2.7 Gb/s reference-less continuous-rate clock and data recovery circuit with unrestricted and fast frequency acquisition, IEEE Trans. Circuits Syst. II 68 (7) (2021) 2347–2351.
- [10] W. Xiao, Q. Huang, H. Mosalam, C. Zhan, Z. Li, Q. Pan, A 6.15–10.9 Gb/s 0.58 pJ/Bit reference-less half-rate clock and data recovery with phase reset scheme, IEEE Trans. Circuits Syst. I. Regul. Pap. 69 (2) (2022) 634–644.
- [11] B. Stilling, Bit rate and protocol independent clock and data recovery, Electron. Lett. 36 (9) (2000) 824–825.
- [12] K. Lee, J.-Y. Sim, A 0.8-to-6.5 Gb/s continuous-rate reference-less digital CDR with half-rate common-mode clock-embedded signaling, IEEE Trans. Circuits Syst. I. Regul. Pap. 63 (4) (2016) 482–493.
- [13] H.-E. Liu, C.-J. Su, C.-K. Cheng, W.-K. Liu, Design and modeling of PLL-based clock and data recovery circuits with periodically embedded clock encoding for intra-panel interfaces, in: Proc. 2016 IEEE International Symposium on Circuits and Systems, ISCAS, 2016, pp. 2234–2237.
- [14] K. Gharibdoust, A. Tajalli, Y. Leblebici, A wideband MDLL with jitter reduction scheme for forwarded clock serial links in 40 nm CMOS, in: Proc. 2016 14th IEEE International New Circuits and Systems Conference, NEWCAS, 2016, pp. 1–4.
- [15] Y.-L. Lee, S.-J. Chang, Y.-C. Chen, Y.-P. Cheng, An unbounded frequency detection mechanism for continuous-rate CDR circuits, IEEE Trans. Circuits Syst. II 64 (5) (2017) 500–504.
- [16] K. Park, W. Bae, J. Lee, J. Hwang, D.-K. Jeong, A 6.7–11.2 Gb/s, 2.25 pJ/bit, single-loop referenceless CDR with multi-phase, oversampling PFD in 65-nm CMOS, IEEE J. Solid-State Circuits 53 (10) (2018) 2982–2993.
- [17] Y.-S. Yao, C.-C. Huang, S.-I. Liu, A jitter-tolerance-enhanced digital CDR circuit using background loop gain controller, IEEE Trans. Circuits Syst. II 68 (6) (2021) 1837–1841.
- [18] W.-M. Chen, Y.-S. Yao, S.-I. Liu, A 10.4–16-Gb/s reference-less baud-rate digital CDR with one-tap DFE using a wide-range FD, IEEE Trans. Circuits Syst. I. Regul. Pap. 68 (11) (2021) 4566–4575.
- [19] Y.S. Tan, K.S. Yeo, C.C. Boon, M.A. Do, Design of a hysteresis lock detector for dual-loops clock and data recovery circuit, in: Proc. 2011 IEEE International Conference of Electron Devices and Solid-State Circuits, 2011, pp. 1–2.
- [20] G. Scotti, D. Bellizia, A. Trifiletti, G. Palumbo, Design of low-voltage high-speed CML D-latches in nanometer CMOS technologies, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 25 (12) (2017) 3509–3520.
- [21] B. Razavi, Design of Analog CMOS Integrated Circuits, first ed., McGraw-Hill, 2001.
- [22] S.-I. Liu, C.-Y. Yang, Phase Locked Loop, Tsang Hai Publishing, Republic of China, 2006.
- [23] B. Razavi, Design of Integrated Circuits for Optical Communications, second ed., Wiley, 2012.
- [24] Y.-H. Yang, M. Tzou, T.-C. Lee, A 6.0–11.0 Gb/s reference-less sub-baud-rate linear CDR with wide-range frequency acquisition technique, IEEE Trans. Circuits Syst. II 70 (2) (2023) 386–390.