# International Journal of Advance Engineering and Research Development

p-ISSN: 2348-6406, Volume 3, Issue 4, April 2016

## DESIGN OF LOW POWER PULSE TRIGGERED DUAL DYNAMIC FLIP FLOP BASED ON A SIGNAL FEED-THROUGH SCHEME

S. KEERTHANA<sup>1</sup>, R. ARIGOVINDAN<sup>2</sup>, J. VASANTHARAJ<sup>3</sup>

<sup>1</sup>Assistant Professor, Department of ETE, Karpagam College of Engineering, Coimbatore.
<sup>2</sup>Assistant Professor, Department of ECE, Panimalar Polytechnic College, Chennai.
<sup>3</sup>PG Graduate, Department of ECE, Tiruchengode.

**Abstract:** Flip-flops (FFs) are the basic storage elements used extensively in all kinds of digital designs. In particular, digital designs nowadays often adopt intensive pipelining techniques and employ many FF-rich modules such as register files, shift registers, and first-in first-out (FIFO) buffers. A low-power flip-flop (FF) design featuring an explicit type pulse-triggered structure and a modified true single phase clock (TSPC) latch based on a signal feed-through scheme is presented. The proposed design successfully solves the long discharging path problem in conventional explicit type pulse-triggered FF (P-FF) designs and achieves better speed and power performance. Based on post-layout simulation results using Cadence CMOS 180-nm technology, the proposed design outperforms the conventional P-FF design data-close-to-output (MHLFF) by 8.2% in data-to-Q delay. The corresponding improvements in power and power-delay product (PDP) are 24.6% and 31.7%, respectively.

**Keywords:** Flip-flop (FF), Low Power, Explicit Pulse Triggered Flip Flop, Signal Feed-Through Scheme, Dual Dynamic Flip-Flop.

#### 1. INTRODUCTION

It is estimated that the power consumption of the clock system, which consists of clock distribution networks and storage elements, is as high as 50% of the total system power.
FFs thus contribute a significant portion of the chip area and power consumption to the overall system design.

Pulse-triggered FFs (P-FFs), because of their single-latch structure, are more popular than conventional transmission gate (TG) and master-slave based FFs in high-speed applications. Besides the speed advantage, their circuit simplicity lowers the power consumption of the clock tree system. A P-FF consists of a pulse generator for strobe signals and a latch for data storage. If the triggering pulses are sufficiently narrow, the latch acts like an edge-triggered FF. Since only one latch is needed, as opposed to two in the conventional master-slave configuration, a P-FF is simpler in circuit complexity. This leads to a higher toggle rate for high-speed operations. P-FFs also allow time borrowing across clock cycle boundaries and feature a zero or even negative setup time.

Despite these advantages, pulse generation circuitry requires delicate pulse width control to cope with possible variations in process technology and the signal distribution network. A statistical design framework has been developed to take these factors into account.
To obtain balanced performance among power, delay, and area, design space exploration is also a widely used technique.

In this brief, we present a novel low-power P-FF design based on a signal feed-through scheme. Observing the delay discrepancy in latching data "1" and "0," the design shortens the longer delay by feeding the input signal directly to an internal node of the latch to speed up the data transition. This mechanism is implemented by introducing a simple pass transistor for extra signal driving. When combined with the pulse generation circuitry, it forms a new P-FF design with enhanced speed and power-delay-product (PDP) performance.

### 1.1. Conventional Explicit Type P-FF Designs

P-FFs, in terms of pulse generation, can be classified as implicit or explicit type. In an implicit type P-FF, the pulse generator is part of the latch design and no explicit pulse signals are generated. In an explicit type P-FF, the pulse generator and the latch are separate.

Fig. 1: Conventional P-FF designs. (a) ep-DCO (b) CDFF (c) Static-CDFF (d) MHLFF

Without generating pulse signals explicitly, implicit type P-FFs are in general more power-economical. However, they suffer from a longer discharging path, which leads to inferior timing characteristics. Explicit pulse generation, on the contrary, incurs more power consumption, but the logic separation from the latch design gives the FF design a unique speed advantage. Its power consumption and circuit complexity can be effectively reduced if one pulse generator is shared by a group of FFs.
#### 1.2. SIGNAL FEED-THROUGH SCHEME TYPE P-FF DESIGNS

Recalling the four circuits reviewed in Section 1.1, they all encounter the same worst case timing, which occurs at "0" to "1" data transitions. Referring to Fig. 2(a), the proposed design adopts a signal feed-through technique to improve this delay. Similar to the SCDFF design, the proposed design also employs a static latch structure and a conditional discharge scheme to avoid superfluous switching at an internal node.

Fig. 2: Signal Feed-Through P-FF design.

However, there are three major differences that lead to a unique true single phase clock (TSPC) latch structure and make the proposed design distinct from the previous one. First, a weak pull-up pMOS transistor MP1 with its gate connected to ground is used in the first stage of the TSPC latch. This gives rise to a pseudo-nMOS logic style design, and the charge keeper circuit for the internal node X can be saved. In addition to the circuit simplicity, this approach also reduces the load capacitance of node X. Second, a pass transistor MNx controlled by the pulse clock is included so that input data can drive node Q of the latch directly (the signal feed-through scheme). Along with the pull-up transistor MP2 at the second stage inverter of the TSPC latch, this extra passage facilitates auxiliary signal driving from the input source to node Q. The node level can thus be quickly pulled up to shorten the data transition delay.
Third, the pull-down network of the second stage inverter is completely removed. Instead, the newly employed pass transistor MNx provides a discharging path. The role played by MNx is thus twofold, i.e., providing extra driving to node Q during "0" to "1" data transitions, and discharging node Q during "1" to "0" data transitions. Compared with the latch structure used in the SCDFF design, the circuit savings of the proposed design include a charge keeper (two inverters), a pull-down network (two nMOS transistors), and a control inverter. The only extra component introduced is an nMOS pass transistor to support signal feed-through. This scheme improves the "0" to "1" delay and thus reduces the disparity between the rise time and fall time delays. In comparison with other P-FF designs such as ep-DCO, CDFF, and SCDFF, the proposed design shows the most balanced delay behavior.

The principles of FF operation of the proposed design are explained as follows. When a clock pulse arrives, if no data transition occurs, i.e., the input data and node Q are at the same level, no current passes through the pass transistor MNx, which keeps the input stage of the FF from any driving effort. At the same time, the input data and the output feedback Q\_fdbk assume complementary signal levels and the pull-down path of node X is off. Therefore, no signal switching occurs in any internal node. On the other hand, if a "0" to "1" data transition occurs, node X is discharged to turn on transistor MP2, which then pulls node Q high. This corresponds to the worst case timing of the FF operation, as the discharging path conducts only for the pulse duration. However, with the signal feed-through scheme, a boost can be obtained from the input source via the pass transistor MNx and the delay can be greatly shortened.
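
The operating cases described above can be summarized in a small behavioral sketch. It is only a logic-level illustration written for this text, not a circuit simulation: the function name `feed_through_pulse`, the `PulseResult` fields, and the assumptions that Q\_fdbk is the complement of Q and that node Q settles to D within one pulse are inferred from the description rather than taken from a netlist.

```python
from dataclasses import dataclass


@dataclass
class PulseResult:
    q_next: int          # level of node Q after the clock pulse
    x_discharged: bool   # True if internal node X is pulled low
    mp2_on: bool         # True if pull-up MP2 drives node Q high
    mnx_current: bool    # True if current flows through pass transistor MNx


def feed_through_pulse(d: int, q: int) -> PulseResult:
    """Logic-level view of one clock pulse for input D and stored value Q."""
    q_fdbk = 1 - q  # assumption: the feedback signal is the complement of Q
    # Node X discharges only when D and Q_fdbk are both high (0-to-1 transition).
    x_discharged = d == 1 and q_fdbk == 1
    # A low node X turns on the pMOS pull-up MP2.
    mp2_on = x_discharged
    # MNx is on during the pulse, but carries current only if D and Q differ.
    mnx_current = d != q
    return PulseResult(q_next=d, x_discharged=x_discharged,
                       mp2_on=mp2_on, mnx_current=mnx_current)


for d, q in [(0, 0), (1, 1), (1, 0), (0, 1)]:
    print(f"D={d} Q={q} -> {feed_through_pulse(d, q)}")
```

In this sketch a "0" to "1" transition engages both MP2 and MNx, whereas a "1" to "0" transition relies on MNx alone, which mirrors the twofold role of MNx noted in the text.
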
Although this feed-through path seems to burden the input source with direct charging/discharging responsibility, which is a common pitfall of all pass transistor logic, the scenario is different in this case because MNx conducts only for a very short period.

Referring to Fig. 2(c), when a "1" to "0" data transition occurs, transistor MNx is likewise turned on by the clock pulse and node Q is discharged by the input stage through this route. Unlike the case of a "0" to "1" data transition, the input source bears the sole discharging responsibility. Since MNx is turned on for only a short time slot, the loading effect on the input source is not significant. In particular, this discharging does not correspond to the critical path delay and calls for no transistor size tweaking to enhance the speed. In addition, since a keeper logic is placed at node Q, the discharging duty of the input source is lifted once the state of the keeper logic is inverted.

#### 1.3. PROPOSED DDFF

The proposed DDFF architecture is shown in Fig. 3. Node X1 is pseudo-dynamic, with a weak inverter acting as a keeper, whereas node X2, in contrast to the MHLFF, is purely dynamic. An unconditional shutoff mechanism is provided at the frontend instead of the conditional one in MHLFF.

Fig. 3: Proposed DDFF.

The operation of the flip-flop can be divided into two phases: 1) the evaluation phase, when CLK is high, and 2) the precharge phase, when CLK is low. The actual latching occurs during the 1–1 overlap of CLK and CLKB during the evaluation phase. If D is high prior to this overlap period, node X1 is discharged through NM0-2. This switches the state of the cross-coupled inverter pair INV1-2, causing node X1B to go high and output QB to discharge through NM4. The low level at node X1 is retained by the inverter pair INV1-2 for the rest of the evaluation phase, where no latching occurs. Thus, node X2 is held high throughout the evaluation period by the PMOS transistor PM1.
As CLK falls low, the circuit enters the precharge phase and node X1 is pulled high through PM0, switching the state of INV1-2. During this period node X2 is not actively driven by any transistor; it stores the charge dynamically. The outputs at nodes QB and Q maintain their voltage levels through INV3-4.

If D is zero prior to the overlap period, node X1 remains high and node X2 is pulled low through NM3 as CLK goes high. Thus, node QB is charged high through PM2 and NM4 is held off. At the end of the evaluation phase, as CLK falls low, node X1 remains high and X2 stores the charge dynamically. The architecture exhibits negative setup time since the short transparency period defined by the 1–1 overlap of CLK and CLKB allows the data to be sampled even after the rising edge of CLK, before CLKB falls low [7].

Fig. 7 shows the post-layout timing diagram of the flip-flop at 2-GHz CLK frequency and 1.2 V supply in 90-nm UMC process technology. Node X1 undergoes charge sharing when CLK makes a low-to-high transition while D is held low. This results in a momentary fall in voltage at node X1, but the inverter pair INV1-2 is skewed such that its switching threshold is well below the worst case voltage drop at node X1 due to charge sharing. The timing diagram shows that node X2 retains the charge level during the precharge phase, when it is not driven by any transistor. Note that the temporary pull-down at node X2 when sampling a "one" is due to the delay between X1 and X1B.

The setup time and hold time of a flip-flop refer to the minimum time periods before and after the CLK edge, respectively, during which the data should be stable so that proper sampling is possible. Here the setup time and the hold time depend on the CLK overlap period. If VM is the switching threshold of the inverter pair INV1-2 and Tvm is the time required to discharge node X1 to VM, the hold time required by the flip-flop can be expressed as

$$Thold1 \ge Tvm \tag{1}$$

$$Thold0 \ge Tov - Tvm \tag{2}$$

where Tov is the overlap period defined by the low-to-high transition of CLK and the high-to-low transition of CLKB. Tov should be greater than Tvm for the proper functioning of the flip-flop. Thold1 and Thold0 represent the hold times required for sampling a one and a zero, respectively. Also note that Thold1 and Thold0 are the maximum time periods after the CLK transition such that the flip-flop samples a zero and a one, respectively. Since CLKB is high prior to the low-to-high transition of CLK, when D is high the parasitic diffusion capacitors at the drains of NM1 and NM2 are predischarged, resulting in a low Tvm. The overlap period can now be chosen such that Thold1 and Thold0 in (1) and (2), respectively, are minimized. Tov can be adjusted by properly sizing the transistors in INV5. This leads to a small negative setup time and a positive hold time close to zero.

Fig. 8(a) shows the hold time for sampling "zero," where D is held low for a time period slightly greater than Tov-Tvm after the positive CLK edge. This causes node X1 to discharge to a voltage greater than VM, and INV1-2 restores the high level, leading to a proper latching of "zero." When sampling "one," since D is held high for a time period equal to Tvm, node X1 properly discharges and "one" is latched. We measured Tvm to be 18 ps in the pre-layout analysis, where only the frontend of the flip-flop was simulated with proper load, and an overlap period of 50 ps was chosen. The pre-layout simulation of the flip-flop at 27 °C and 1.2 V supply voltage measures Thold0 to be 30 ps and Thold1 to be 15 ps. The slight variation of these results from (1) and (2) is due to the nonzero slopes of the CLK and data signals.

Fig. 4: Pulse Generator.

The conditional shutoff mechanism provided in SDFF (Fig. 3) is robust.
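
As a quick numerical check of the hold-time bounds (1) and (2), the sketch below plugs in the reported values Tvm = 18 ps and Tov = 50 ps. The function name `hold_time_bounds` and the dictionary keys are illustrative choices for this example and are not part of the original design flow.

```python
def hold_time_bounds(t_ov_ps: float, t_vm_ps: float) -> dict:
    """Lower bounds on the hold times from (1) and (2), in picoseconds."""
    # The text requires the overlap period Tov to exceed Tvm for proper operation.
    if t_ov_ps <= t_vm_ps:
        raise ValueError("Tov must be greater than Tvm")
    return {
        "thold1_min_ps": t_vm_ps,            # (1): Thold1 >= Tvm
        "thold0_min_ps": t_ov_ps - t_vm_ps,  # (2): Thold0 >= Tov - Tvm
    }


# Values reported for the pre-layout analysis: Tov = 50 ps, Tvm = 18 ps.
bounds = hold_time_bounds(50.0, 18.0)
print(bounds)
```

The resulting bounds of 18 ps for Thold1 and 32 ps for Thold0 are close to the simulated 15 ps and 30 ps; the text attributes the difference to the nonzero slopes of the CLK and data signals.
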
This conditional shutoff is capable of producing a smaller sampling window by skewing the inverters and the NAND gate in the conditional shutoff path. Although this method can provide lower hold time requirements, it results in a larger precharge node capacitance. Thus, the unconditional shutoff used in the proposed architecture provides a simple and power-efficient method at the cost of a slightly involved design process. Since Tvm plays an important role in the hold time of the proposed architecture, the worst case hold time is determined by the switching threshold of INV1-2. A larger switching threshold with a short overlap period results in a smaller Tvm and, hence, a smaller hold time requirement.

Fig. 5: Proposed DDFF.

To analyze the performance of DDFF, other designs were also simulated under similar conditions. Since the D-Q delay reflects the actual portion of the time period consumed by the latching device, we consider the minimum D-Q delay as the performance metric for speed. The optimum setup time is the data-to-CLK delay when the D-Q delay is at its minimum. To accurately analyze the power performance of the various designs, the power is divided into three parts: the latching power, the local CLK driving power, and the local data driving power. The simulations are carried out at various data activities to obtain a realistic performance comparison. A data activity of 100% represents an output data transition at every positive CLK edge, and 0% represents no data transition.

Since the performance of the proposed flip-flop depends on the CLK overlap period, a detailed analysis at various process and temperature corners is carried out. Also, the impact of process variation on the power and latency of the flip-flop is studied in detail using a 1000-point Monte Carlo simulation with mismatch between transistors. Here, power and D-Q delay were measured with the flip-flop working at the optimum setup time.
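
The data-activity definition used in this analysis (100% for an output transition at every positive CLK edge, 0% for none) can be made concrete with a short sketch. The helper names `data_activity` and `counter_bit_streams` are hypothetical, and treating a bit stream as one period of a repeating pattern is an assumption made here for simplicity. A 4-bit up-counter serves as an illustrative input because the toggle rate of each bit position is easy to reason about.

```python
def data_activity(bits):
    """Fraction of clock edges at which the sampled value changes.

    `bits` is treated as one period of a repeating pattern, so the last
    sample is compared with the first one (an assumption of this sketch).
    """
    n = len(bits)
    if n == 0:
        raise ValueError("bit stream must not be empty")
    toggles = sum(bits[i] != bits[(i + 1) % n] for i in range(n))
    return toggles / n


def counter_bit_streams(width):
    """Per-bit value sequences of an up-counter over one full count cycle."""
    counts = range(2 ** width)
    # Index 0 is the least significant bit, index width-1 the most significant.
    return [[(c >> k) & 1 for c in counts] for k in range(width)]


streams = counter_bit_streams(4)
activities = [data_activity(s) for s in streams]
for k, a in enumerate(activities):
    print(f"bit {k}: {a:.1%}")
```

Under these assumptions the least significant bit toggles at every edge (100%) and the most significant bit at one edge in eight (12.5%).
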
Results with and without a setup time margin [18] are provided to understand the importance of system design with process variations in mind. The leakage performance of the various designs has also been analyzed: the leakage currents for different input and output conditions are measured to find the worst case leakage power. In addition, all the designs were analyzed at different voltage points to understand the impact of supply voltage fluctuation on the functionality of the flip-flops. Finally, a 4-bit synchronous up-counter is designed to highlight the performance of the proposed flip-flop architecture. The reason for considering a counter is that the data activity at each bit position is known. The most significant bit has the least data activity (12.5%), whereas the least significant bit has the maximum (100%).

#### 2. RESULTS

The proposed flip-flop has the lowest PDP among the group. It gives 29%, 10%, and 7% reductions in total power dissipation compared to SDFF, PowerPC, and MHLFF, respectively, along with comparable speed performance. Also, it gives power performance comparable to CDFF while providing an 8% improvement in PDP. SDFF and PowerPC have the highest CLK power dissipation, whereas the proposed DDFF has the least. PowerPC and CDFF dissipate the highest data driving power. They are 6.3 and 5.2 times higher than that of DDFF, respectively.

Fig. 6: Hold time required by DDFF for sampling zero.

The latching power corresponds to the internal power dissipated in the flip-flop, which includes the dynamic power spent on local CLK processing, the power on the dynamic nodes, and the static leakage power. The proposed design exhibits a smaller negative setup time compared to SDFF and HLFF. In order to estimate the size of the flip-flops, the number of transistors used and the total layout area of the various designs are provided.
The proposed flip-flop uses the least number of devices and has the lowest total layout area (Table I), which is 29% smaller than that of CDFF, the largest in the group. As mentioned before, the reason for the large layout area of CDFF is the conditional structures in the critical path, which have to be placed near the transistors to which they are connected. Also, the larger routing overhead for distributing data and CLK signals and the feedback mechanism in the circuit increase the layout area.

As the total power dissipated in the flip-flop depends on the data activity, the power dissipated at data activities of 100%, 25%, and 0% is illustrated. A data activity of 100% corresponds to a 101010... data pattern, a 50% data activity corresponds to a 11001100... data pattern, and so on. In order to analyze the performance of the flip-flop in the absence of any data switching, the power dissipation corresponding to 0% data activity for the 11111... and 00000... data patterns is also provided. The results show that the proposed design consumes the lowest total power for 100% and 0% (0000...) data activity. As mentioned earlier, the small precharge node, CLK-input, and data-input capacitances make the proposed flip-flop power efficient at higher data rates. At 25% data activity, CDFF dissipates the lowest power.

#### 3. CONCLUSION

In this paper, a new low power DDFF is proposed. An analysis of the overlap period required to select a proper pulse width was provided in order to make the design process simpler. The proposed DDFF eliminates the redundant power dissipation present in the MHLFF. A comparison of the proposed flip-flop with the signal feed-through flip-flops showed that it exhibits lower power dissipation along with comparable speed performance. The post-layout simulation results showed an improvement in PDP of about 10% compared to the MHLFF at 25% data activity.

#### REFERENCES

- [1] J.-F. Lin, "Low power pulse-triggered flip-flop design based on a signal feed-through scheme," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 1, pp. 181–185, Jan. 2014.
- [2] K. Chen, "A 77% energy saving 22-transistor single phase clocking D flip-flop with adaptive-coupling configuration in 40 nm CMOS," in Proc. IEEE Int. Solid-State Circuits Conf., pp. 338–339, Nov. 2011.
- [3] E. Consoli, M. Alioto, G. Palumbo, and J. Rabaey, "Conditional push-pull pulsed latch with 726 fJ·ps energy delay product in 65 nm CMOS," in Proc. IEEE Int. Solid-State Circuits Conf., pp. 482–483, Feb. 2012.
- [4] S. Sadrossadat, H. Mostafa, and M. Anis, "Statistical design framework of sub-micron flip-flop circuits considering die-to-die and within-die variations," IEEE Trans. Semicond. Manuf., vol. 24, no. 2, pp. 69–79, Feb. 2011.
- [5] M. Alioto, E. Consoli, and G. Palumbo, "General strategies to design nanometer flip-flops in the energy-delay space," IEEE Trans. Circuits Syst., vol. 57, no. 7, pp. 1583–1596, Jul. 2010.
- [6] M. Alioto, E. Consoli, and G. Palumbo, "Flip-flop energy/performance versus clock slope and impact on the clock network design," IEEE Trans. Circuits Syst., vol. 57, no. 6, pp. 1273–1286, Jun. 2010.
- [7] M. Alioto, E. Consoli, and G. Palumbo, "Analysis and comparison in the energy-delay-area domain of nanometer CMOS flip-flops: Part I methodology and design strategies," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 5, pp. 725–736, May 2011.
- [8] M. Alioto, E. Consoli, and G. Palumbo, "Analysis and comparison in the energy-delay-area domain of nanometer CMOS flip-flops: Part II results and figures of merit," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 5, pp. 737–750, May 2011.
- [9] Y.-T. Hwang, J.-F. Lin, and M.-H. Sheu, "Low power pulse-triggered flip-flop design with conditional pulse enhancement scheme," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 2, pp. 361–366, Feb. 2012.
- [10] H. Mahmoodi, V. Tirumalashetty, M. Cooke, and K. Roy, "Ultra low power clocking scheme using energy recovery and clock gating," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 1, pp. 33–44, Jan. 2009.
- [11] P. Zhao, J. McNeely, S. Venigalla, G. P. Kumar, M. Bayoumi, N. Wang, and L. Downey, "Clocked-pseudo-NMOS flip-flops for level conversion in dual supply systems," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 9, pp. 1196–1202, Sep. 2009.