International Journal of Advance Engineering and Research Development e-ISSN (O): 2348-4470 p-ISSN (P): 2348-6406 International Conference of Trends in Information, Management, Engineering and Sciences (ICTIMES) Volume 5, Special Issue 02, Feb.-2018 (UGC Approved)

# NOVEL ENERGY EFFICIENT CARRY SKIP ADDER BASED ON DUAL MODE LOGIC DESIGN

<sup>1</sup>Ms.C.Aishwarya, <sup>2</sup>Mrs.J.R.Beny, <sup>3</sup>R. Rajasekaran, <sup>4</sup>G.Abirami

<sup>1,2,3,4</sup>Department of EEE, SNS College of Technology, Coimbatore.

**ABSTRACT:-** The dual mode logic (DML) gate family considered here enables a very high level of energy-delay optimization flexibility at the gate level. In this paper, this adaptability is utilized to enhance the energy efficiency and performance of combinatorial circuits by manipulating their critical and noncritical paths. An approach that locates the design's critical paths and operates these paths in the boosted performance mode is proposed. The noncritical paths operate in the low energy DML mode, which does not affect the performance of the design but allows a significant reduction in energy consumption. The proposed approach is analyzed on a 128-bit carry skip adder.

Keywords:- Dual Mode Logic, energy efficiency, high performance, critical paths, energy-delay optimization.

## I. INTRODUCTION

DML logic gates were proposed in order to provide a very high level of energy-delay (E-D) optimization adaptability. DML allows an on-the-fly change between two operational modes at the gate level: static mode and dynamic mode. In the static mode, DML gates consume very low energy, with some performance degradation, as compared to standard CMOS gates. In the dynamic mode, by contrast, DML gates achieve very high performance at the expense of increased energy dissipation.
A basic DML gate is based on a static logic family gate, e.g., a conventional CMOS gate, and an additional transistor. While DML gates have a very simple and intuitive structure, they require an unconventional sizing scheme to achieve the desired behavior.

The performance of most digital circuits and systems is determined by the delay of their critical paths (CPs). Even though standard synthesis tools attempt to design logic blocks without CPs (i.e., with equalized path delays), slack relative to the targeted clock (Clk) frequency always exists and must be repaired by designers. Many methods have been proposed to address this slack. These include adaptive voltage scaling with a CP emulator circuit, multiple threshold voltages obtained through multiple oxide thicknesses, and multiple channel lengths for energy reduction in the non-CPs and a performance boost in the CPs. *Meijer et al.* [6] and *Liu et al.* [4] applied body biasing to reduce the energy consumption of non-CPs and to increase the performance of CPs, respectively. While the aforementioned methods solve the critical path slack problem, in most cases they also result in a significant increase in energy consumption. In addition to these gate-level approaches, higher-level approaches have been presented, such as multi-mode logic and parameterized logic.

In this paper, we address both the gate level and the higher architectural levels. This paper proposes to meet the delay requirements of the CPs while lowering the overall energy consumption of the design by utilizing the powerful modularity of DML. We propose and analyze a new approach, which locates the design's CPs and utilizes the on-the-fly modularity of DML to operate these paths in the boosted (dynamic) performance mode. The non-critical paths are operated in the low energy static DML mode, which does not affect the performance of the design.
Since in most cases the majority of gates in the design are not on the CPs, the increase in energy consumption of the critical paths will be negligible in comparison to the overall circuit consumption. Moreover, DML static gates dissipate less power than their CMOS counterparts, resulting in reduced power dissipation of the whole design. The proposed approaches have been analyzed on a 16-bit Carry Skip Adder (CSA) benchmark. Simulations were carried out in a standard 180nm CMOS process.

## II. DML BASICS

### A. DML OVERVIEW

A basic DML gate is composed of a static unclocked gate and an additional clocked transistor. A DML gate can be implemented in one of two topologies, "Type A" and "Type B", as shown in Figure 1(a-b) and Figure 1(c-d), respectively. In the static DML mode of operation (Static mode), the M1 transistor is cut off by applying a high clk signal for the "Type A" topology and a low clk\_bar signal for the "Type B" topology. Therefore, the gates of both topologies operate similarly to the static logic gate, CMOS in this case. For dynamic operation of the gate (Dynamic mode), the clk is enabled for toggling, providing two separate phases: pre-charge and evaluation. During the pre-charge phase, the output is charged to VDD in "Type A" gates and discharged to GND in "Type B" gates. During evaluation, the output is evaluated according to the values at the gate inputs, in a similar fashion to NORA/np-CMOS implementations.

DML gates have been shown to operate very robustly in both static and dynamic modes under process, voltage and temperature (PVT) variations and at low supply voltages. Dynamic mode robustness is mainly achieved by the intrinsic active restorer (pull-up in "Type A" and pull-down in "Type B"). This restorer allows the gate to tolerate glitches, charge leakage and charge sharing. Unique sizing of the DML gate transistors is the key factor for achieving low energy consumption in the static DML mode (in which the topology of the gate is identical to the static gate).
This sizing is also responsible for the reduction of all capacitances of the gate. In a similar way, the unique transistor sizing enables evaluation through a low-resistance network, achieving fast operation in the dynamic mode. An intuitive visualization of the tradeoff inherent to DML is shown in Figure 1(e). Energy efficiency is achieved in the static DML mode at the expense of slower operation (Low Energy and Low Performance, left scales). The dynamic mode, however, is characterized by high performance, albeit with increased energy consumption (High Energy and High Performance, right scales). These tradeoffs allow a very high level of flexibility at the system level, as discussed in Section III.

Fig 1: (a) (b) (c) (d)

Figure 1(f) and Figure 1(g) show the sizing of CMOS based DML gates in "Type A" and "Type B", respectively. These are optimized for dynamic operation. Figure 1(h) shows the conventional sizing of a standard CMOS gate, where WMIN is the minimal transistor width, $\beta$ is the inherent upsizing factor of the pull-up network (PUN) relative to the pull-down network (PDN) and f is the gate's general upsizing factor. As can be seen, the input and output capacitances of DML gates are significantly reduced, as compared to CMOS gates, due to the utilization of minimal width transistors in the pull-up network of "Type A" or the pull-down network of "Type B".

Fig 1: (e)

The size of the pre-charge transistor is kept equal to *S·WMIN* in order to maintain a fast pre-charge period despite the upsized output load, where *S* is the evaluation network upsizing factor. Figure 1(b) and Figure 1(d) show the footed "Type A" and the headed "Type B" DML gates, respectively. The footer or header allows a successful pre-charge when standard static gates or synchronous devices are cascaded into DML logic. Many aspects of DML gate sizing, as well as the preferred set of gates for the "Type A" and "Type B" topologies, have been analyzed and discussed. Optimization of the network upsizing parameters for load driving was conducted using the Logical Effort (LE) method [5], [8].
The key achievement of DML is that, while the proposed sizing yields very high performance in the dynamic mode, the same topology also enables improved energy efficiency in the static mode, as compared to conventional CMOS.

### B. STATIC DML AS A SEMI-ENERGY OPTIMAL CMOS

The design space of a CMOS gate is mainly influenced by VTH, transistor width, VDD, channel length, oxide thickness and body voltage. The influence of these parameters on optimization in the E-D plane has been widely explored. For the CMOS family, the symmetry of the gate (i.e., equal rise and fall times) is highly important. This is because in a combinational system there is always some uncertainty regarding the transition type. As a result, the pull-up network (PUN) of CMOS gates, which is constructed from low mobility PMOS devices, is sized up by the $\beta$ parameter.

When optimizing a CMOS gate's energy at the expense of its performance, the transistor width is the main parameter used for reducing the energy consumption. This is due to several facts: (1) Switching energy is proportional to the load capacitance and depends quadratically on VDD. Under energy optimization, the symmetry of the gate's performance does not constitute a constraint, so the transistor width can be reduced, as well as $\beta$. This significantly lowers the load capacitances. (2) With the lowering of circuit VDD and technology scaling, leakage energy has become one of the key factors in static power dissipation. The leakage energy is caused by the numerous leakage currents of a device, the main ones being the sub-threshold and gate leakage currents. These currents depend linearly on the transistor width, and under energy optimization they are considerably reduced.

CMOS based DML operated in the static mode, with transistor sizes optimized for the dynamic mode, is *de facto* a semi-energy-optimal CMOS structure with an additional negligible output capacitance due to the *Clk* transistors (transistors M1 and M2).
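
As a first-order sketch of the two width arguments above (not a derivation taken from the paper), the dependence on transistor width can be written as follows. The activity factor $\alpha$, the observation time $T$, and the per-width constants $c_w$ and $i_w$ are symbols introduced here for illustration only:

$$E_{sw} = \alpha\, C_L V_{DD}^2, \qquad C_L \approx c_w \sum_i W_i$$

$$E_{leak} = I_{leak} V_{DD} T, \qquad I_{leak} \approx i_w \sum_i W_i$$

Under these assumptions both energy components shrink roughly in proportion to the total transistor width $\sum_i W_i$, which is consistent with a minimally sized static DML gate consuming less energy than a symmetrically sized CMOS gate.
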
Static DML is still highly robust due to its complementary nature and withstands aggressive voltage scaling. This methodology can also be regarded as a stand-alone technique for reducing the energy consumption of digital circuits. The E-D tradeoff space under this approach is very wide; in this paper the discussion is limited to transistor sizing, as shown in Figure 1(f) and Figure 1(g) for DML gates.

## III. CP-DML APPROACHES FOR ENERGY EFFICIENCY AND HIGH PERFORMANCE

This section elaborates the proposed design approaches for energy efficient and high performance design of combinatorial systems. Sub-Section A presents an approach which utilizes DML gates in the dynamic mode on the CPs in order to improve their delays. Sub-Section B elaborates various aspects of energy reduction in all non-CP portions of the design.

**FIGURE 2.** (a) DML design optional operation modes: 2<sup>(Gates No.)</sup> operation modes for E-D space optimization. (b) DML design degenerated to the dynamic mode: high performance mode with high energy consumption. (c) DML design degenerated to the static mode: energy efficient mode with low performance. (d) DML design where only the CPs are dynamically operated while the rest of the design operates in the low energy static mode: close to optimal energy efficiency with boosted performance. Dyn. and Stat. stand for Dynamic and Static, respectively.

Theoretically, a general DML design can be controlled (by input signal-driven control or external signal-driven control) to operate each gate in one of two modes: Static and Dynamic. This means that a general design can be operated in $2^{(Gates\ Number)}$ different options, each one leading to a different operating point in the E-D space of the design. Figure 2(a) visualizes this modularity. The degenerated approaches, in which all the gates operate in one of the two modes, similar to a single gate, are shown in Figures 2(b) and 2(c). Switching between these two modes leads to a distinct tradeoff: the design is optimized either for maximum performance or for minimum energy consumption.

### A. SOLVING CPs TIMING VIOLATIONS

As discussed in Section I, the CPs of a design are automatically identified using standard design flow tools. By replacing only these paths with DML gates and applying the dynamic mode to them, their delay can be reduced. The rest of the design is implemented using standard CMOS static logic.
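
The idea of locating the CP and switching only its gates to the dynamic mode can be illustrated with a toy model. The sketch below is not the design flow used in the paper, which relies on standard tools; it assumes a hypothetical gate-level netlist given as a directed acyclic graph with one fixed delay per gate, finds a single longest-delay path, and marks the gates on it as dynamic. The gate names, delay values and function names are made up for illustration.

```python
from collections import defaultdict


def critical_path(delays, edges):
    """Return (delay, gates) of the longest-delay path in a gate DAG."""
    succs, preds = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)

    # Topological order (Kahn's algorithm).
    indegree = {g: len(preds[g]) for g in delays}
    ready = [g for g in delays if indegree[g] == 0]
    order = []
    while ready:
        gate = ready.pop()
        order.append(gate)
        for nxt in succs[gate]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Latest arrival time at each gate output, remembering the worst input.
    arrival, worst_pred = {}, {}
    for gate in order:
        pred = max(preds[gate], key=lambda g: arrival[g], default=None)
        worst_pred[gate] = pred
        arrival[gate] = delays[gate] + (arrival[pred] if pred is not None else 0.0)

    # Walk back from the latest output to recover the path.
    gate = max(arrival, key=arrival.get)
    total = arrival[gate]
    path = []
    while gate is not None:
        path.append(gate)
        gate = worst_pred[gate]
    return total, path[::-1]


def assign_modes(delays, edges):
    """Mark CP gates as 'dynamic' and every other gate as 'static'."""
    _, path = critical_path(delays, edges)
    on_cp = set(path)
    return {g: "dynamic" if g in on_cp else "static" for g in delays}


# Hypothetical five-gate netlist; delays are in arbitrary units.
DELAYS = {"a": 1.0, "b": 2.0, "c": 1.5, "d": 1.0, "e": 0.5}
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("c", "e")]

if __name__ == "__main__":
    print(critical_path(DELAYS, EDGES))
    print(assign_modes(DELAYS, EDGES))
```

In this toy netlist the path a, b, d is the slowest, so only those three gates would be given the dynamic option. A real flow would handle several near-critical paths and the static/dynamic interface constraints, which this sketch ignores.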
Of course, special design constraints should be enforced at all the intersections between a static path and a dynamic one. In some of these cases, a footer or a header should be applied. Figure 2(d) presents a design in which the CPs were located and *only those paths* were given the option to toggle between the dynamic and static modes, according to the system requirements. If the system can withstand slower operation, the CP logic will operate in the static mode. If the system is required to meet the defined *Clk* period for all cycles, the CPs will operate in the dynamic mode. An example of such an application is a smart phone that operates at two frequencies: a slow one for the power save/hibernating mode and a fast one for video streaming. Note that low complexity systems normally support only one operating frequency, and therefore their CPs will constantly operate in the dynamic mode. Normally, the number of gates on the CPs is small compared to the total number of gates in the design. Therefore, in most cases, the inherent dynamic-operation energy of these CPs will lead to an insignificant increase in the total energy consumption of the design.

### B. SOLVING THE CPs TIMING VIOLATION WHILE REDUCING THE TOTAL ENERGY CONSUMPTION

As described in the previous Sub-Section, the CPs are mapped and operated in the dynamic DML mode. In Sub-Section A, the rest of the circuit was assumed to keep a standard CMOS logic gate topology. Therefore, that design solves the CPs' timing constraints at the expense of a small degradation in energy consumption, as compared to a complete CMOS design. In this Sub-Section, all portions of the design which are not part of the CPs are mapped to static mode DML gates (similar to the semi-energy-optimal CMOS gates described in Section II). In most designs, these non-CPs are not time constrained, and therefore the asymmetric behavior of their transitions, and the resulting performance degradation, will still fit within the *Clk* period.
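
A rough energy bookkeeping, under simplifying assumptions that are not part of the paper, makes the comparison between the two approaches explicit. Let $N$ be the total number of gates, $N_{CP}$ the number of gates on the CPs, and $E_{CMOS}$, $E_{stat}$ and $E_{dyn}$ the average per-operation energy of a CMOS gate, a static-mode DML gate and a dynamic-mode DML gate. All of these are hypothetical symbols, and identical switching activity is assumed for every gate:

$$E_{A} \approx N_{CP} E_{dyn} + (N - N_{CP}) E_{CMOS}$$

$$E_{B} \approx N_{CP} E_{dyn} + (N - N_{CP}) E_{stat}$$

Compared with an all-CMOS design of energy $N E_{CMOS}$, approach B consumes less whenever $(N - N_{CP})(E_{CMOS} - E_{stat}) > N_{CP}(E_{dyn} - E_{CMOS})$. With $N_{CP} \ll N$, even a modest per-gate saving in the static mode can outweigh the extra energy of the dynamic CP gates.
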
The use of the static DML mode for the vast majority of gates in the design will lead to a significant reduction in the total dynamic and static energy consumption. Figure 3 visualizes this approach.

## IV. MODULAR BENCHMARK

This section presents the chosen benchmarks. As described in Section III, we discuss three designs:

1) A CPs accelerator, as described in Sub-Section III(A), which has 2 operation modes:
   - "DML Carry Path-Dynamic": the DML CPs are activated in the dynamic mode.
   - "DML Carry Path-Static": the DML CPs are activated in the static mode.

   In both of these modes the non-CP portions of the system are designed with standard CMOS.
2) A CPs accelerator with low energy consuming non-CPs, as described in Sub-Section III(B), which has 2 operation modes:
   - "DML Carry Path-Dynamic. With low energy non-CP static": the DML CPs are activated in the dynamic mode, while the rest of the system operates in the DML static mode.
   - "DML Carry Path- Static. With low energy non-CPs- Static": the DML CPs are activated in the DML static mode, similar to the rest of the system.
3) CMOS equivalent design.

A Carry Skip Adder (CSA, also called carry bypass adder) was chosen as a benchmark to demonstrate and evaluate the proposed concept. The length of the CP of the CSA increases with the number of input bits, making it possible to examine the E-D trends as a function of the CP length. It is important to note that the proposed methods can be applied to any combinatorial circuit; a CSA was chosen only due to its modularity and simplicity.

### A. CMOS CSA DESIGN

A conventional CSA is composed of a set of Ripple Carry Adder (RCA) blocks. The blocks use the carry-propagate condition to skip the carry from one RCA block to the next. Carry propagation through a bit position can be predicted by a simple XOR gate of the two operand bits. Such a prediction mechanism can substantially reduce the delay.
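
The block structure described above can be sketched as a bit-level behavioral model. This is a minimal functional illustration, not the transistor-level design of the paper: it assumes LSB-first bit lists, and the function names and the default `width` and `block` values are choices made here for the example.

```python
def ripple_block(a_bits, b_bits, carry_in):
    """Ripple-carry add of two LSB-first bit lists.

    Returns (sum_bits, carry_out, propagate_all), where propagate_all is 1
    only if every bit position propagates its incoming carry.
    """
    sums, carry, propagate_all = [], carry_in, 1
    for a, b in zip(a_bits, b_bits):
        p = a ^ b                    # propagate signal from a single XOR
        g = a & b                    # generate signal
        sums.append(p ^ carry)
        carry = g | (p & carry)
        propagate_all &= p
    return sums, carry, propagate_all


def carry_skip_add(a, b, width=16, block=4, carry_in=0):
    """Add two unsigned integers with a carry-skip structure.

    Returns (sum modulo 2**width, carry_out).
    """
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry = 0, carry_in
    for lo in range(0, width, block):
        sums, ripple_carry, skip = ripple_block(
            a_bits[lo:lo + block], b_bits[lo:lo + block], carry
        )
        for i, s in enumerate(sums):
            result |= s << (lo + i)
        # Skip MUX: if the whole block propagates, the block carry-in is
        # forwarded directly instead of waiting for the ripple carry.
        carry = carry if skip else ripple_carry
    return result, carry


if __name__ == "__main__":
    print(carry_skip_add(0x0FF0, 0x0011))
```

The model is purely logical, so the skip path and the ripple path always agree on the carry value; in hardware the benefit of the MUX is that the bypassed carry arrives earlier.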
The CP in the CSA occurs when the carry ripples through the first block, then skips the rest of the blocks, and then ripples again through the last block. This is the longest possible route in the CSA. *Lehman et al.* have researched CSAs with non-uniformly sized RCA blocks [3]. *Majerski* has presented a multi-level carry-skip propagation architecture. *Guyot et al.* [1] and *Oklobdzija et al.* [13] proposed algorithms for choosing optimized block sizes. In this paper, a simple CMOS CSA with a fixed block size of 4 bits was designed, as shown in Figure 4. Clearly, the methods presented in this paper can be generalized to any CSA block size, constant or variable, and to single-level or multi-level carry paths. The equations of a general single-bit Full Adder (FA), in their standard form, are:

$$S = A \oplus B \oplus C_{in}$$

$$C_{out} = A \cdot B + C_{in} \cdot (A \oplus B)$$

Figure 3. Full Adder

Figure 4. Multiplexer

Figure 5. Ripple Carry Adder Block

Figure 6. Carry Skip Adder Block

### B. DML CRITICAL PATH DESIGN

Figure 7. DML Critical Path Design Selected CSA

Figure 7 shows the DML implementation of the CSA's CP. The CP flows through the first NOR (assuming that the carry in of the whole design is 0) and through all the MUXs of the design. The gate level implementation of the CP can be constructed with various DML topologies: DML NOR gates are most efficiently implemented in the "Type A" topology and NAND gates in "Type B". Boolean logic does not allow an efficient implementation of a MUX with a NOR following a NAND or vice versa, which is the preferred topology for DML logic design. Therefore, in the chosen topology, the CP is composed only of NANDs (one of them implemented using the efficient "Type B" structure and the other using a less optimal "Type A" structure). The last inverter in each RCA block is a headed "Type B" inverter, which maintains a correct pre-charge phase for the CP. The sizes of the transistors, in terms of the minimal transistor width, are shown in Figure 7.
In a design implemented in this way, only 8% of the transistors will (optionally) operate dynamically, while the remaining 92% of the transistors are kept in the low energy static mode. This modular design keeps the same complexity and the same dynamic-to-static gate ratio regardless of the input vector length, N [bits].

## V. SIMULATION RESULTS

The modular benchmark circuits described in the previous section were simulated in a standard 180nm CMOS process using Mentor Graphics tools. Implementations of these methods on the benchmark CSAs were mainly examined in the E-D plane and as a function of the operating voltage and the CP length. Note that the naming convention for the different designs and operating modes is given at the beginning of Section IV. All energy and delay measurements are per-operation.

### POWER COMPARISON AT DIFFERENT APPLIED VOLTAGE

### DELAY COMPARISON AT DIFFERENT APPLIED VOLTAGE

## VI. CONCLUSION

CP timing violations and energy minimization are important issues in all digital circuits. As shown in this paper, the flexibility inherent to design with DML gates makes it possible to meet CP timing while reducing the total energy consumed by the circuit. With conventional methods, meeting the CP timing has typically come with a rise in consumed energy. This work challenges that paradigm: both the timing and the low energy consumption requirements are met. We showed that the performance of the 180nm CSA benchmark circuit was improved by a factor of 2, while its energy consumption was reduced by a factor of 2.5. Since the CSA circuit is not optimal for DML implementations, it is expected that these improvements will be even more significant for other designs.

## VII. REFERENCES

1. Guyot.A., Hochet.B., Muller.J.M., (1987), 'A way to build efficient carry-skip Adders', IEEE Trans. Comput., Vol. 36, No. 10, pp. 1144-1152.
2. Kaizerman.A., Fisher.S., Fish.A., (2013), 'Subthreshold Dual Mode Logic', IEEE Trans. Very Large Scale Integr. (VLSI) Syst., Vol. 21, No. 5, pp. 979-983.
3. Lehman.M., Burla.N., (1961), 'Skip techniques for high-speed carry-propagation in binary arithmetic units', IRE Trans. Electron. Comput., Vol. EC-10, No. 4, pp. 691-698.
4. Liu.X., Mourad.S., (2000), 'Performance of submicron CMOS devices and gates with substrate biasing', IEEE Int. Symp. Circuits Syst., Geneva, Vol. 4, pp. 9-12.
5. Levi.I., Belenky.A., Fish.A., (2013), 'Logical effort for CMOS-based dual mode logic gates', IEEE Trans. Very Large Scale Integr. (VLSI) Syst.
6. Meijer.M., de Gyvez.J.P., (2012), 'Body-bias-driven design strategy for area-and performance-efficient CMOS circuits', IEEE Trans. Very Large Scale Integr. (VLSI) Syst., Vol. 20, No. 1, pp. 42-51.
7. Ranganathan.N., Mohanty.S.P., Kougianos.E., Patra.P., (2008), 'Low-Power High-Level Synthesis for Nanoscale CMOS Circuits', New York, NY, USA: Springer-Verlag.
8. Sutherland.I.E., Sproull.B., Harris.D., (1999), 'Logical Effort-Designing Fast CMOS Circuits', San Mateo, CA, USA: Morgan Kaufmann.
9. Teman.A., Pergament.L., Cohen.O., Fish.A., (2011), 'A minimum leakage quasi-static RAM bitcell', J. Low Power Electron. Appl., Vol. 1, pp. 204-218.
10. Tran.A.T., Baas.B.M., (2010), 'Design of an energy-efficient 32-bit adder operating at subthreshold voltages in 45-nm CMOS', Proc. Commun. Electron. 3rd Int. Conf., pp. 87-91.
11. Yakovlev.A., Mokhov.A., Sokolov.D., (2012), 'Adapting Asynchronous Circuits to Operating Conditions by Logic Parametrisation', Proc. 18<sup>th</sup> IEEE Int. Symp. Asynchron. Circuits Syst., pp. 17-24.
12. Wei.L., Sirisantana.N., Roy.K., (2000), 'High-performance low-power CMOS circuits using multiple channel length and multiple oxide thickness', Proc. Comput. Design Int. Conf., pp. 227-232.
13. Oklobdzija.V.G., Barnes.E.R., (1985), 'Some optimal schemes for ALU implementation in VLSI technology', Proc. IEEE 7th Symp. Comput. Arithmetic, pp. 2-8.