

# International Journal of Advance Engineering and Research Development

e-ISSN (O): 2348-4470

p-ISSN (P): 2348-6406

Volume 4, Issue 8, August-2017

## Design of Low power & High Speed 8-Bit Wallace Tree Multiplier using 10T Full-adder

#### NIKITA SINGH

ECE Department, CVRCOE Hyderabad

Abstract: The multiplier is a key element of digital and high-performance systems such as FIR filters, digital processors and microprocessors, where most arithmetic operations depend on it. Designing multipliers for high-speed integrated circuits with low power consumption is a major concern in the VLSI field today. Among existing designs, the Wallace tree multiplier is a popular architecture. It is a parallel multiplier and hence faster than an array multiplier. The speed of the conventional Wallace tree multiplier can be improved further by using compressors; here this is achieved with 4:2, 5:2 and 6:2 compressors. In this paper, two 8-bit numbers are multiplied using a Wallace tree multiplier. Wallace tree compressors built from 10 T full adders are compared with those built from 28 T full adders. Performance is analysed in terms of power, delay and power-delay product. The multiplier was implemented at the circuit level in 45nm CMOS technology using the Cadence Virtuoso tool.

**Keywords**: Wallace tree, compressor, multiplier, full adder, power, delay.

#### I. INTRODUCTION

Until the 1970s, most minicomputers did not have a multiply instruction. In 1978, the Motorola 6809 was one of the earliest microprocessors with a hardware multiply instruction. Digital signal processors in applications such as multimedia and optical communication must process huge amounts of digital data quickly [1]. In many DSP applications, the multiplier lies in the critical delay path and ultimately determines the performance of the algorithm [2]. Many researchers are continuously striving to design multipliers with high speed, low power consumption and a regular structure that occupies less area, for efficient and compact VLSI implementation. Many multiplication algorithms have been proposed in the past. Every algorithm has its pros and cons, offering a tradeoff between speed, area, power consumption and circuit complexity.

The simplest way to perform multiplication is to add a series of partial products using the successive addition algorithm. It is used when optimized area and power consumption are required and delay can be tolerated [3]. In 1999, Robertson suggested a more efficient approach, Shift and Add multiplication. Every multiplier bit gives one multiple of the multiplicand, which is added to the partial products. If the multiplier is very large, more multiples of the multiplicand are added. The number of additions determines the delay of the multiplier, so performance deteriorates as the number of additions increases. Parallel multipliers perform the computation using very few adders and iterative steps. Most advanced digital systems incorporate a parallel multiplication unit to carry out high-speed mathematical operations. It offers high speed, but its circuitry is more complex than that of a serial multiplier.
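
The shift-and-add scheme described above can be sketched behaviourally in Python. This is an illustration only, not the circuit; the function name `shift_and_add_multiply` and the `width` parameter are assumptions introduced here. The function returns the number of additions alongside the product, since that count is what drives the delay.

```python
from typing import Tuple


def shift_and_add_multiply(multiplicand: int, multiplier: int, width: int = 8) -> Tuple[int, int]:
    """Multiply two unsigned integers by shift-and-add.

    Returns (product, additions), where `additions` counts how many shifted
    copies of the multiplicand were added to the running sum.
    """
    limit = 1 << width
    if not (0 <= multiplicand < limit and 0 <= multiplier < limit):
        raise ValueError("operands must fit in `width` unsigned bits")

    product = 0
    additions = 0
    for bit in range(width):
        # Each set multiplier bit contributes one shifted multiplicand.
        if (multiplier >> bit) & 1:
            product += multiplicand << bit
            additions += 1
    return product, additions
```

For example, `shift_and_add_multiply(13, 11)` needs three additions because 11 has three set bits, while an all-ones 8-bit multiplier needs eight.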

Multiplication consists of generating partial products and accumulating them. The speed of multiplication can be improved by reducing the number of partial products and/or accelerating their accumulation. The Booth algorithm and the Wallace tree algorithm are the basic approaches to high-speed parallel multipliers. In a parallel Booth multiplier architecture, the choice of adder trades off area against speed: a ripple carry adder provides smaller area at the expense of speed, while carry save adders provide good speed at the expense of area. In the Wallace tree algorithm, the partial products in each column are selected together and compressed. It can be concluded that parallel multipliers are a much better option than serial multipliers. The total area of parallel multipliers is much less than that of serial multipliers, and their power consumption is also lower.

#### II. COMPRESSOR BASED WALLACE TREE MULTIPLIER

In a conventional Wallace tree multiplier, a three-step process is used to multiply two numbers. The first step multiplies each bit of one argument by each bit of the other. The second step reduces the number of partial products to two using layers of full and half adders. The third step groups the remaining bits into two numbers and adds them with a conventional adder [7], as shown in fig 1.
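
The three steps can be sketched as a bit-level Python model. This is a minimal sketch under stated assumptions, not the layout of fig 1: the grouping rule (take bits three at a time in each column, use a half adder on a leftover pair) and the names `wallace_multiply`, `full_adder` and `half_adder` are choices made here for illustration.

```python
from collections import defaultdict


def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def wallace_multiply(x, y, width=8):
    # Step 1: AND every bit of x with every bit of y; file each partial
    # product bit under its column weight i + j.
    columns = defaultdict(list)
    for i in range(width):
        for j in range(width):
            columns[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    # Step 2: apply layers of full and half adders until no column
    # holds more than two bits.
    while any(len(bits) > 2 for bits in columns.values()):
        reduced = defaultdict(list)
        for weight, bits in columns.items():
            k = 0
            while len(bits) - k >= 3:
                s, c = full_adder(bits[k], bits[k + 1], bits[k + 2])
                reduced[weight].append(s)
                reduced[weight + 1].append(c)
                k += 3
            if len(bits) > 2 and len(bits) - k == 2:
                s, c = half_adder(bits[k], bits[k + 1])
                reduced[weight].append(s)
                reduced[weight + 1].append(c)
            else:
                reduced[weight].extend(bits[k:])  # pass leftovers through
        columns = reduced

    # Step 3: read the surviving bits as two numbers and add them with a
    # conventional (carry-propagating) addition.
    row_a = sum(bits[0] << w for w, bits in columns.items() if len(bits) >= 1)
    row_b = sum(bits[1] << w for w, bits in columns.items() if len(bits) == 2)
    return row_a + row_b
```

Every adder preserves the weighted sum of the bits it consumes, so the final addition yields `x * y` regardless of how the bits are grouped; the grouping only affects how many layers are needed.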

The conventional Wallace tree multiplier is faster than an array multiplier, but it has the disadvantage of layout complexity [4] [5]. In VLSI, layout feasibility is of great importance. In addition, decreasing the number of adders in the partial product reduction stage reduces the latency of the Wallace tree multiplier. To overcome this drawback, the conventional Wallace tree multiplier can be implemented with compressors [6]. In this paper, partial product reduction is accomplished with 4:2, 5:2 and 6:2 compressor structures. Using compressors reduces the number of partial product additions. Overall this gives an optimized circuit with low power dissipation, minimum delay and low transistor count. The new architecture enhances the performance of the Wallace tree multiplier [8].



Fig 1: Conventional Wallace tree multiplier

For high-speed multiplication, compressors are considered the most efficient building blocks. Rather than summing the partial products entirely with a CSA/ripple adder tree, the compressor technique reduces the partial products in a single column together. This completes the same task in less time while controlling power dissipation and optimizing area. Compressor-based multiplication can be understood from the example of 5\*5 bit multiplication shown below in fig 2.



Fig 2: Multiplication with compressors[5]
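
Because compressors work on one column at a time, it helps to know how many partial product bits each column holds before any reduction. The short Python sketch below computes these column heights; the function name `column_heights` is introduced here for illustration, and which compressor the paper assigns to which column is not recoverable from the text.

```python
def column_heights(width):
    """Number of partial product bits in each column of a width x width multiply.

    Index 0 is the least significant column.
    """
    heights = [0] * (2 * width - 1)
    for i in range(width):
        for j in range(width):
            heights[i + j] += 1  # bit i of one operand AND bit j of the other
    return heights
```

For the 5\*5 case the heights rise from 1 to 5 and fall back to 1; for 8\*8 the tallest column holds 8 bits, before any carries from neighbouring columns arrive.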

Compressors are the essential building blocks used to speed up the multiplication process by accumulating partial products. The basic idea is to use an n:2 compressor, in which n operands are reduced to two by doing the addition while keeping the carries and sums separate. This implies that all of the columns can be added in parallel without depending on the result of the previous column. The full adder is the most trivial compressor and is referred to as the 3:2 compressor, since it compresses three operands into two. The higher-order compressors are introduced as follows.
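
The idea of keeping sums and carries separate can be shown on whole words: a row of independent full adders turns three operands into a sum word and a carry word, with no carry travelling between columns. The following Python sketch is illustrative only, and the name `compress_3_to_2` is an assumption made here.

```python
def compress_3_to_2(a, b, c):
    """Carry-save step: reduce three unsigned operands to two.

    Each bit position is an independent full adder. The carry word is
    shifted left by one because a carry has twice the weight of its column.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (b & c) | (a & c)) << 1
    return sum_word, carry_word
```

The two outputs always add up to `a + b + c`; only that final addition needs a carry-propagating adder.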

#### **4:2 Compressors**

4:2 compressors, introduced by Weinberger [9], have five inputs and three outputs. A 4:2 compressor compresses four partial products into two, thus offering a higher compression ratio and a more regular interconnection structure than its 3:2 counterpart. The input-output relationship of the compressor can be defined as follows:

$$x_1 + x_2 + x_3 + x_4 + c_{in} = sum + 2(carry + c_{out})$$ [10]

A 4:2 compressor is built by connecting two 3:2 compressors. The structure has a delay of four XOR gates, and its notable feature is that it is free from carry propagation: the carry from the previous stage is not propagated to the next stage. Figure 3 below shows the I/O diagram.



Fig 3. 4:2 Compressor I/O diagram
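
The input-output relation of the 4:2 compressor can be checked with a small bit-level model made of two cascaded full adders. This is a sketch under an assumed wiring order (the first adder takes x1 to x3, the second takes x4 and the carry-in); the paper's own gate-level arrangement may differ, and the function names are introduced here.

```python
from itertools import product


def full_adder(a, b, c):
    """3:2 compressor: three input bits in, (sum, carry) out."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """4:2 compressor built from two cascaded full adders."""
    # c_out depends only on x1..x3, never on cin, so a carry entering
    # this stage cannot ripple through to the next one.
    s1, cout = full_adder(x1, x2, x3)
    # The second full adder folds in x4 and the carry-in.
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


def satisfies_equation():
    """Check sum + 2(carry + c_out) against all 32 input combinations."""
    return all(
        sum(bits) == s + 2 * (carry + cout)
        for bits in product((0, 1), repeat=5)
        for s, carry, cout in [compressor_4_2(*bits)]
    )
```

`satisfies_equation()` exhaustively confirms that the five input bits and the three output bits carry the same weighted total.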

#### **5:2 Compressors**

The 5:2 compressor is the third widely used compressor of significant importance. Its block diagram is shown in figure 4. It has seven inputs, of which five are direct inputs and two are carry-in bits from the previous stage. Similarly, it has four outputs, of which two are carry-out bits to the next stage and the other two are the sum and carry bits. A 5:2 compressor can be designed by cascading three 3:2 compressors. The input-output relationship is governed by the following equation:

$$x_1 + x_2 + x_3 + x_4 + x_5 + c_{in1} + c_{in2} = sum + 2(carry + c_{out1} + c_{out2})$$ [10]
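
The same kind of check works for the 5:2 compressor when it is modelled as three cascaded full adders. The order in which the inputs and carry-ins enter the chain below is an assumption made for illustration, as are the function names; the equation holds for any order because each full adder preserves the weighted sum of its inputs.

```python
from itertools import product


def full_adder(a, b, c):
    """3:2 compressor: three input bits in, (sum, carry) out."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compressor_5_2(x1, x2, x3, x4, x5, cin1, cin2):
    """5:2 compressor modelled as three cascaded full adders."""
    s1, cout1 = full_adder(x1, x2, x3)      # first carry-out to next stage
    s2, cout2 = full_adder(s1, x4, cin1)    # second carry-out to next stage
    total, carry = full_adder(s2, x5, cin2)
    return total, carry, cout1, cout2


def satisfies_equation():
    """Check sum + 2(carry + c_out1 + c_out2) for all 128 input combinations."""
    return all(
        sum(bits) == s + 2 * (carry + cout1 + cout2)
        for bits in product((0, 1), repeat=7)
        for s, carry, cout1, cout2 in [compressor_5_2(*bits)]
    )
```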



Fig 4. 5:2 Compressor I/O diagram

Fig 5. 6:2 Compressor I/O diagram

#### **6:2 Compressors**

Another compressor used is the 6:2 compressor, whose block diagram is shown in figure 5. It has eight inputs, of which six are direct inputs and two are carry-in bits from the previous stage. Similarly, it has four outputs, of which two are carry-out bits to the next stage and the other two are the sum and carry bits. A 6:2 compressor can be designed by cascading five 3:2 compressors.

#### III. CIRCUIT LEVEL IMPLEMENTATION

The Wallace tree multiplier circuit is implemented in 45nm technology using the Cadence Virtuoso tool. In the multiplication process, partial products are generated by AND gates and partial product reduction is done with the help of compressors. Compressors are made up of full adders, and different logic styles can be chosen for implementing them. In this paper, full adders are implemented using 28 transistors and 10 transistors. The Wallace tree multipliers based on the 28 T full adder and the 10 T full adder are then compared in terms of power consumption, delay and transistor count. The circuit-level implementations of the 4:2, 5:2 and 6:2 compressors and their simulation results are shown below.



Fig 6. 4:2 Compressor architecture



Fig 7: Simulation result of 4:2 compressor

Fig 8. 5:2 Compressor architecture




Fig 9. 6:2 compressor architecture



Fig 10. Simulation result of 6:2 compressor

To further increase the speed and reduce the power consumption, the conventional 28-transistor CMOS full adder is replaced by a full adder made of 10 transistors. One of the most significant advantages of the conventional full adder is its high noise margin, but its substantial number of transistors results in high input loads, more power consumption and larger silicon area. The 10 T full adder overcomes this disadvantage: its lower transistor count means less power consumption, less delay and less silicon area [11]. The schematic of the 10 T full adder is shown in figure 11.



Fig 11. 10 T Full adder
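
Reduced-transistor full adders are often described at the logic level as two XOR stages for the sum and a 2:1 multiplexer for the carry. The Python sketch below models only that Boolean formulation; whether the 10 T cell of figure 11 is wired this way is an assumption, and the name `full_adder_mux` is introduced here.

```python
from itertools import product


def full_adder_mux(a, b, cin):
    """Full adder expressed as XOR stages plus a 2:1 multiplexer.

    h = a XOR b selects the carry: if a and b differ, the carry-out
    equals cin; if they are equal, the carry-out equals a.
    """
    h = a ^ b
    total = h ^ cin
    cout = cin if h else a
    return total, cout


def matches_arithmetic():
    """Compare against a + b + cin for all eight input combinations."""
    return all(
        a + b + cin == s + 2 * cout
        for a, b, cin in product((0, 1), repeat=3)
        for s, cout in [full_adder_mux(a, b, cin)]
    )
```

The multiplexer form avoids computing the three AND terms of the majority function separately, which is one reason such formulations lend themselves to low transistor counts.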

#### IV. RESULTS AND COMPARISON

Figure 12 shows the schematic of the 4\*4 bit Wallace tree multiplier in 45nm technology. It uses 4:2, 5:2 and 6:2 compressors based on 10 T full adders.



Fig 12. Schematic of 4\*4 bit Wallace tree multiplier

Fig 13 shows the simulation result of 4\*4 bit Wallace tree multiplier in 45 nm technology.



Fig 13. Simulation result of 4\*4 bit Wallace tree multiplier

Fig 14 shows the schematic of 8\*8 bit Wallace tree multiplier in 45 nm technology. For its implementation 4:2, 5:2 and 6:2 compressors are used which are based on 10T full adders.



Fig 14. Schematic of 8\*8 Wallace tree multiplier



Fig 15. Simulation result of 8\*8 Wallace tree multiplier

The two designs are compared in terms of power, delay, power-delay product and transistor count. The 8\*8 bit Wallace tree multiplier is implemented using 4:2, 5:2 and 6:2 compressors, which are designed using both the 10 T full adder and the 28 T full adder. An 8\*8 bit Wallace tree multiplier using the conventional CMOS 28 T full adder is reported to consume 6.95mW with a delay of 0.95 ns [12]. As seen from Table I, the 8-bit Wallace tree multiplier based on 10 T full adder compressors consumes less power and is faster than the 8-bit Wallace tree multiplier based on 28 T full adder compressors. The transistor count is also drastically reduced with compressors based on the 10 T full adder.

Table I. Comparison between WTM compressor based on 10 T and 28 T full adder

| Parameters                  | 8*8bit WTM (10 T) | 8*8bit WTM (28 T) |
|-----------------------------|-------------------|-------------------|
| Power (µW)                  | 15.6              | 16.23             |
| Delay (ps)                  | 32.41             | 82.66             |
| Power-delay product (µW-ps) | 505.596           | 1341.571          |
| No. of transistors          | 640               | 1792              |
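
The derived figures in Table I can be reproduced from the power and delay rows. The Python sketch below uses only the table's values; the dictionary layout and the helper names `pdp` and `reduction_percent` are assumptions introduced for illustration.

```python
# Values copied from Table I.
results = {
    "10T": {"power_uW": 15.6, "delay_ps": 32.41, "transistors": 640},
    "28T": {"power_uW": 16.23, "delay_ps": 82.66, "transistors": 1792},
}


def pdp(entry):
    """Power-delay product in µW-ps."""
    return entry["power_uW"] * entry["delay_ps"]


def reduction_percent(new, old):
    """Percentage by which `new` is lower than `old`."""
    return 100.0 * (old - new) / old


for name, entry in results.items():
    print(f"{name}: PDP = {pdp(entry):.3f} uW-ps")

for key in ("power_uW", "delay_ps", "transistors"):
    saving = reduction_percent(results["10T"][key], results["28T"][key])
    print(f"{key}: {saving:.1f}% lower with 10T")
```

On these figures the 10 T design reduces delay by roughly 61% and the power-delay product by roughly 62%, while power falls by only about 4%. Both transistor counts divide evenly into 64 cells (640/10 and 1792/28), which suggests the counts cover 64 full adders; that reading is an inference from the numbers, not a statement in the paper.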

#### V. CONCLUSION AND FUTURE WORK

In this paper, an 8\*8 Wallace tree multiplier using compressors based on the 10 T full adder was designed in 45nm technology using the Cadence Virtuoso tool. Wallace tree multipliers using compressors are better than the conventional Wallace tree multiplier in terms of speed.


The results show that the Wallace tree multiplier based on the 10 T full adder has lower power consumption and delay than the Wallace tree multiplier based on the 28 T full adder. Further scope lies in finding different full adder topologies that can further increase the speed of the multiplier.

#### VI. REFERENCES

- [1] Soojin Kim, Kyeongsoon Cho, "Design of High-speed Modified Booth Multipliers Operating at GHz Ranges", World Academy of Science, Engineering and Technology, 2010.
- [2] Issam S. Abu-Khater, Abdellatif Bellaouar, M. I. Elmasry, "Circuit Techniques for CMOS Low-Power High-Performance Multipliers", IEEE Journal of Solid-State Circuits, Vol. 31, No. 10, pp. 1535-1546, October 1996.
- [3] Y. RamaLakshmanna, G.V.S. Padma Rao, N. Udaya Kumar, K. Bala Sindhuri, "A Survey on Different Multiplier Techniques", SSRG International Journal of Electronics & Communication Engineering, Vol. 3, Issue 3, March 2016.
- [4] C.S. Wallace, "Suggestion for a fast multiplier", IEEE Transactions on Electronic Computers, Vol. 13, pp. 14-17, 1964.
- [5] Rabaey et al., Digital Integrated Circuits.
- [6] Giuseppe Carso, Daniela Di Sclafani, "Analysis of Compressor Architectures in MOS Current-Mode logic", ICEs 2010, pp. 13-16.
- [7] N. Ravi, A. Satish, Dr. T. Jayachandra Prasad, Dr. T. Subba Rao, "A New Design for Array multiplier with Trade Off in power and Area", IJCSI, Vol. 8, Issue 3, May 2011.
- [8] V.G. Oklobdzija, D. Villeger, S.S. Liu, "A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach", IEEE Transactions on Computers, Vol. 45, pp. 294-306, 1996.
- [9] A. Weinberger, "4-2 carry-save adder module", IBM Tech. Discl. Bull., 1981, 23, (8), pp. 3811-3814.
- [10] R. Menon and D. Radhakrishnan, "High performance 5:2 compressor architectures", IEE Proc.-Circuits Devices Syst., Vol. 153, No. 5, October 2006.
- [11] Raju Gupta, Satya Prakash Pandey, Shyam Akashe and Abhay Vidyarthi, "Analysis and optimization of Active Power and Delay of 10T Full Adder using Power Gating Technique at 45 nm Technology", IOSR Journal of VLSI and Signal Processing (IOSR-JVSP), Volume 2, Issue 1 (Mar-Apr 2013), pp. 51-57, e-ISSN: 2319-4200, p-ISSN: 2319-4197.
- [12] Pradeep Kumar Kumawa, Gajendra Sujediya, "Design and Comparison of 8x8 Wallace Tree Multiplier using CMOS and GDI Technology", IOSR Journal of VLSI and Signal Processing (IOSR-JVSP), Volume 7, Issue 4, Ver. I (Jul-Aug 2017), pp. 57-62, e-ISSN: 2319-4200, p-ISSN: 2319-4197, www.iosrjournals.org.