ISSN (Print): 0974-6846 ISSN (Online): 0974-5645

# A VLSI Architecture of Root Raised Cosine Filter Using Efficient Algorithm

N. Nivedha\* and R. Muthaiah

PG Scholar, School of Computing, SASTRA University, Thirumalaisamudram, Thanjavur – 613401, Tamil Nadu, India; nivedha26rathiga@gmail.com, sjamuthaiah@core.sastra.edu

#### **Abstract**

This paper describes a reduction of area and power in a root raised cosine (RRC) filter using the Vertical and Horizontal Binary Common Sub-expression Elimination (VHBCSE) algorithm. Common sub-expression elimination is commonly used to reduce the number of adders in a given multiplier architecture by reducing the MPIS (Multiplications per Input Sample) and APIS (Additions per Input Sample). In the 2-bit and 3-bit BCSE algorithms, a shift-and-add method was proposed; these use less area than a normal multiplier. Here an algorithm is proposed to further reduce area utilization and power consumption. Area is saved by reducing the number of gates in the architecture. The algorithm can be used to implement higher-order filters with very few adders and few stages, and it handles both signed and unsigned constant multiplication. Adjacent coefficient bits are grouped in 2-bit, 4-bit and 8-bit groups according to the horizontal and vertical common sub-expressions. The shift-and-add method is used to reduce the number of adders, and multiplexers are used for switching so that adders and multiplication stages are reduced. These operations are applied to the RRC FIR filter for different standards. The whole work is done in Quartus II 9.2 Web Edition, targeting Cyclone III.

**Keywords:** 2-bit BCSE, CSE, RRC Filter, VHBCSE Algorithm

# 1. Introduction

In digital signal processing, filters are used to remove unwanted noise from signals. The multiplier is the key element of every filter for high-performance signal processing.
In wireless transmission, the RRC filter is applied to the chip stream output for pulse shaping before it is modulated to radio frequency. It limits the spectrum bandwidth to avoid interference<sup>1</sup>. The RRC filter, realized as an FIR filter, is used on both the transmitting and the receiving side so that the pair acts as a matched filter. This reduces the radio-frequency content and enhances the flexibility of the receiver, with an emphasis on DSP<sup>2</sup>. Software-defined radio (SDR) systems are used in signal intelligence and mobile wireless communication applications. In the proposed work, we present a design that reduces power and area.

A sub-expression based algorithm has been proposed to optimize FIR filters and achieve the minimum number of adders<sup>3</sup>. The number of full adders in FIR and IIR filters has been reduced using a coefficient-partitioning approach<sup>4</sup>. In the 3-bit Binary Common Sub-expression Elimination (BCSE) algorithm, a shift-and-add method was used to produce the partial products; its disadvantage is that even numbers cannot be given as input. Vertical and horizontal CSE have been used to reduce power and area<sup>5-7</sup>. Other work provides a trade-off between low power consumption and performance degradation<sup>8-9</sup>; the technique used there optimizes the bit width and hardware resources of the FIR filter and also reduces the area cost. A modified distributed arithmetic technique has been used to reduce power and area, and it also reduces memory compared with distributed arithmetic<sup>10</sup>. Power consumption in the FIR filter can also be reduced using the CSD technique. A low-complexity architecture based on the binary CSE (BCSE) algorithm has been implemented; this technique uses less area and power than CSD-based CSE, at the price of a shift-and-add block.
This additional hardware uses extra area and power, and makes the design unsuitable for SDR systems, where power and area consumption must be kept low.

\*Author for correspondence

# 2. Existing System

## 2.1 Common Sub-expression Elimination (CSE) Method

### 2.1.1 Basic Concept

The number of adders and multipliers determines the complexity of an FIR filter. The CSE technique is used to reduce this hardware. It operates on the binary representation of the coefficients and is used for designing higher-order filters. The main aim of CSE is to find multiple occurrences of the same bit pattern in the CSD coefficients and to eliminate the redundant multiplications, reducing the number of logic operators.

In the 2-bit BCSE algorithm, the input is multiplied by the coefficient by examining the coefficient in 2-bit patterns. In the partial product generator, shift-and-add operations eliminate the common terms in the coefficient. The output is given to the multiplexer unit, which selects according to the patterns 00 to 11. The outputs of multiplexers M7-M0 are given to adders A1-A4, and the outputs of these adders are given to adders A5-A6. The final output is given to the accumulation unit.

## 2.2 Issues in 2-bit BCSE Algorithm

In this algorithm the adder usage is high, which increases area and power consumption. To reduce the adder usage, the VHBCSE algorithm is applied to the RRC filter. This algorithm has a control logic generator that selects the adders needed to complete a particular multiplication.

# 3. Proposed System

## 3.1 Reconfigurable Root Raised Cosine Filter

The block diagram in Figure 1 represents the VLSI architecture of the RRC FIR filter using the VHBCSE algorithm. This architecture has two parameters, namely the interpolation factor (INTPL\_SEL) and the roll-off factor (FLT\_SEL).

Figure 1. Architecture of RRC filter.
The source clock is divided by four, six and eight to give CLK4, CLK6 and CLK8. These clocks, together with INTP\_SEL and RESET, are given as inputs to the data generator. Its output, 16 bits each, is given to the coefficient generator block together with INTPL\_SEL and FLT\_SEL, which produces output data and forwards it to the coefficient selection lines. This block performs the multiplication between the inputs and the filter coefficients. The coefficient generator output is given to the coefficient selection blocks, which apply the VHBCSE algorithm. After the coefficient is selected by the coefficient selector, it is given to the accumulation unit, which performs the addition.

### 3.1.1 Data Generator Block

The input data (RRCIN) is processed in this block according to INTP\_SEL and the clock signal. The input data is sampled according to the values on the selection lines of the multiplexer. This design uses 25-, 37- and 49-tap filters; according to the number of taps, the data generator produces DG[6:0]. The output is then given to the coefficient selection unit, which produces a 16-bit output for the accumulation unit.

### 3.1.2 Coefficient Generator

This block takes the input DG[6:0] from the data generator, together with the three filter lengths (25, 37 and 49 taps), the interpolation factor and the roll-off factor, and produces CS[21:0]. This is given to the coefficient selector block, which contains 8 blocks, each with three 16-bit inputs.

### 3.1.3 Coefficient Selector

In this block, the inputs and filter coefficients are multiplied. It consists of the first code pass, second code pass, partial product generator, multiplexer unit and accumulation unit. To reduce hardware usage, a two-phase optimization technique is used. In the RRC FIR filter, the CS block steers the appropriate data to the accumulation unit, based on the interpolation factor. The input to the coefficient selector is taken from the coefficient generator block, as shown in Figure 2.

Figure 2. Architecture of coefficient selector.

#### 3.1.3.1 First Code Pass (FCP)

The output is taken from the data generator for the different filter lengths (25, 37 and 49 taps). The coefficients for each length are divided into two sets, chosen by the roll-off factor, as shown in Figure 3. This block has three code passes that run simultaneously, one for each of the three interpolation factors. Each pass checks the coefficients of filters of the same length by searching all the bits vertically.

**Figure 3.** Block diagram of FCP for 25- and 37- filter coefficient.

Figure 4. Block diagram of SCP.

#### 3.1.3.2 Second Code Pass (SCP)

The output of the first code pass is taken for the different sets of coded coefficients and is then sent to the final coefficient set, as shown in Figure 4. After the coefficient is selected, it is given to the partial product generator.

### 3.1.4 Vertical and Horizontal Binary Common Sub-expression Elimination Algorithm

In this algorithm, X is the input, D is the coefficient and Dm is the coded coefficient, as shown in Figure 5.

Figure 5. Block diagram of VHBCSE algorithm.

## 3.2 Sign Conversion Block

This block takes data in signed decimal format to represent the input and the coefficient. It takes the 1's complement of the 16-bit coefficient, leaving out its MSB, to produce the inverted form. This is given to a 2:1 multiplexer. For a negative value of the original coefficient, the multiplexer outputs the complemented form; otherwise it outputs the coefficient unchanged, as shown in Figure 6.

## 3.3 Partial Product Generator

The multiplication between the input and the filter coefficient is done by the shift-and-add method, which generates the partial products. This technique recognizes the common sub-expressions so that the common terms in the coefficient are computed once. In 2-bit BCSE, the coefficient bits are examined two at a time, giving patterns from 00 to 11. Among the four patterns, only pattern 11 requires an adder. This helps to reduce hardware and improve the speed of the multiplication.
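The shift-and-add idea above can be illustrated with a small behavioural model. This is a minimal sketch, not the authors' hardware: it assumes plain integer arithmetic, uses left shifts to weight each 2-bit group, and the function names `partial_products` and `multiply_2bit_bcse` are hypothetical.

```python
def partial_products(x):
    """Candidate products of x for the 2-bit patterns 00, 01, 10 and 11.

    Patterns 00, 01 and 10 need only wiring or a shift; pattern 11 is
    the only one that needs an adder (2x + x).
    """
    return [0, x, x << 1, (x << 1) + x]


def multiply_2bit_bcse(x, coeff, width=16):
    """Multiply x by an unsigned `width`-bit coefficient, 2 bits at a time.

    Each 2-bit group of the coefficient acts as the select input of a
    4:1 multiplexer that picks one shared partial product. The selected
    values are weighted by their bit position and summed.
    """
    pp = partial_products(x)            # computed once, shared by all groups
    result = 0
    for k in range(width // 2):
        pattern = (coeff >> (2 * k)) & 0b11   # multiplexer select lines
        result += pp[pattern] << (2 * k)      # align to the group's weight
    return result
```

For example, `multiply_2bit_bcse(5, 0xB3C7)` gives the same value as `5 * 0xB3C7`, while only one addition is spent on forming the partial products themselves.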
## 3.4 Control Logic (CL) Generator

This block takes the coded coefficient (Dm[15:0]) as its input and arranges it into four 4-bit groups, (Dm[15],[14],[13],[12]), (Dm[11],[10],[9],[8]), (Dm[7],[6],[5],[4]) and (Dm[3],[2],[1],[0]), and into two 8-bit groups (Dm[15 to 8], Dm[7 to 0]). Depending on the algorithm, the generator produces 7 control signals for the output from the data generator with the 25-, 37- and 49-tap coefficients. It has comparators to compare pairs of 4-bit groups.

Figure 6. Architecture of sign conversion.

### 3.4.1 A 4-bit Comparator

This block consists of 4 XNOR gates. The first gate compares Dm[15] with Dm[11], the second compares Dm[14] with Dm[10], and likewise for the third and fourth gates. Their outputs are ANDed to give the output C1. The same process continues until C2-C6 are generated, as shown in Figure 7. C7 is produced by ANDing C2 and C5.

Figure 7. Architecture of 4-bit comparator.

### 3.4.2 An 8-bit Comparator

This block consists of 8 XNOR gates. The first gate compares Dm[15] with Dm[7], the second compares Dm[14] with Dm[6], and this is repeated for all the gates. The outputs are then given to an AND gate, as shown in Figure 8.

Figure 8. Architecture of 8-bit comparator.

## 3.5 Unit of Multiplexer

According to the coded coefficients, the multiplexers M7-M0 provide the appropriate data from the partial product generator unit. The output of each 4:1 multiplexer is given to addition layer 2, which consists of adders A4-A1.

## 3.6 Addition Layer2

The output of the multiplexer unit is given to addition layer 2, which contains 4 adders. This layer compares Dm[15:12] with Dm[11:8]. If the two values are equal, the output of adder A2 is omitted; instead, the output of adder A1 shifted right by 4 bits is used as the input to adder A5. Otherwise the output of adder A2 is used. The remaining bits are compared likewise, as shown in Figure 9.
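The comparator and the adder-skipping rule of addition layer 2 can be sketched behaviourally as follows. This is an illustrative model under stated assumptions, not the published circuit: it uses integer arithmetic, so the more significant term is shifted left by 4 bits (equivalent, up to scaling, to shifting the less significant term right in fixed point); it assumes A1 forms the product for Dm[15:12] and A2 the product for Dm[11:8]; and the names `compare_groups`, `nibble_product` and `upper_byte_product` are hypothetical.

```python
def xnor(a, b):
    """1 when the two bits are equal, 0 otherwise."""
    return 1 - (a ^ b)


def compare_groups(dm, hi, lo, n):
    """AND of n XNOR gates comparing dm[hi:hi-n+1] with dm[lo:lo-n+1].

    n=4 models the 4-bit comparator, n=8 the 8-bit comparator.
    """
    out = 1
    for i in range(n):
        out &= xnor((dm >> (hi - i)) & 1, (dm >> (lo - i)) & 1)
    return out


def nibble_product(x, nibble):
    """x times a 4-bit group, built from two multiplexed 2-bit partial products."""
    pp = [0, x, x << 1, (x << 1) + x]          # patterns 00, 01, 10, 11
    return (pp[nibble >> 2] << 2) + pp[nibble & 0b11]


def upper_byte_product(x, dm):
    """x times Dm[15:8], skipping adder A2 when Dm[15:12] equals Dm[11:8]."""
    a1 = nibble_product(x, (dm >> 12) & 0xF)   # assumed role of adder A1
    if compare_groups(dm, 15, 11, 4):          # control signal is high
        a2 = a1                                # reuse A1, A2 is not needed
    else:
        a2 = nibble_product(x, (dm >> 8) & 0xF)
    return (a1 << 4) + a2                      # assumed role of adder A5
```

With `dm = 0xAA12` the 4-bit comparator output is 1 and the A1 result is reused; with `dm = 0xAB12` it is 0 and A2 is computed. In both cases `upper_byte_product(x, dm)` equals `x * (dm >> 8)`.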
Figure 9. Architecture for addition layer2.

## 3.7 Addition Layer3

The output of layer 2 is given to layer 3, which contains 2 adders. This layer compares Dm[15:8] with Dm[7:0]. If a match is found, the output of adder A6 is skipped; instead, the output of adder A5 shifted right by 8 bits is used as the input to adder A7. Otherwise the output of adder A6 is used, as shown in Figure 10.

## 3.8 Final Accumulation Unit

The final addition takes place in the accumulation unit. It consists of six adders and six registers. It produces the multiplication result of the input and the coefficient.

**Figure 10.** Architecture for addition layer-3.

# 4. Result and Discussion

In the proposed work, the VHBCSE algorithm is implemented in Quartus II on a Cyclone III EP3C16U484C6 device to analyze the power and area of the system. Power and area are both reduced compared with the existing design, as shown in Table 1. The proposed design consumes 13.45 mW with an area of 30%, whereas the existing design consumes 17.46 mW with an area of 36.63%. The simulation result is shown in Figure 11.

Figure 11. Simulation result.

Table 1. Power and area report

| Method used | Power supply (V) | Length | Power (mW) | Area (%) |
|---|---|---|---|---|
| VHBCSE algorithm | 2.5 | 49 tap (16x17) | 13.45 | 30 |
| 2-bit BCSE algorithm | 1.8 | 49 tap (16x17) | 17.46 | 36.63 |

# 5. Conclusion

A root raised cosine filter using the vertical and horizontal binary common sub-expression elimination algorithm, aimed at cutting power and area, is compared with the 2-bit BCSE algorithm. Power and area are reduced, and the comparison results are tabulated. This method reduces power and area at the cost of time. It is applicable where power and area are the major constraints and time is less important. The simulation results are verified and presented.

# 6. References

1. Joost M. Theory of root-raised cosine filter [Internet]. 2010 Dec. Available from: http://www.michael-joost.de/rrcfilter.pdf.
2. Mitola J. The software radio architecture. IEEE Communications Magazine. 1995 May; 33(5):26–38.
3. Yu YJ, Lim YC. Optimization of linear phase FIR filters in dynamically expanding sub-expression space. Circuits Systems Signal Process. 2010 Feb; 29(1):65–80.
4. Vinod AP, Lai EMK. An efficient coefficient-partitioning algorithm for realizing low complexity digital filters. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 2005 Dec; 24(12):1936–46.
5. Mahesh R, Vinod AP. New reconfigurable architectures for implementing FIR filters with low complexity. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 2010 Feb; 29(2):275–88.
6. Durairajaa N, Joyprincy J, Palanisamy M. Design of low power and area efficient architecture for reconfigurable FIR filter. International Journal of Recent Technology and Engineering. 2013 Mar; 2(1):1–6.
7. Hsiao SF, Jian JHZ, Chen MC. Low-cost FIR filter designs based on faithfully rounded truncated multiple constant multiplication/accumulation. IEEE Transactions on Circuits and Systems II: Express Briefs. 2013 May; 60(5):287–91.
8. Deepika A, Bhuvaneswari A. Low power FIR filter design using truncated multiplier. International Journal of Engineering Trends and Technology. 2014 Apr; 10(1):1–6.
9. Lee SJ, Choi JW, Kim SW, Park J. A reconfigurable FIR filter architecture to trade off filter performance for dynamic power consumption. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2011 Dec; 19(12):2221–8.
10. Pillai R, Beulet PAS. Design and implementation of low-power, area-efficient FIR filter using different distributed arithmetic techniques. Indian Journal of Science and Technology. 2015 Sep; 8(21):1–5. DOI: 10.17485/ijst/2015/v8i21/79128.