# Design of a Reconfigurable Multiplier with Variable-Precision

# Sang Hyun Ahn<sup>1</sup>, Seung Bum Baek<sup>2</sup> and Kyoung Rok Cho<sup>a</sup>

Department of Communication Circuits and System Design Engineering, Chungbuk National University E-mail: ¹ahn651@chungbuk.ac.kr, ²sbbaek@cbnu.ac.kr

Abstract - Multipliers remain a key computational element in numerous high-performance digital systems that determine the overall performance of the system. In this paper, we introduce a variable-precision reconfigurable multiplier that employs vertical and horizontal control signals, and we compare its performance against a conventional fixed-precision multiplier in terms of power dissipation and propagation delay. The proposed multiplier reduces power by 64% and area by 48%, and improves propagation delay by 60%. The reconfigurable multiplier was implemented in the Magnachip/SK Hynix 0.35um process with a 3.3V supply voltage. A Xilinx Basys3 FPGA board (xc7a35tcpg-236L) was used to verify parallel operation and the performance of the implemented multiplier. The proposed multiplier shows a worst-case propagation delay of 22.5ns.

Keywords - Arithmetic Unit, Digital SoC, Multiplier, Reconfigurable, Variable-Precision

### I. INTRODUCTION

Multipliers are a major computational element in high-performance digital systems such as FIR filters, microprocessors, and Digital Signal Processors (DSPs), as well as in emerging high-performance IoT architectures [1-2]. The word length of the multiplier must be at least equal to the maximum word length of the application running in the digital system. Thus, the word length of the multiplier is generally greater than that of the data represented in the operands [3]. Conventional multipliers suffer from relatively high propagation delay, high power consumption, and difficulty in fitting into a confined chip area. Innovative solutions have emerged that attempt to address high-speed operation, optimal power consumption, and chip area reduction [2-4]. In this paper, we design an enhanced architecture that supports a dynamic word length by augmenting the input with additional vertical and horizontal control signals. In addition, we adopt a hierarchical architecture to increase flexibility and to enhance signal propagation. We compare the performance of the proposed dynamic word-length binary multiplier with a conventional binary multiplier through an FIR filter design.

Manuscript Received Nov. 27, 2019, Revised Dec. 24, 2019, Accepted Dec. 24, 2019

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (<a href="http://creativecommons.org/licenses/bync/3.0">http://creativecommons.org/licenses/bync/3.0</a>) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### II. DESIGN METHODOLOGY

### A. Operation process and structure of the conventional multiplier

Figure 1 shows the binary multiplication of operands $A_m$ and $B_m$, which produces $P_n$ and $S_n$ as the product and sum terms, respectively [5]. The multiplication is carried out by shifting and adding the binary partial products to produce the output $S_n$.



Fig. 1. Processing of conventional multiplier operation
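The shift-and-add process summarized above can be sketched in a few lines of Python. This is only an illustrative software model for unsigned operands; the function name `shift_and_add_multiply` and the `width` parameter are assumptions introduced here and do not appear in the paper.

```python
def shift_and_add_multiply(a: int, b: int, width: int = 8) -> int:
    """Multiply two unsigned width-bit integers by shifting and adding.

    Illustrative model only: each set bit n of the multiplier b contributes
    the partial product (a << n) to the running sum.
    """
    product = 0
    for n in range(width):
        if (b >> n) & 1:          # bit n of the multiplier B
            product += a << n     # partial product: A shifted left by n
    return product


print(shift_and_add_multiply(13, 11))  # 143
```

Each loop iteration corresponds to one row of partial products in a diagram such as Fig. 1.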

Figure 2 shows a block diagram of an 8-bit conventional multiplier [6]. Each white box in the figure represents a 1-bit multiplier, which is shown in Fig. 3. The conventional 1-bit multiplier consists of a full adder and an AND gate as its basic elements.



Fig. 2. 8-bit conventional multiplier block diagram
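As a rough software analogue of this structure, the sketch below builds an unsigned multiplier from nothing but a 1-bit cell made of an AND gate and a full adder. The row-by-row ripple wiring, and the names `full_adder`, `cell`, and `array_multiply`, are assumptions chosen for illustration; the exact interconnect of Fig. 2 may differ.

```python
def full_adder(x: int, y: int, c_in: int):
    """Return (sum, carry_out) of three 1-bit inputs."""
    s = x ^ y ^ c_in
    c_out = (x & y) | (c_in & (x ^ y))
    return s, c_out


def cell(a_bit: int, b_bit: int, s_in: int, c_in: int):
    """One 1-bit multiplier cell: an AND gate feeding a full adder."""
    return full_adder(a_bit & b_bit, s_in, c_in)


def array_multiply(a: int, b: int, width: int = 8) -> int:
    """Unsigned multiply using a width x width grid of cells (assumed wiring)."""
    acc = [0] * (2 * width)            # running sum bits, LSB first
    for j in range(width):             # one row of cells per bit of B
        b_bit = (b >> j) & 1
        carry = 0
        for i in range(width):         # one cell per bit of A
            a_bit = (a >> i) & 1
            acc[i + j], carry = cell(a_bit, b_bit, acc[i + j], carry)
        acc[j + width] = carry         # row carry-out lands in the next free column
    return sum(bit << k for k, bit in enumerate(acc))


print(array_multiply(200, 123))  # 24600
```

The model uses `width * width` cell evaluations per product regardless of how many operand bits actually carry data.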

a. Corresponding author; krcho@chungbuk.ac.kr

Conventional multipliers compute on a fixed word length, which restricts the available computational precision. Moreover, many sub-blocks of the multiplier sit idle when it performs a multiplication narrower than its maximum bit-width, which consumes unnecessary power.



Fig. 3. Conventional 1-bit multiplier block diagram

### B. Structure and operation of the reconfigurable multiplier

In this paper, we propose a reconfigurable multiplier that keeps the same resources as the conventional multiplier. The basic processing element of the proposed multiplier has data input signals (A and B) and vertical and horizontal control signals ($V_{ctrl}$ and $H_{ctrl}$), which are combined through an additional XOR gate as shown in Fig. 4 [7]. The output of the three-input AND gate is then fed to the full adder along with the sum-in ($S_{in}$) signal.



Fig. 4. 1-bit proposed multiplier block diagram

The control signals gate the input signals A and B through the XOR gate. The XOR gate outputs logic '1' when its two inputs differ, which passes the result of the AND operation of A and B to the full adder. Thus, setting V and H to '11110000' and '00001111' enables only the grey-colored logic blocks and separates the multiplier into two parts, as shown in Fig. 5. The multiplier can then perform two independent multiplications at the same time.



Fig. 5. Process of the control signals with XOR gate in the proposed 8-bit multiplier
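The gating described above can be modeled behaviorally as follows. This is a sketch under stated assumptions rather than the authors' circuit: it assumes that bit $i$ of the V word gates the column of operand bit $A_i$, that bit $j$ of the H word gates the row of operand bit $B_j$, and that the control words are written MSB first. The function name `gated_multiply` is hypothetical.

```python
def gated_multiply(a: int, b: int, v_ctrl: int, h_ctrl: int, width: int = 8) -> int:
    """Sum the partial products of cells whose enable (V XOR H) is 1.

    Assumed mapping (not stated explicitly in the paper): bit i of v_ctrl
    gates the column of A_i and bit j of h_ctrl gates the row of B_j.
    """
    total = 0
    for i in range(width):
        for j in range(width):
            a_i = (a >> i) & 1
            b_j = (b >> j) & 1
            enable = ((v_ctrl >> i) & 1) ^ ((h_ctrl >> j) & 1)
            total += (a_i & b_j & enable) << (i + j)   # three-input AND
    return total


# Two independent 4-bit products inside one 8-bit array.
# A = 0xA7 -> upper nibble 10, lower nibble 7; B = 0x35 -> upper 3, lower 5.
result = gated_multiply(0xA7, 0x35, 0b11110000, 0b00001111)
print(result)               # 7715
print(divmod(result, 256))  # (30, 35), i.e. 10*3 and 7*5
```

In this model the two products land in disjoint halves of the output word, because the enabled upper block only produces weights of $2^8$ and above while the lower 4-bit product never exceeds 8 bits. Setting V to all ones and H to all zeros enables every cell and recovers the full 8-bit product.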

Figure 6 shows the process of reconfiguring the 8-bit multiplier into 3-bit and 5-bit multipliers through a combination of the vertical and horizontal control signals. The white blocks are disabled: they do not operate but bypass the results ($S_{in}$ and $C_{in}$) of the previous blocks to the following blocks. The grey and black blocks carry out the two independent multiplications in parallel. $S_{00}$ to $S_{05}$ are the sum outputs of the 3-bit multiplication of [$A_0$: $A_2$] and [$B_0$: $B_2$].



Fig. 6. Block diagram of the active and inactive area on the proposed 8-bit multiplier; (a) Process of the 8-bit proposed multiplier with a control signal, (b) Array of the 8-bit proposed multiplier
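To make the active and inactive regions concrete, the sketch below derives a pair of control words for a given split point and counts the enabled cells. It assumes, by analogy with the '11110000' / '00001111' example, that the two control words are bitwise complements of each other; the paper does not list the exact words used for the 3-bit and 5-bit case, and the helper names `control_words` and `active_map` are hypothetical.

```python
def control_words(split: int, width: int = 8):
    """Return (v_ctrl, h_ctrl) for a low block of `split` bits.

    Assumption: H is 1 on the low `split` bit positions and V is its
    bitwise complement, mirroring the 4/4 example in the text.
    """
    low_mask = (1 << split) - 1
    full_mask = (1 << width) - 1
    return full_mask ^ low_mask, low_mask


def active_map(v_ctrl: int, h_ctrl: int, width: int = 8):
    """Grid of enable bits: row j follows B_j, column i follows A_i."""
    return [[((v_ctrl >> i) & 1) ^ ((h_ctrl >> j) & 1) for i in range(width)]
            for j in range(width)]


v, h = control_words(3)
grid = active_map(v, h)
print(f"{v:08b} {h:08b}")     # 11111000 00000111
print(sum(map(sum, grid)))    # 34 active cells = 3*3 + 5*5
```

Under these assumptions 34 of the 64 cells are enabled, and the remaining 30 correspond to the white bypass blocks.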

### C. Performance verification through FIR filter design

We designed an FIR filter, a well-known multiplication-intensive application, to verify the performance of the proposed reconfigurable multiplier [4]. As shown in Fig. 7(a), the FIR filter has four coefficients and therefore performs four multiplications per cycle. Figure 7(b) shows a modified FIR filter architecture, redesigned with the proposed multiplier, that executes two multiplications concurrently. The reconfigurable multiplier has advantages in speed and power consumption because it halves the number of required multiplication operations. The modified FIR filter is implemented on a Xilinx FPGA (xc7a35tics-1L) platform to evaluate the proposed structure.



Fig. 7. Block diagram of 4-tap FIR filter; (a) Block diagram of 4-tap FIR filter with the 16-bit conventional multiplier, (b) Block diagram of 4-tap FIR filter with the 16-bit proposed multiplier
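The halving of multiplier passes can be illustrated with a small behavioral model of a 4-tap FIR output sample. The pairing of taps (0, 1) and (2, 3), the helper `dual_multiply`, and the sample values are all assumptions made for illustration; they are not taken from Fig. 7.

```python
def dual_multiply(pair_a, pair_b):
    """Hypothetical stand-in for one pass through the reconfigurable
    multiplier configured as two independent half-width multipliers."""
    (a_hi, a_lo), (b_hi, b_lo) = pair_a, pair_b
    return a_hi * b_hi, a_lo * b_lo


def fir4_conventional(coeffs, window):
    """One output sample using four sequential multiplications."""
    acc, passes = 0, 0
    for h, x in zip(coeffs, window):
        acc += h * x
        passes += 1
    return acc, passes


def fir4_reconfigurable(coeffs, window):
    """One output sample using two dual multiplications (assumed pairing)."""
    acc, passes = 0, 0
    for k in (0, 2):
        p_hi, p_lo = dual_multiply((coeffs[k], coeffs[k + 1]),
                                   (window[k], window[k + 1]))
        acc += p_hi + p_lo
        passes += 1
    return acc, passes


coeffs = [1, 2, 3, 4]   # illustrative coefficients, not from the paper
window = [5, 6, 7, 8]   # x[n], x[n-1], x[n-2], x[n-3]
print(fir4_conventional(coeffs, window))    # (70, 4)
print(fir4_reconfigurable(coeffs, window))  # (70, 2)
```

Both versions compute the same sum of products; only the number of multiplier passes per output sample differs.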

Table I shows the performance evaluation of the FIR filter designs introduced in Fig. 7. The FIR filter is implemented on the Xilinx FPGA with the 16-bit reconfigurable multiplier. It reduces power consumption by 64% compared to the design with the conventional multiplier. The total occupied area, counting LUTs, FFs, and IOs, decreases by 48%. Finally, the proposed design improves computation time by 60% compared to the conventional structure. The reconfigurable multiplier provides variable multiplication precision with design flexibility and improves power consumption, operation speed, and area occupation for multiplication-dependent digital system applications.

TABLE I. Performance comparison of the proposed multiplier and conventional multiplier

|                | 4-tap FIR with the conventional design | 4-tap FIR with the proposed design |
|----------------|----------------------------------------|------------------------------------|
| Area (LUT)     | 239                                    | 162                                |
| Power ($\mu$W) | 348.1                                  | 212.4                              |
| Delay (ns)     | 12.414                                 | 7.750                              |
| Clock (kHz)    | 200                                    | 400                                |

#### III. SIMULATION

The proposed reconfigurable multiplier was fabricated using a full-custom design methodology. We designed a 16-bit reconfigurable multiplier using AND gates, XOR gates, and full adders as the basic elements in the SK Hynix 0.35um CMOS technology. Figure 8 shows the top-level layout of the 16-bit reconfigurable multiplier, which has a very regular array structure. Table II shows the core size and the number of transistors used. We designed and evaluated two different multipliers on a chip: the first is a conventional fixed bit-width circuit and the second is the proposed reconfigurable bit-width multiplier.



Fig. 8. Layout of the 16-bit reconfigurable multiplier

We configured two 4-bit multipliers on the 16-bit multiplier to demonstrate its reconfigurability. Figure 9 shows the overall concept of the chip performance evaluation.

TABLE II. Design information

| Core size (except pad) | Number of transistors (full-custom design) | Circuit type |
|------------------------|--------------------------------------------|--------------|
| 5 mm x 4 mm            | 18432                                      | Digital      |

In Fig. 9, the test input vectors are generated using the Basys3 FPGA board. To check the logic operation carefully, we reduced the clock frequency by a factor of 8. The input $A_0$, the fastest signal, toggles at about 97.5 kHz. Figure 10 shows the board generating input signals for the chip. The $A_0$ to $A_3$ signals are logic '1', as indicated by their LEDs being ON, and $B_0$ to $B_3$ are '0', as indicated by their LEDs being OFF.



Fig. 9. Chip verification method



Fig. 10. Input data with FPGA Basys3

Figure 11(a) shows chip test results for the logical operation of the multiplier with the fixed bit-width: the input A[3:0] gradually decreases from 15 to 9 while B[3:0] is set to 9. Thus, the output should be A[3:0] \* 9, and the chip works properly, as shown in the highlighted data bus (*My Bus 3*). Figure 11(b) shows computation with the variable bit-width multiplier. The 4-bit input A[3:0] with value 10 (1010<sub>2</sub>) and the input B[3:0] with value 7 (0111<sub>2</sub>) are each divided into two 2-bit words. The multiplier performs two 2-bit by 2-bit multiplications, '10' by '01' and '10' by '11', whose results are '0010' and '0110'. Concatenating the two outputs gives '00100110', i.e., 38<sub>10</sub>.





Fig. 11. Simulation result of proposed multiplier; (a) Simulation of fixed bit width in 16-bit proposed multiplier, (b) Simulation of variable bit width 16-bit proposed multiplier
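The arithmetic behind the variable bit-width result in Fig. 11(b) can be written out step by step. The slice notation $A[3:2]$ and $A[1:0]$ for the upper and lower 2-bit halves is introduced here only for illustration:

$$
\begin{aligned}
A[3:2] \times B[3:2] &= 10_2 \times 01_2 = 0010_2 = 2_{10}, \\
A[1:0] \times B[1:0] &= 10_2 \times 11_2 = 0110_2 = 6_{10}, \\
\{0010_2,\ 0110_2\} &= 2 \cdot 2^4 + 6 = 38_{10}.
\end{aligned}
$$

For comparison, the single 4-bit product of the same operands would be $10 \times 7 = 70_{10}$, so the value $38_{10}$ is consistent with two independent 2-bit multiplications rather than one 4-bit multiplication.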

The chip propagation delay is determined by the critical data path of the circuit. The dark red arrows in Fig. 12 mark the critical path. The propagation delay of the multiplication operation is the time the input values take to pass through each cell and reach the last sub-block, and the maximum such time is given by the critical path in Fig. 12. The carry signal typically settles last, so it serves as the reference signal for determining the maximum operating clock frequency.



Fig. 12. Critical path within a 4-bit multiplier

Figure 13 shows the worst-case propagation delay of 22.5ns, which occurs for the input data 1111<sub>2</sub> and 1111<sub>2</sub>.



Fig. 13. Worst-case propagation delay of the 16-bit proposed multiplier
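As a rough guide, the worst-case delay bounds the clock rate at which a result can be captured every cycle. The symbols $f_{\max}$ and $t_{pd}$ are introduced here for illustration, and the estimate ignores register setup time, clock skew, and pad delays:

$$
f_{\max} \le \frac{1}{t_{pd}} = \frac{1}{22.5\ \text{ns}} \approx 44.4\ \text{MHz}
$$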

#### V. CONCLUSION

In this paper, we proposed a reconfigurable multiplier that enables variable-precision operation. The proposed architecture performs multiple multiplications on a chip depending on the bit-width of the input data. For example, the 16-bit reconfigurable architecture can provide many types of multiplications, such as two 8-bit by 8-bit multiplications, a 4 by 4 and a 10 by 4 multiplication, or a single 16 by 16 multiplication. Because the reconfigurable structure is flexible in bit-width, it can perform calculations in parallel. Thus, it reduces power consumption, area, and propagation delay by 64%, 48%, and 60%, respectively. The functionality of the reconfigurable multiplier was verified on an FPGA board, and the design was implemented as a chip in the Magnachip/SK Hynix 0.35um CMOS process. A 4-tap FIR filter configured with the fabricated chip shows a propagation delay of 22.5ns and reduces power consumption by 50% compared to the conventional multiplier.

#### ACKNOWLEDGMENT

This research was supported by the Ministry of SMEs and Startups Department's support of enterprise-linked human resource development project in 2019 [Assignment number: \$2755555].

#### REFERENCES

- [1] D. J. M. Moss, D. Boland, and P. H. W. Leong, "A Two-Speed, Radix-4, Serial-Parallel Multiplier", *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 27, no. 4, pp. 769-777, 2018.
- [2] Y. Harata, Y. Nakamura, H. Nagase, M. Takigawa, and N. Takagi, "A high-speed multiplier using a redundant binary adder tree", *IEEE Journal of Solid-State Circuits*, vol. 22, no. 1, pp. 28-34, 1987.
- [3] P. Kitsos, G. Theodoridis, and O. Koufopavlou, "An efficient reconfigurable multiplier architecture for Galois field GF", *Microelectronics Journal*, vol. 34, no. 10, pp. 975-980, 2003.
- [4] S. J. Lee, B. S. Park, S. W. Cho, K. R. Cho, and K. Eshraghian, "Memristor CMOS reconfigurable multiplier architecture", in *2014 14th International Workshop on Cellular Nanoscale Networks and their Applications (CNNA)*, pp. 1-2, 2014.
- [5] R. Gnanasekaran, "A fast serial-parallel binary multiplier", *IEEE Transactions on Computers*, no. 8, pp. 741-744, 1985.
- [6] S. S. Mahant-Shetti, P. T. Balsara, and C. Lemonds, "High performance low power array multiplier using temporal tiling", *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 7, no. 1, pp. 121-124, 1999.
- [7] H. J. Kang and I. C. Park, "FIR filter synthesis algorithms for minimizing the delay and the number of adders", *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 48, no. 8, pp. 770-777, 2001.



Sang Hyun Ahn is currently working toward the M.S. degree in the Department of Information and Communication Engineering, Chungbuk National University, Cheongju, South Korea. His research interests are in the fields of artificial intelligence and communication circuit design.



Seung Bum Baek received the B.S. and M.S. degrees in information and communication engineering from Chungbuk National University, Cheongju, South Korea, in 2015 and 2017, respectively, where he is also currently pursuing the Ph.D. degree. His current research interests include VLSI design for security services targeting resource-constrained devices, mathematical modeling for biomedical engineering applications, and embedded systems.



Kyoung Rok Cho (S'89-M'92) received the B.S. degree in electronic engineering from Kyoungpook National University, Taegu, Korea, in 1977 and the M.S. and Ph.D. degrees in electrical engineering from the University of Tokyo, Tokyo, Japan, in 1989 and 1992, respectively. From 1979 to 1986, he was with the TV Research Center, LG Electronics, Seoul, South Korea. In 1999 and 2006, he was with Oregon State University, Corvallis, OR, USA, as a Visiting Scholar. He is currently a Professor at the College of Electrical and Computer Engineering, Chungbuk National University, Cheongju, South Korea, where he is also the Director of the IC Design Education Center and a Director of the World Class University program. His research interests include SoC platform design for communication systems, prospective CMOS image sensors, memristor-based circuits, and the design of multilayer system on-systems technology.