Flexible DSP Accelerator Architecture Exploiting Carry-Save Arithmetic

Abstract

Hardware acceleration has proved to be an extremely promising implementation strategy for the digital signal processing (DSP) domain. Rather than adopting a monolithic application-specific integrated circuit design approach, in this brief, we present a novel accelerator architecture comprising flexible computational units that support the execution of a large set of operation templates found in DSP kernels. We differentiate our work from previous work on flexible accelerators by enabling computations to be aggressively performed with carry-save (CS) formatted data. Advanced arithmetic design concepts, i.e., recoding techniques, are utilized, enabling CS optimizations to be performed in a larger scope than in previous approaches. Extensive experimental evaluations show that the proposed accelerator architecture delivers average gains of up to 61.91% in area-delay product and 54.43% in energy consumption compared with state-of-the-art flexible datapaths.
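As a rough illustration of carry-save (CS) arithmetic, the Python sketch below models a CS value as a pair of bit vectors (sum, carry) whose value is their sum. The function names are hypothetical, and unbounded Python integers stand in for fixed-width hardware words, so overflow is not modeled. Each accumulation step is a bitwise 3:2 compression with no carry chain; carries propagate only in the final conversion to binary.

```python
# Minimal sketch: carry-save (CS) addition, using Python ints as bit vectors.
# A CS number is a pair (sum, carry) whose value is sum + carry.

def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """Bitwise 3:2 compressor: returns (s, carry) with a + b + c == s + carry."""
    s = a ^ b ^ c                                  # per-bit sum, no carry chain
    carry = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, moved up one weight
    return s, carry

def cs_accumulate(values: list[int]) -> tuple[int, int]:
    """Accumulate operands in CS form; each step is one CSA level, no propagation."""
    s, c = 0, 0
    for v in values:
        s, c = csa(s, c, v)
    return s, c

def cs_to_binary(s: int, c: int) -> int:
    """The only carry-propagating step: a single final addition of both vectors."""
    return s + c

if __name__ == "__main__":
    vals = [13, 7, 22, 5]
    s, c = cs_accumulate(vals)
    print(s, c, cs_to_binary(s, c))  # prints: 7 40 47
```

Deferring that final addition is what keeps a chain of additions fast; operations such as multiplication, however, have traditionally required the operands in binary form.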

Existing System

• Existing work on coarse-grained reconfigurable datapaths mainly exploits architecture-level optimizations, e.g., increased instruction-level parallelism (ILP).
• The existing accelerator architecture comprises flexible computational units (FCUs), which enable the execution of a large set of operation templates found in DSP kernels.
• The main constraint of existing DSP systems is their inflexibility. The main focus of my work is to implement reconfigurable DSP functions.
• Hardware acceleration has been shown to be a highly promising implementation strategy for the DSP domain.
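To make the notion of "operation templates" concrete, here is a minimal hypothetical Python sketch of a flexible unit whose configuration selects one template at run time. The class name, template names, and template set are illustrative assumptions, not the exact templates supported by the FCUs described here.

```python
# Hypothetical sketch of a flexible computational unit: one unit, several
# selectable operation templates. Names and the template set are assumptions.
from typing import Callable, Dict

Template = Callable[[int, int, int], int]

TEMPLATES: Dict[str, Template] = {
    "add":     lambda a, b, c: a + b,        # single addition
    "sub":     lambda a, b, c: a - b,        # single subtraction
    "mul":     lambda a, b, c: a * b,        # single multiplication
    "add_mul": lambda a, b, c: (a + b) * c,  # chained add-multiply template
    "mul_add": lambda a, b, c: a * b + c,    # multiply-accumulate template
}

class FlexibleUnit:
    """Fixed hardware, reconfigurable behavior: the template is a runtime setting."""

    def __init__(self, template: str) -> None:
        self.configure(template)

    def configure(self, template: str) -> None:
        if template not in TEMPLATES:
            raise ValueError(f"unsupported template: {template}")
        self.template = template

    def execute(self, a: int, b: int, c: int = 0) -> int:
        return TEMPLATES[self.template](a, b, c)

if __name__ == "__main__":
    fcu = FlexibleUnit("mul_add")
    print(fcu.execute(3, 4, 5))   # 3*4 + 5 = 17
    fcu.configure("add_mul")
    print(fcu.execute(3, 4, 5))   # (3+4)*5 = 35
```

Covering chained templates such as add-multiply in one unit avoids routing intermediate results between separate units, which is the flexibility argument made above.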

Disadvantages

• Design decisions on the accelerator’s datapath strongly impact its efficiency.
• However, research activities have shown that arithmetic optimizations at abstraction levels higher than the structural circuit level significantly impact datapath performance.
• The aforementioned CS optimization approaches have limited impact on data-flow graphs (DFGs) dominated by multiplications, e.g., filtering DSP applications.
• This experiment aims to show that the impact of technology scaling on performance does not eliminate the benefits of using CS arithmetic.

Proposed System

• The proposed accelerator architecture delivers average gains in area-delay product and in energy consumption compared with state-of-the-art flexible datapaths, sustaining its efficiency toward scaled technologies.
• Many different types of fast adders, such as the carry-skip adder (CSK), carry-select adder (CSL), and carry-lookahead adder (CLA), have been developed, and many low-power adder design techniques have also been proposed.
• The multiplier comprises a CS-to-MB (modified Booth) module, which adopts a recently proposed technique to recode the 17-bit P into its respective MB digits with minimal carry propagation.
• The proposed multirate processor exploits the features of this flexible ALU.
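The sketch below illustrates the modified Booth (MB) digit format that the multiplier works with. It is the classic radix-4 recoding of a two's-complement binary operand, swapped in for illustration; it does not model the CS-to-MB technique mentioned above, which recodes carry-save data directly with minimal carry propagation. The function names are hypothetical.

```python
# Illustrative sketch: classic radix-4 modified Booth (MB) recoding of a
# two's-complement binary operand. It does NOT model direct CS-to-MB recoding;
# it only shows the target digit format and how the digits drive a multiplier.

def mb_recode(value: int, width: int) -> list[int]:
    """Return radix-4 MB digits (least significant first), each in {-2..2}."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in the given two's-complement width")
    if width % 2:                 # odd width (e.g. 17 bits): sign-extend by one bit
        width += 1
    # Python's >> is arithmetic, so this yields two's-complement bits (LSB first).
    bits = [(value >> i) & 1 for i in range(width)]
    digits = []
    prev = 0                      # implicit bit y[-1] = 0
    for j in range(0, width, 2):
        # digit = -2*y[j+1] + y[j] + y[j-1]
        digits.append(-2 * bits[j + 1] + bits[j] + prev)
        prev = bits[j + 1]
    return digits

def mb_multiply(x: int, y: int, width: int) -> int:
    """Multiply x by y using one partial product (d * x * 4**j) per MB digit of y."""
    partial_products = [(d * x) << (2 * j) for j, d in enumerate(mb_recode(y, width))]
    return sum(partial_products)

if __name__ == "__main__":
    print(mb_recode(45, 8))             # [1, -1, -1, 1]: 1 - 4 - 16 + 64 = 45
    print(len(mb_recode(-1234, 17)))    # 9 digits, hence 9 partial products
    print(mb_multiply(123, -456, 17))   # equals 123 * -456
```

For a 17-bit operand this yields 9 MB digits, so the multiplier sums 9 partial products instead of 17, and each partial product needs only a shift and an optional negation of the multiplicand.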

Advantages

• Many researchers have proposed the use of domain-specific coarse-grained reconfigurable accelerators to increase the flexibility of ASICs without significantly compromising their performance.
• However, research activities have shown that arithmetic optimizations at abstraction levels higher than the structural circuit level significantly impact datapath performance.
• A CS-to-binary conversion is inserted before each operation other than addition/subtraction, e.g., multiplication, thus allocating multiple CS-to-binary conversions that heavily degrade performance due to time-consuming carry propagation.
• Modern embedded systems target high-end application domains requiring efficient implementations of computationally intensive digital signal processing (DSP) functions.
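The conversion rule stated above, that a CS-to-binary conversion precedes every operation other than addition/subtraction, can be counted on a small data-flow graph. The following is a hypothetical Python sketch: the `Node` class, the `CS_OPS` set, and the example graph are illustrative assumptions, and only operands arriving in CS form are counted.

```python
# Hypothetical data-flow-graph (DFG) model: counts the CS-to-binary conversions
# needed when only addition/subtraction nodes consume and produce CS data.
from dataclasses import dataclass, field

CS_OPS = {"add", "sub"}  # assumption: the only CS-capable operations

@dataclass
class Node:
    name: str
    op: str                                        # "input", "add", "sub", "mul", ...
    inputs: list[str] = field(default_factory=list)

def count_cs_to_binary(dfg: list[Node]) -> int:
    """Count conversions inserted before non-add/sub nodes that read CS values."""
    produces_cs = {n.name: n.op in CS_OPS for n in dfg}
    conversions = 0
    for node in dfg:
        if node.op == "input" or node.op in CS_OPS:
            continue                               # no conversion needed here
        # one conversion per operand that arrives in CS form
        conversions += sum(produces_cs[src] for src in node.inputs)
    return conversions

if __name__ == "__main__":
    # (a + b) * c + (d + e) * f
    dfg = [Node(x, "input") for x in "abcdef"] + [
        Node("s1", "add", ["a", "b"]),
        Node("m1", "mul", ["s1", "c"]),
        Node("s2", "add", ["d", "e"]),
        Node("m2", "mul", ["s2", "f"]),
        Node("out", "add", ["m1", "m2"]),
    ]
    print(count_cs_to_binary(dfg))  # 2: one before each multiplication
```

Every multiplication fed by an addition costs one carry-propagating conversion, which is why multiplication-heavy DFGs gain little from CS optimizations that stop at additions.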
