Flexible DSP Accelerator Architecture Exploiting Carry-Save Arithmetic
ABSTRACT :
Hardware acceleration has proven to be an extremely promising implementation strategy for the digital signal processing (DSP) domain. Rather than adopting a monolithic application-specific integrated circuit (ASIC) design approach, in this brief we present a novel accelerator architecture comprising flexible computational units that support the execution of a large set of operation templates found in DSP kernels. We differ from previous work on flexible accelerators by enabling computations to be performed aggressively on carry-save (CS) formatted data. Advanced arithmetic design concepts, i.e., recoding techniques, are used so that CS optimizations can be applied in a larger scope than in previous approaches. Extensive experimental evaluations show that the proposed accelerator architecture delivers average gains of up to 61.91% in area-delay product and 54.43% in energy consumption compared with state-of-the-art flexible datapaths.
EXISTING SYSTEM :
• Existing works on coarse-grained reconfigurable datapaths mainly exploit architecture-level optimizations, e.g., increased instruction-level parallelism (ILP).
• The existing accelerator architecture comprises flexible computational units (FCUs), which enable the execution of a large set of operation templates found in DSP kernels.
• The main constraint of existing DSP systems is their inflexibility. The main focus of this work is to implement reconfigurable DSP functions.
• Hardware acceleration has been shown to be a highly promising implementation approach for the DSP domain.
DISADVANTAGE :
• Design decisions on the accelerator’s datapath highly impact its efficiency.
• However, research has shown that arithmetic optimizations at abstraction levels higher than the structural circuit level significantly impact datapath performance.
• The aforementioned CS optimization approaches have limited impact on data-flow graphs (DFGs) dominated by multiplications, e.g., filtering DSP applications.
• This experiment aims to show that the impact of technology scaling on performance does not eliminate the benefits of using CS arithmetic.
PROPOSED SYSTEM :
• The proposed accelerator architecture delivers average gains in area-delay product and in energy consumption compared with state-of-the-art flexible datapaths, sustaining efficiency toward scaled technologies.
• Many types of fast adders have been developed, such as the carry-skip adder (CSK), the carry-select adder (CSL), and the carry-lookahead adder (CLA), and many low-power adder design techniques have also been proposed.
• The multiplier comprises a CS-to-MB module, which adopts a recently proposed technique to recode the 17-bit P into its respective modified Booth (MB) digits with minimal carry propagation.
• The proposed multirate processor exploits the features of this flexible ALU.
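The MB digits mentioned above can be illustrated with a minimal sketch of textbook radix-4 modified Booth recoding. Note the assumptions: this recodes an ordinary two's-complement integer, not a carry-save pair, so it is not the CS-to-MB technique used in the proposed multiplier; the function names `mb_recode`, `mb_value`, and `mb_multiply` are hypothetical and chosen only for illustration.

```python
def mb_recode(value: int, bits: int) -> list[int]:
    """Recode a `bits`-wide two's-complement integer into radix-4
    modified Booth digits in {-2, -1, 0, 1, 2}, least significant first."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given width")
    if bits % 2:
        bits += 1  # sign-extend to an even width (e.g. 17 -> 18 bits)
    u = value & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        # Each digit is formed from three overlapping bits.
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def mb_value(digits: list[int]) -> int:
    """Reconstruct the integer represented by radix-4 MB digits."""
    return sum(d * 4**k for k, d in enumerate(digits))


def mb_multiply(x: int, y: int, bits: int) -> int:
    """Multiply x by y using one partial product per MB digit of y."""
    return sum((d * x) << (2 * k) for k, d in enumerate(mb_recode(y, bits)))


if __name__ == "__main__":
    print(mb_recode(7, 4))          # digits for 7 in 4 bits
    print(mb_multiply(25, -3, 17))  # 17-bit multiplier operand
```

A 17-bit operand is sign-extended to 18 bits here and yields nine digits, so the multiplier sums roughly half as many partial products as a bit-by-bit scheme would.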
ADVANTAGE :
• Many researchers have proposed the use of domain-specific coarse-grained reconfigurable accelerators in order to increase ASICs’ flexibility without significantly compromising their performance.
• However, research has shown that arithmetic optimizations at abstraction levels higher than the structural circuit level significantly impact datapath performance.
• A CS-to-binary conversion is inserted before each operation other than addition/subtraction, e.g., multiplication, so multiple CS-to-binary conversions are allocated, which heavily degrades performance due to time-consuming carry propagation.
• Modern embedded systems target high-end application domains requiring efficient implementations of computationally intensive digital signal processing (DSP) functions.
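To make the cost of CS-to-binary conversion concrete, the following is a minimal software sketch of carry-save addition, assuming non-negative integers of arbitrary width. It models only the arithmetic identity, not the proposed hardware; the names `cs_add`, `cs_accumulate`, and `cs_to_binary` are hypothetical.

```python
def cs_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compression: reduce three operands to a (sum, carry) pair.
    Every bit position is computed independently, so no carry ripples."""
    s = a ^ b ^ c                            # per-bit sum
    cy = ((a & b) | (a & c) | (b & c)) << 1  # per-bit majority, weighted x2
    return s, cy


def cs_accumulate(values: list[int]) -> tuple[int, int]:
    """Add a list of values while keeping the running total in CS form."""
    s, cy = 0, 0
    for v in values:
        s, cy = cs_add(s, cy, v)
    return s, cy


def cs_to_binary(s: int, cy: int) -> int:
    """The only step that needs a carry-propagate addition."""
    return s + cy


if __name__ == "__main__":
    s, cy = cs_accumulate([13, 7, 21, 100, 255])
    print(cs_to_binary(s, cy))  # same total as sum() of the list
```

In this sketch the carry-propagate step runs once, after the whole chain of additions. Converting back to binary before every non-additive operation would repeat that slow step many times, which is the overhead the bullet above describes.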