Feedforward-Cutset-Free Pipelined Multiply–Accumulate Unit for the Machine Learning Accelerator

Abstract: Multiply–accumulate (MAC) computations account for a large part of machine learning accelerator operations. A pipelined structure is usually adopted to improve performance by shortening the critical paths. The additional flip-flops that pipelining requires, however, generally cause a significant increase in area and power. Many of these flip-flops are needed only to satisfy the feedforward-cutset rule. Based on the observation that this rule can be relaxed in machine learning applications, we propose a pipelining method that selectively eliminates some of the flip-flops. Simulation results show that the proposed MAC unit achieves a 20% energy saving and a 20% area reduction compared with the conventional pipelined MAC.
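
For readers less familiar with the operation itself, the following is a minimal Python sketch of what a single MAC unit computes when it evaluates one dot product, as in a neural network layer. The function name `mac_dot` and its arguments are illustrative only and do not come from the paper.

```python
def mac_dot(weights, activations):
    """Accumulate products one pair at a time, as a single MAC unit would."""
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    acc = 0
    for w, a in zip(weights, activations):
        # One multiply-accumulate step: multiply the pair, add to the running sum.
        acc += w * a
    return acc


if __name__ == "__main__":
    print(mac_dot([1, 2, 3], [4, 5, 6]))
```

An accelerator runs many such units in parallel, which is why the area and power of each individual unit matter.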
• The proposed design method reduces the area and the power consumption by decreasing the number of flip-flops inserted for pipelining, compared with the existing pipelined architecture for MAC computation.
• The deep neural network (DNN) has emerged as a powerful tool for various applications, including image classification and speech recognition.
• In a machine learning accelerator, a large number of multiply–accumulate (MAC) units are included for parallel computations, and the timing-critical paths of the system are often found in these units.
• It is well known that pipelining is one of the most effective ways to reduce the critical path delay, thereby increasing the clock frequency.
• The number of inserted flip-flops increases with the number of pipeline stages.
• Consumes a larger area and has a high critical path delay.
• Power consumption is high.
• The proposed design method reduces the area and the power consumption by decreasing the number of flip-flops inserted for pipelining.
• The proposed pipelining method is applied to the MAC architecture by using the unique characteristic of the Dadda multiplier.
• The proposed MAC design, the FCF MAC (a MAC with the proposed FCF pipelining), uses FCF pipelining for the column addition and the MFCF-PA for the accumulation.
• We believe that the proposed idea of using the unique characteristic of machine learning algorithms for a more efficient MAC design can be adopted in many neural network designs.
• An accumulator consists of a carry-propagation adder. Long critical paths through these stages degrade the performance of the overall system.
• Based on the idea explained above, this paper proposes a feedforward-cutset-free (FCF) pipelined MAC architecture that is specialized for a high-performance machine learning accelerator.
• Although conventional pipelining effectively reduces the critical path delay, it usually increases the area and the power consumption because a large number of flip-flops must be inserted.
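
The points above can be illustrated with a small behavioral sketch in Python. It is not the paper's circuit. It assumes one plausible form of the relaxation: the accumulator's carry-propagation adder is split into a lower and an upper half, only the carry between the halves is registered, and the input bits feeding the upper half are not delayed. The names `CarryDeferredAccumulator`, `WIDTH`, `HALF`, and `accumulate` are hypothetical. Under this assumption, intermediate sums can be temporarily wrong, but the final sum is correct after one extra flush cycle.

```python
WIDTH = 16              # assumed accumulator width (illustrative)
HALF = WIDTH // 2
MASK = (1 << HALF) - 1


class CarryDeferredAccumulator:
    """Two-half accumulator in which only the inter-half carry is registered."""

    def __init__(self):
        self.lo = 0         # lower-half register
        self.hi = 0         # upper-half register
        self.carry_ff = 0   # the single flip-flop between the two halves

    def step(self, x):
        # Both halves see the current input; the upper half uses the carry
        # produced by the lower half in the PREVIOUS cycle.
        lo_sum = self.lo + (x & MASK)
        hi_sum = self.hi + ((x >> HALF) & MASK) + self.carry_ff
        self.lo = lo_sum & MASK
        self.hi = hi_sum & MASK
        self.carry_ff = lo_sum >> HALF

    def value(self):
        return (self.hi << HALF) | self.lo


def accumulate(inputs):
    """Return (final_sum, per-cycle trace) for unsigned WIDTH-bit inputs."""
    acc = CarryDeferredAccumulator()
    trace = []
    for x in inputs:
        acc.step(x)
        trace.append(acc.value())
    acc.step(0)  # flush cycle: absorbs the last pending carry
    return acc.value(), trace


if __name__ == "__main__":
    data = [0x00FF, 0x0001, 0x1234, 0x0FFF]
    final, trace = accumulate(data)
    print([hex(v) for v in trace], hex(final))
```

In this model the value read after the second input is 0x0000 rather than the true partial sum 0x0100, because the carry is still waiting in the flip-flop. Only the final accumulated value is consumed in a dot product, which is the kind of property that would allow such a rule to be relaxed.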