Cloud versus Edge Deployment Strategies of Real-Time Face Recognition Inference
ABSTRACT :
In this paper, we present a real-world case study on deploying a face recognition application built on the MTCNN detector and the FaceNet recognizer. We report the challenges we faced in deciding on the best deployment strategy, and we propose three inference architectures for the deployment: cloud-based, edge-based, and hybrid. Furthermore, we evaluate the performance of face recognition inference on different cloud-based and edge-based GPU platforms, considering several types of Jetson boards for the edge and various GPUs for the cloud. We also investigate the effect of deep learning model optimization using TensorRT and TFLite, compared to a standard TensorFlow GPU model, as well as the effect of input resolution. We provide a benchmarking study for all these devices in terms of frames per second, execution time, energy consumption, and memory usage. After conducting a total of 294 experiments, the results demonstrate that TensorRT optimization provides the fastest execution on all cloud and edge devices, at the expense of significantly higher energy consumption (up to +40% and +35% for edge and cloud devices respectively, compared to TensorFlow). TFLite, in contrast, is the most efficient framework in terms of memory and power consumption, while providing significantly less (-4% to -62%) processing acceleration than TensorRT.
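For concreteness, the following is a minimal sketch of the baseline (unoptimized TensorFlow) detection-plus-recognition pipeline together with a crude frames-per-second measurement. It assumes the open-source mtcnn and keras-facenet packages; the study's exact implementation, model weights, crop size, and input source are not specified here, so the file name and parameters below are illustrative.

import time
import cv2
import numpy as np
from mtcnn import MTCNN            # open-source MTCNN face detector
from keras_facenet import FaceNet  # pretrained FaceNet embedder

detector = MTCNN()
embedder = FaceNet()

def recognize_frame(frame_bgr):
    # Detect faces, crop and resize them, and return one embedding per face.
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    crops = []
    for face in detector.detect_faces(rgb):
        x, y, w, h = face["box"]
        x, y = max(0, x), max(0, y)
        crops.append(cv2.resize(rgb[y:y + h, x:x + w], (160, 160)))
    return embedder.embeddings(np.stack(crops)) if crops else None

# Crude frames-per-second measurement over a video file (illustrative name).
cap = cv2.VideoCapture("input.mp4")
frames, start = 0, time.time()
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    recognize_frame(frame)
    frames += 1
print(f"{frames / (time.time() - start):.1f} FPS")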
EXISTING SYSTEM :
• To perform face recognition with the pre-existing AWS services, the edge device must send a request to the AWS cloud to use the Rekognition face recognition service, which involves transmitting the image frame containing a face to the cloud (see the sketch after this list).
• We explore two simple potential camera topologies and provide our recommended architecture for optimal performance.
• Current state-of-the-art systems must offload both requests and image data to a local cloudlet or to the cloud itself, which adds extra latency on top of the time it takes to perform inference on an image.
• Face recognition is a significantly different task from face detection, as it involves identifying whether two faces belong to the same person, not just whether a face exists in the frame.
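As a concrete illustration of that request path, here is a hedged sketch of an edge device querying Rekognition through boto3. The region, collection name, and match threshold are assumptions for illustration, not values from the study.

import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

def identify_face(jpeg_bytes):
    # The whole JPEG-encoded frame leaves the edge device for the cloud.
    resp = rekognition.search_faces_by_image(
        CollectionId="employees",      # hypothetical, pre-indexed collection
        Image={"Bytes": jpeg_bytes},
        FaceMatchThreshold=90,
        MaxFaces=1,
    )
    return [m["Face"]["FaceId"] for m in resp["FaceMatches"]]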
DISADVANTAGE :
• The proposed approach investigates the problem of video stream analytics through (i) a filtration phase and (ii) an identification phase (sketched after this list).
• The video analytics problem is decomposed into stages that must be completed in serial order to maintain their conceptual integrity and completeness.
• These components may reside on one or more resources, depending on the specific analytics problem.
• However, cloud-based stream processing suffers from latency issues, as it takes time to transport the data to the cloud.
• To mitigate this issue, edge computing has been proposed to provide compute and storage resources close to the source of the data.
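The following is a minimal sketch of that two-phase decomposition, using OpenCV's Haar cascade as a stand-in for any cheap filtration stage; identify() is a hypothetical placeholder for the costly identification phase.

import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def filtration(frame):
    # Phase (i): discard low-value frames that contain no face at all.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return len(cascade.detectMultiScale(gray, 1.1, 5)) > 0

def process_stream(frames, identify):
    # Stages run in serial order; only surviving frames reach phase (ii).
    for frame in frames:
        if filtration(frame):
            yield identify(frame)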
PROPOSED SYSTEM :
• Many academic research projects, startups, and companies have proposed solutions in this space.
• Furthermore, this view of the world poses a new, real challenge for systems and computer architecture researchers – proposed mobile hardware optimizations and accelerators need to consider the long IP lifetime.
• In this paper, we propose to break out of the hitherto binary design space of AI inference on the edge, through a best-of-both-worlds system and approach termed Semantic Cache (a simplified sketch follows this list).
• With the pipeline of processing tasks proposed by MIF, each task fuses the low-abstraction input data to extract as output some information at a higher level of abstraction.
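Below is a hedged sketch of the semantic-cache idea: answer locally when a new input's embedding is close enough to a previously cached one, and fall back to the heavy remote model otherwise. The cosine-similarity threshold and the interfaces are assumptions for illustration, not the cited system's actual design.

import numpy as np

class SemanticCache:
    def __init__(self, threshold=0.7):
        self.keys, self.values, self.threshold = [], [], threshold

    def lookup(self, emb):
        # Return the cached value of the nearest key, if similar enough.
        for key, value in zip(self.keys, self.values):
            sim = np.dot(emb, key) / (np.linalg.norm(emb) * np.linalg.norm(key))
            if sim >= self.threshold:
                return value
        return None

    def insert(self, emb, value):
        self.keys.append(emb)
        self.values.append(value)

def infer(emb, cache, cloud_call):
    hit = cache.lookup(emb)
    if hit is not None:
        return hit                 # cache hit: the edge answers locally
    result = cloud_call(emb)       # cache miss: offload to the cloud model
    cache.insert(emb, result)
    return result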
ADVANTAGE :
• This scalability is achieved by utilizing hardware resources, including multiple edge, in-transit, and cloud nodes, to continuously provide high-performance stream analytics.
• However, performance is a major concern for cloud-based stream analytics systems due to network latency and limited bandwidth.
• The filtration phase allows the early discarding of low-value data to improve performance.
• A placement algorithm is proposed and implemented to optimally map the pipeline stages onto the computational resources of the HV architecture, meeting real-time Quality of Service (QoS) performance requirements (an illustrative sketch follows this list).
• The priority of Vigil and Gigasight is to conserve bandwidth, whereas our priority is to optimize performance while saving as much bandwidth as possible.
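In the same spirit (though not the cited algorithm itself), here is a greedy placement sketch that maps each pipeline stage to the node with the lowest estimated per-frame latency. The stage names, node names, and latency figures are invented for illustration.

def place_pipeline(stages, nodes, cost):
    # Map each stage to the node minimizing its estimated per-frame latency.
    return {stage: min(nodes, key=lambda node: cost(stage, node))
            for stage in stages}

# Illustrative latencies (ms) across edge, in-transit, and cloud nodes.
latency = {("filtration", "edge"): 5,  ("filtration", "in_transit"): 8,
           ("filtration", "cloud"): 20, ("detection", "edge"): 40,
           ("detection", "in_transit"): 25, ("detection", "cloud"): 12,
           ("recognition", "edge"): 60, ("recognition", "in_transit"): 35,
           ("recognition", "cloud"): 10}
placement = place_pipeline(["filtration", "detection", "recognition"],
                           ["edge", "in_transit", "cloud"],
                           lambda s, n: latency[(s, n)])
print(placement)   # e.g. {'filtration': 'edge', ..., 'recognition': 'cloud'}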