A Multiple Gradient Descent Design for Multi-task Learning on Edge Computing: A Multi-objective Machine Learning Approach

Abstract: Multi-task learning is widely used in machine learning modeling to exploit the commonalities and differences across multiple tasks. However, the objectives of the tasks often conflict. A common compromise is to minimize a weighted sum of the objectives, which can fail when the objectives are competing. In this paper, a novel multi-objective machine learning approach is proposed to address this issue by formulating multi-task learning as multi-objective optimization. To avoid the excessive computation time of multi-objective evolutionary algorithms, a multi-gradient descent algorithm is introduced, in which a gradient-based optimization converges to a solution on the Pareto set. Moreover, gradient surgery for the multi-gradient descent algorithm is proposed to obtain a stable Pareto optimal solution. As most edge computing devices have constrained computational resources, the proposed method is implemented to optimize the edge device's memory, computation, and communication demands. The proposed method is applied to the multiple-license-plate recognition problem. Experimental results show that the proposed method outperforms state-of-the-art learning methods and can find solutions that balance the multiple objectives of the learning task across different datasets.
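The gradient surgery mentioned in the abstract can be illustrated with a PCGrad-style projection step: when two task gradients conflict (negative inner product), the conflicting component is projected away before the gradients are combined. This is a minimal sketch of the general idea on plain NumPy vectors; the function name and the final averaging rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def gradient_surgery(grads):
    """PCGrad-style gradient surgery (illustrative sketch): for each task
    gradient, remove the component that conflicts with any other task
    gradient, then average the adjusted gradients into one update."""
    adjusted = [g.astype(float) for g in grads]
    for i, g_i in enumerate(adjusted):
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            dot = float(g_i @ g_j)
            if dot < 0.0:  # conflicting directions: project the overlap away
                g_i -= (dot / float(g_j @ g_j)) * g_j
    # Combined update direction for the shared parameters.
    return np.mean(adjusted, axis=0)
```

After surgery, the combined direction has a non-negative inner product with each original task gradient, so no task's loss increases to first order.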
EXISTING SYSTEM:
- We propose an efficient method that optimizes an upper bound of the multi-objective loss and requires only a single backward pass. We further show that optimizing this upper bound yields a Pareto optimal solution under realistic assumptions. The architectures we address conjoin a shared representation function with task-specific decision functions.
- We use a similar construction: for each image, another image is chosen uniformly at random; one is placed at the top-left and the other at the bottom-right. The resulting tasks are classifying the digit at the top-left (task-L) and classifying the digit at the bottom-right (task-R). We use 60K examples and directly apply existing single-task MNIST models.
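The two-digit image construction described above can be sketched as follows. The 36x36 canvas size and the element-wise max used to merge overlapping pixels are assumptions for illustration; the source only specifies the top-left/bottom-right placement.

```python
import numpy as np

def make_multimnist_pair(img_left, img_right, canvas=36):
    """Sketch of a MultiMNIST-style composition: place one 28x28 digit at
    the top-left (task-L) and another at the bottom-right (task-R) of a
    canvas, merging any overlap with an element-wise max."""
    out = np.zeros((canvas, canvas), dtype=float)
    h, w = img_left.shape
    out[:h, :w] = img_left                                # task-L digit
    oy = canvas - img_right.shape[0]
    ox = canvas - img_right.shape[1]
    out[oy:, ox:] = np.maximum(out[oy:, ox:], img_right)  # task-R digit
    return out
```

Each composed image then carries two labels, one per task, which is what makes the dataset a natural multi-task benchmark.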
DISADVANTAGE:
- In this paper, we assume that each edge server has the same limited resources to handle requests from mobile vehicles, i.e., each edge server has the same processing power, and these servers are placed at certain base-station (BS) locations for mobile-vehicle access.
- We treat multiple vehicles as multiple computing tasks; the multi-task scheduling problem is analogous to a multi-task learning (MTL) model.
- Because of the shared process that generates the data, even real-world tasks that appear unrelated have strong dependencies.
- Learning multiple tasks jointly therefore acts as an inductive bias on the learning model.
PROPOSED SYSTEM:
- Algorithms proposed in the computational-geometry literature address the problem of finding minimum-norm points in the convex hull of a large number of points in a low-dimensional space (typically of dimensionality 2 or 3).
- In our setting, the number of points is the number of tasks and is typically small; in contrast, the dimensionality is the number of shared parameters and can be in the millions.
- On the accuracy side, we expect both methods to perform similarly as long as the full-rank assumption is satisfied. As expected, the measured accuracy of both methods is very similar.
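For the two-task case, the minimum-norm point in the convex hull of the gradients has a simple closed form, which avoids any general-purpose geometry solver. The sketch below derives the optimal convex weight by minimizing ||a*g1 + (1-a)*g2|| over a in [0, 1]; the function name is illustrative.

```python
import numpy as np

def min_norm_point(g1, g2):
    """Closed-form minimum-norm point in the convex hull of two gradients:
    minimize ||a*g1 + (1-a)*g2|| over a in [0, 1]. Returns (a, point)."""
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: any convex weighting is optimal
    else:
        # Unconstrained minimizer, clipped to the simplex [0, 1].
        alpha = float(np.clip(((g2 - g1) @ g2) / denom, 0.0, 1.0))
    return alpha, alpha * g1 + (1.0 - alpha) * g2
```

The resulting point is either an interior convex combination (when the gradients conflict) or one of the endpoints (when one gradient dominates), matching the two-point special case of the minimum-norm problem described above.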
ADVANTAGE:
- MGDA takes advantage of the gradient of each task and solves an optimization problem to determine the update of the shared (global) parameters. However, large-scale application of MGDA remains impractical due to two technical issues: (1) the underlying optimization problem does not scale well to high-dimensional gradients, which arise naturally in deep networks; (2) the algorithm requires an explicit gradient for each task, so the number of backward passes grows linearly, multiplying training time by the number of tasks.
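While those scaling issues matter for deep networks, on a toy problem the per-step cost is negligible and the behavior of multi-gradient descent can be observed directly. The sketch below uses an assumed setup (two quadratic objectives, not an experiment from the paper): the Pareto set is the segment between the two minimizers, and the MGDA update drives the iterate onto it and then stalls, because the minimum-norm convex combination of the gradients vanishes there.

```python
import numpy as np

def min_norm_alpha(g1, g2):
    """Optimal convex weight for the two-gradient minimum-norm problem."""
    d = g1 - g2
    if float(d @ d) == 0.0:
        return 0.5
    return float(np.clip(((g2 - g1) @ g2) / (d @ d), 0.0, 1.0))

# Two quadratic objectives f1(x) = ||x - a||^2 and f2(x) = ||x - b||^2;
# their Pareto set is the segment between a and b.
a, b = np.array([0.0, 0.0]), np.array([1.0, 0.0])
x = np.array([2.0, 3.0])
for _ in range(500):
    g1, g2 = 2.0 * (x - a), 2.0 * (x - b)   # per-task gradients
    alpha = min_norm_alpha(g1, g2)
    x = x - 0.05 * (alpha * g1 + (1.0 - alpha) * g2)
# x now lies (numerically) on the Pareto segment between a and b.
```

No scalarization weights are hand-tuned here: the convex weight alpha is recomputed at every step from the gradients themselves, which is the key difference from a fixed weighted-sum objective.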