Abstract: Deep learning is capable of discovering deep features for representation learning and pattern recognition without requiring the elaborate feature engineering that draws on human ingenuity and prior knowledge. It has therefore triggered enormous research activity in machine learning and pattern recognition. One of the most important challenges of deep learning is to figure out the relations between a feature and the depth of deep neural networks (deep nets for short) so as to reflect the necessity of depth. Our purpose is to quantify this feature-depth correspondence in feature extraction and generalization. We present the adaptivity of features to depths, and vice versa, by showing a depth-parameter trade-off in extracting both single features and composite features. Based on these results, we prove that implementing the classical empirical risk minimization on deep nets can achieve the optimal generalization performance for numerous learning tasks. Our theoretical results are verified by a series of numerical experiments, including toy simulations and a real application to earthquake seismic intensity prediction.
Learning typically proceeds in two steps: feature extraction and then learning on the extracted features. The former focuses on designing preprocessing pipelines and data transformations that yield a tractable representation of the data, while the latter applies learning algorithms tailored to specific targets, such as regression, classification, and clustering, to that representation to finish the learning task. Studies on the second step abound in machine learning, and numerous learning schemes, such as kernel methods, neural networks, and boosting, have been proposed. However, feature extraction in the first step is usually labor intensive, requiring elaborate feature engineering that takes advantage of human ingenuity and prior knowledge.
The first problem refers to the representation performance of deep nets and needs tools from information theory, such as coding theory and entropy theory. The second one concerns the approximation abilities of deep nets with different depths, requiring approximation-theory techniques, such as local polynomial approximation, covering number estimates, and wavelet analysis, to quantify the powers and limitations of deep nets. The last one focuses on the generalization capability of deep learning algorithms in machine learning, for which statistical learning theory as well as empirical process theory should be utilized.
Furthermore, from an optimization viewpoint, large depth requires solving a highly nonconvex optimization problem involving the ill-conditioning of the Hessian, the existence of many local minima, saddle points, plateaus, and even flat regions, making it difficult to design optimization algorithms for such deep nets with convergence guarantees. Based on these observations, we provide theoretical guidance for depth selection in extracting data features by showing that deep nets with various depths, larger than a specified value, are capable of extracting smoothness and other data features.
The third way is that the weight matrix is generated jointly by both of the above ways. Like the most widely used deep convolutional neural networks, we count the number of free parameters according to the third way, taking into account both sparse connections and weight sharing. It should be mentioned that this way of counting free parameters differs from that of work which considers deep fully connected neural networks.
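To make the parameter-counting convention concrete, the following is a minimal sketch (the layer shapes are illustrative assumptions, not taken from the paper) contrasting the free-parameter count of a convolutional layer, where sparse connections and weight sharing apply, with that of a fully connected layer over the same input:

```python
def conv_params(in_ch, out_ch, k):
    # Weight sharing: one k x k kernel per (input channel, output channel)
    # pair, reused at every spatial location, plus one bias per output channel.
    return out_ch * in_ch * k * k + out_ch

def dense_params(n_in, n_out):
    # Fully connected: an independent weight for every input-output pair,
    # plus one bias per output unit.
    return n_in * n_out + n_out

# Illustrative example: a 3-channel 32x32 input mapped to 16 feature maps.
print(conv_params(3, 16, 3))                     # 3*16*3*3 + 16 = 448
print(dense_params(3 * 32 * 32, 16 * 32 * 32))   # 50,348,032 free parameters
```

The gap of several orders of magnitude is exactly why counting free parameters under the third way gives much smaller totals than counting for deep fully connected nets.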