Active Domain Adaptation With Application to Intelligent Logging Lithology Identification

Abstract

Lithology identification plays an essential role in formation characterization and reservoir exploration. As an emerging technology, intelligent logging lithology identification has received great attention recently; it aims to infer the lithology type from well-logging curves using machine-learning methods. However, a model trained on interpreted logging data is not effective at predicting new exploration wells because of the discrepancy between data distributions. In this article, we aim to train a lithology identification model for the target well using a large amount of labeled source logging data and a small amount of labeled target data. The challenges of this task lie in three aspects: 1) the distribution misalignment; 2) the data divergence; and 3) the cost limitation. To solve these challenges, we propose a novel active adaptation for logging lithology identification (AALLI) framework that combines active learning (AL) and domain adaptation (DA). The contributions of this article are threefold: 1) the domain-discrepancy problem in intelligent logging lithology identification is investigated for the first time, and a novel framework that incorporates AL and DA into lithology identification is proposed to handle it; 2) we design a discrepancy-based AL and pseudolabeling (PL) module and an instance importance weighting module to query the most uncertain target information and retain the most confident source information, which addresses the challenges of cost limitation and distribution misalignment; and 3) we develop a reliability detecting module to improve the reliability of target pseudolabels, which, together with the discrepancy-based AL and PL module, addresses the challenge of data divergence. Extensive experiments on three real-world well-logging datasets demonstrate the effectiveness of the proposed method compared with the baselines.
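
The split between querying uncertain target samples (AL) and pseudolabeling confident ones (PL) can be illustrated with a minimal sketch. The abstract does not define the discrepancy-based uncertainty criterion, so this sketch assumes prediction entropy as a stand-in; the function names, the query budget, and the confidence threshold are hypothetical. The instance importance weighting and reliability detecting modules are not covered.

```python
import numpy as np


def entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of each row of an (n_samples, n_classes) probability matrix."""
    eps = 1e-12  # avoids log(0) for zero-probability classes
    return -np.sum(probs * np.log(probs + eps), axis=1)


def select_query_and_pseudolabels(probs, query_budget, confidence_threshold):
    """Split unlabeled target samples into an AL query set and a PL set.

    Assumption: entropy stands in for AALLI's discrepancy-based criterion.
    probs: predicted class probabilities for unlabeled target samples.
    query_budget: number of most uncertain samples sent to an expert for labels.
    confidence_threshold: minimum top-class probability to accept a pseudolabel.
    """
    uncertainty = entropy(probs)
    # AL: the highest-entropy (most uncertain) samples are queried for manual labels.
    query_idx = np.argsort(-uncertainty)[:query_budget]
    # PL: among the remaining samples, confident predictions become pseudolabels.
    remaining = np.setdiff1d(np.arange(len(probs)), query_idx)
    confident = remaining[probs[remaining].max(axis=1) >= confidence_threshold]
    pseudolabels = probs[confident].argmax(axis=1)
    return query_idx, confident, pseudolabels
```

For example, with four target predictions and a budget of one, the near-uniform prediction is queried, while predictions whose top-class probability is at least the threshold are pseudolabeled and ambiguous ones are left unlabeled.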

Existing System

• Existing work on domain adaptation for sentiment classification mostly belongs to labeling adaptation.
• The problem of domain adaptation has attracted increasing attention in both machine learning and natural language processing (NLP).
• Domain-independent features generally perform more consistently when the domain changes.
• We empirically show that both FE and PCA-SS are effective for cross-domain sentiment classification, and that SS-FE performs better than either approach alone because it considers both labeling adaptation and instance adaptation.

Disadvantages

• In this paper, considering the practical application of defect diagnosis, a novel DA approach is proposed to handle the class imbalance problem.
• Within the last decade, DA techniques have focused on solving the above problem.
• Compared with previous approaches, our work models manifold regularization, MVD, and instance reweighting in a unified way to solve the class imbalance problem in fault diagnosis.
• To solve this nontrivial problem without an intermediate density estimate, the maximum mean discrepancy (MMD), a nonparametric divergence, was proposed to compute the distance across domains by mapping the data into a reproducing kernel Hilbert space (RKHS).
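
For reference, the squared MMD between source samples x and target samples y can be estimated as the mean kernel value within the source, plus the mean kernel value within the target, minus twice the mean kernel value across the two domains. A minimal NumPy sketch of this (biased) estimator follows; the Gaussian (RBF) kernel and the `gamma` bandwidth are assumptions for illustration, not the settings of the cited work.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian kernel matrix exp(-gamma * ||a_i - b_j||^2) between rows of a and b."""
    # Pairwise squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2ab.
    sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(xs: np.ndarray, xt: np.ndarray, gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD between source xs and target xt."""
    k_ss = rbf_kernel(xs, xs, gamma).mean()  # within-source similarity
    k_tt = rbf_kernel(xt, xt, gamma).mean()  # within-target similarity
    k_st = rbf_kernel(xs, xt, gamma).mean()  # cross-domain similarity
    return float(k_ss + k_tt - 2.0 * k_st)
```

Samples drawn from the same distribution give a value close to zero, while a shift between the source and target distributions increases it.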

Proposed System

• A feature ensemble (FE) model is first proposed to learn a new labeling function in a feature re-weighting manner.
• In this work, we propose a joint method, called feature ensemble plus sample selection (SS-FE), to take full account of these two attributes for domain adaptation in sentiment classification.
• In formulating our SS-FE method, we first propose a labeling adaptation method via part-of-speech (POS)-based feature ensemble (FE).
• To address this issue, we propose PCA-SS as an aid to FE.
• PCA-SS first selects a subset of the labeled source-domain data whose instance distribution is close to the target domain, and then uses these selected samples as training data in labeling adaptation.
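
The sample-selection step of PCA-SS can be sketched as follows. The description above does not say how closeness to the target distribution is measured, so this sketch assumes one simple criterion: fit PCA on the target features and keep the source samples with the smallest reconstruction error in that target subspace. The function names, `n_components`, and `n_keep` are hypothetical.

```python
import numpy as np


def pca_basis(x: np.ndarray, n_components: int):
    """Return the mean of x and its top principal directions (as rows)."""
    mean = x.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def select_source_samples(xs: np.ndarray, xt: np.ndarray, n_components: int, n_keep: int):
    """Indices of the n_keep source rows best explained by the target PCA subspace.

    Assumption: low reconstruction error in the target subspace is used as a
    proxy for "instance distribution close to the target domain".
    """
    mean, basis = pca_basis(xt, n_components)
    centered = xs - mean
    # Project each source sample onto the target subspace and reconstruct it.
    recon = centered @ basis.T @ basis
    errors = np.sum((centered - recon) ** 2, axis=1)
    # Smallest reconstruction error first.
    return np.argsort(errors)[:n_keep]
```

The selected indices would then define the labeled source subset used to train the labeling-adaptation model.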

Advantages

• The fundamental challenge for the generalization performance of DA approaches is to reduce the cross-domain distribution discrepancy.
• By contrast, MVD considers both first-order and second-order statistics, which yields better marginal distribution adaptation and bridges the cross-domain discrepancy more effectively than MMD.
• A rolling bearing dataset provided by Case Western Reserve University was used to validate the performance of MRMI.
• Mapping the data into a low-dimensional representation with the global geodesic flow kernel (GFK) ensures a smooth transition between the datasets, so good diagnosis performance can be obtained.
• This indicates that the performance of DANN drops dramatically when the cross-domain discrepancy is large.
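
The advantage claimed for MVD rests on matching second-order statistics in addition to the means. The exact MVD formulation is not given here, so the sketch below uses an assumed stand-in: the squared distance between domain means plus the squared Frobenius distance between domain covariance matrices. With a linear kernel, MMD reduces to the squared distance between means, which is why the mean-only term serves as the first-order baseline. The function names are hypothetical.

```python
import numpy as np


def mean_discrepancy(xs: np.ndarray, xt: np.ndarray) -> float:
    """First-order term: squared Euclidean distance between domain means."""
    return float(np.sum((xs.mean(axis=0) - xt.mean(axis=0)) ** 2))


def mean_cov_discrepancy(xs: np.ndarray, xt: np.ndarray) -> float:
    """First- plus second-order term (assumed stand-in, not the published MVD)."""
    cov_s = np.cov(xs, rowvar=False)  # source feature covariance
    cov_t = np.cov(xt, rowvar=False)  # target feature covariance
    # Squared Frobenius norm of the covariance gap captures second-order mismatch.
    return mean_discrepancy(xs, xt) + float(np.sum((cov_s - cov_t) ** 2))
```

Two domains with identical means but different spreads give zero under the mean-only term, while the covariance term exposes the mismatch.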
