Open Named Entity Modeling from Embedding Distribution

      

ABSTRACT :

In this paper, we report our discovery about the distribution of named entities in a general word embedding space. This discovery supports an open definition of multilingual named entities, in contrast to the previous closed and constrained definition through a named entity dictionary, which is usually built by human labor and relies on scheduled updates. Our initial visualization of monolingual word embeddings indicates that named entities tend to gather together regardless of named entity type and language, which enables us to model all named entities using a specific geometric structure inside the embedding space, namely the named entity hypersphere. For monolingual cases, the proposed named entity model gives an open description of diverse named entity types across different languages. For cross-lingual cases, mapping the proposed named entity model provides a novel way to build a named entity dataset for resource-poor languages. Finally, the proposed named entity model may serve as a handy clue for enhancing state-of-the-art named entity recognition systems in general.
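
The hypersphere idea can be illustrated with a small sketch. The abstract does not say how the hypersphere is fitted, so the version below makes a simplifying assumption: the center is the centroid of known entity vectors and the radius is the largest distance from that centroid. The function names and the toy 2-D vectors are hypothetical and are not taken from the paper.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_hypersphere(entity_vectors):
    """Assumed fitting rule: centroid as center, max distance as radius."""
    center = centroid(entity_vectors)
    radius = max(distance(v, center) for v in entity_vectors)
    return center, radius

def inside(vector, center, radius):
    """True if the vector lies in or on the hypersphere."""
    return distance(vector, center) <= radius

# Toy 2-D "embeddings" (made-up numbers, for illustration only).
entities = [[1.0, 1.0], [1.0, 3.0], [3.0, 1.0], [3.0, 3.0]]
center, radius = fit_hypersphere(entities)

print(inside([2.5, 2.0], center, radius))  # a point near the center
print(inside([5.0, 5.0], center, radius))  # a point far from the cluster
```

Under this assumption, any word whose embedding falls inside the sphere becomes a named entity candidate even if no dictionary lists it, which is one way to read the "open" definition described above.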

EXISTING SYSTEM :

• In this paper, we present a novel neural network architecture that automatically detects word- and character-level features using a hybrid bidirectional LSTM and CNN architecture, eliminating the need for most feature engineering.
• We also propose a novel method of encoding partial lexicon matches in neural networks and compare it to existing approaches.
• Preliminary evaluation of our partial matching lexicon algorithm suggests that performance could be further improved through more flexible application of existing lexicons.
• Evaluation of existing word embeddings suggests that the domain of training data is as important as the training algorithm.

DISADVANTAGE :

• There have been efforts to deal with the lack of annotation data in NE recognition through weak supervision and distant supervision methods.
• However, these methods still have certain requirements for annotation resources.
• In fact, NE recognition is not only a data annotation problem but also an embedding distribution task, generally related to the common-sense knowledge representation inside human language, which is hard to define with a fixed NE dictionary.
• This means that there will never be an NE dictionary that can stably and sufficiently represent the NE set for a language, and all current NE dictionaries have to be frequently maintained.

PROPOSED SYSTEM :

• In this paper, we present a novel neural network architecture that automatically detects word- and character-level features using a hybrid bidirectional LSTM and CNN architecture, eliminating the need for most feature engineering.
• Unfortunately, there are many limitations to the model proposed by Collobert et al. (2011b). First, it uses a simple feed-forward neural network, which restricts the use of context to a fixed-size window around each word, an approach that discards useful long-distance relations between words.
• A well-studied solution for a neural network to process variable-length input and have long-term memory is the recurrent neural network (RNN).
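
The contrast between a fixed-size window and a recurrent model can be made concrete with a toy sketch. This is not the BiLSTM-CNN architecture mentioned above: the `window` helper, the scalar recurrence, and the weights `w_in` and `w_rec` are all hypothetical and untrained, chosen only to show that a window drops distant tokens while a recurrent state can still carry a trace of them.

```python
import math

PAD = "<pad>"

def window(tokens, i, size):
    """Tokens visible to a fixed-window model centred on position i."""
    padded = [PAD] * size + list(tokens) + [PAD] * size
    return padded[i : i + 2 * size + 1]

def recurrent_states(inputs, w_in=0.5, w_rec=0.9):
    """Scalar Elman-style recurrence: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

tokens = ["Obama", "who", "was", "born", "in", "Hawaii", "visited", "Kenya"]

# Context for "Kenya" (index 7) with two tokens on each side.
ctx = window(tokens, 7, 2)
print(ctx)
print("Obama" in ctx)  # the distant subject is outside the window

# A signal at the first step still influences the last hidden state.
with_signal = recurrent_states([1.0, 0.0, 0.0, 0.0, 0.0])
without_signal = recurrent_states([0.0, 0.0, 0.0, 0.0, 0.0])
print(with_signal[-1] > without_signal[-1])
```

In practice the recurrent trace decays with distance, which is one reason gated variants such as the LSTM are commonly preferred over a plain recurrence.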

ADVANTAGE :

• Most traditional high-performance sequence labeling models for NER are statistical models, including Hidden Markov Models (HMM) and Conditional Random Fields (CRF) [35], [36], which rely heavily on hand-crafted features and task-specific resources.
• We use the proposed named entity model to improve the performance of state-of-the-art NE recognition systems.
• The performance is evaluated jointly by two factors: the ratio of NEs from the dictionary that are included in the hypersphere (recall), and the count of NEs inside the hypersphere but outside the dictionary (precision).
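
The two evaluation factors described above can be sketched as follows. The text does not give an exact precision formula, so this sketch only returns the list of words that are inside the hypersphere but outside the dictionary, without turning it into a ratio. The `evaluate` function, the toy embeddings, and the fixed center and radius are hypothetical.

```python
import math

def evaluate(embeddings, dictionary, center, radius):
    """Return (recall, words inside the hypersphere but not in the dictionary)."""
    def inside(word):
        vec = embeddings[word]
        dist = math.sqrt(sum((x - c) ** 2 for x, c in zip(vec, center)))
        return dist <= radius

    # Only dictionary entries that have an embedding can be checked.
    known = [w for w in dictionary if w in embeddings]
    covered = [w for w in known if inside(w)]
    recall = len(covered) / len(known) if known else 0.0

    # Candidates the dictionary does not list but the hypersphere admits.
    outside_dictionary = sorted(
        w for w in embeddings if w not in dictionary and inside(w)
    )
    return recall, outside_dictionary

# Toy data (made-up vectors, for illustration only).
embeddings = {
    "paris": [0.1, 0.0],
    "london": [0.0, 0.2],
    "nasa": [3.0, 0.0],
    "berlin": [0.2, 0.1],
    "table": [4.0, 4.0],
}
dictionary = {"paris", "london", "nasa"}

recall, candidates = evaluate(embeddings, dictionary, [0.0, 0.0], 1.0)
print(recall)
print(candidates)
```

Here two of the three dictionary entries fall inside the sphere, and "berlin" is admitted even though the dictionary does not list it.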
