Named Entity Recognition (NER): What It Is & How It Is Used

Named entity recognition (NER) is an essential natural language processing technique that allows computers to extract meaningful information from unstructured text data. As businesses increasingly leverage big data and AI, NER enables more efficient analysis of massive text corpora across a range of applications.

In this comprehensive guide, we’ll explore what exactly NER is, how it works, its use cases, and the main approaches used for developing NER systems.

What is Named Entity Recognition?

Named entity recognition (NER) is the process of identifying and classifying key information (entities) in text into pre-defined categories such as people, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.

[Figure: sample text with named entities highlighted and labeled by category]

NER scans text to detect named entities and categorize them under various predefined classes. For instance, in the sample text above, Albert Einstein is identified as a person, Ulm is identified as a location, etc.

The main goal of NER is to extract structured information from unstructured text data to make it more machine-readable. It transforms words in texts into annotated or tagged data that can be leveraged by downstream applications.
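
In practice, this tagged form usually follows a scheme such as BIO, where each token is marked as the beginning of an entity (B), inside an entity (I), or outside any entity (O). A minimal illustration in Python (the exact sentence is a plausible reconstruction of the sample text above):

```python
# BIO tagging: B- marks the first token of an entity, I- a continuation,
# and O a token that belongs to no entity.
tokens = ["Albert", "Einstein", "was", "born", "in", "Ulm", "."]
tags   = ["B-PER",  "I-PER",    "O",   "O",    "O",  "B-LOC", "O"]

for token, tag in zip(tokens, tags):
    print(f"{token:<10} {tag}")
```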

NER is a subtask of information extraction, which more broadly seeks to locate and classify key information in text. NER focuses specifically on the atomic elements of text that refer to specific concepts: named entities.

Applications of Named Entity Recognition

NER enables machines to extract meaningful insights from masses of unstructured text data by structuring and labeling words. This drives a wide range of applications across industries:

  • Customer service: NER can automatically classify customer complaints by topic, sentiment, brand/product, and location, allowing them to be routed to the right departments and revealing systemic issues.
  • HR & recruitment: NER quickly scans resumes and profiles to extract key skills, education, and experience, helping filter candidates faster.
  • Market research: NER quickly extracts information about competitors, products, and market events from news, financial reports, and social media.
  • Search engines & chatbots: NER extracts entities and intents from search queries and user messages to improve relevance.
  • Document digitization: NER tags scanned documents to identify critical information such as names, dates, IDs, and account numbers.
  • Healthcare: NER extracts medications, symptoms, and dosages from clinical notes and health records.
  • Security: NER helps identify threats, threat actors, and malware by mining logs, reports, and threat intelligence documents.
  • Legal & e-discovery: NER quickly parses contracts and case documents to extract dates, parties, clauses, and obligations.

Overall, NER automates the extraction of atomic facts stored in unstructured text data, acting as a critical capability for text analytics and data mining.

How Does Named Entity Recognition Work?

The typical workflow for an NER system consists of:

  1. The raw text document is preprocessed and normalized (lowercasing, stemming or lemmatization, noise removal, etc.).
  2. The NER model scans the document to detect sentence boundaries, using cues such as punctuation and capitalization.
  3. Within each sentence, the model identifies subsequences of words (n-grams) that potentially match a predefined entity class, based on context.
  4. Each detected entity is classified under the relevant category (person, location, date, etc.), producing an annotated document.
  5. Finally, the annotated document can power downstream NLP and text mining applications.
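
As a concrete sketch of this workflow, the snippet below uses the open-source spaCy library; it assumes the small English model has been installed with `python -m spacy download en_core_web_sm`:

```python
import spacy

# Load a small pretrained English pipeline that includes an NER component.
nlp = spacy.load("en_core_web_sm")

# Steps 1-4 (preprocessing, sentence segmentation, entity detection,
# and classification) all happen inside this call.
doc = nlp("Albert Einstein was born in Ulm in 1879.")

# Step 5: the annotated document feeds downstream applications.
for ent in doc.ents:
    print(ent.text, ent.label_)

# Typical output:
#   Albert Einstein PERSON
#   Ulm GPE
#   1879 DATE
```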


The key challenges lie in step 3, accurately detecting entity boundaries, and step 4, correctly classifying entities using local context. State-of-the-art NER systems today learn both steps with machine learning, specifically statistical ML and deep learning techniques.

Machine Learning Approaches for NER

Modern NER systems are powered by supervised machine learning, where models are trained on human-annotated text corpora. Statistical ML techniques like Hidden Markov Models and Conditional Random Fields, along with deep learning approaches like RNNs, CNNs, and Transformer networks, have pushed NER performance to new levels.

Here's an overview of the main techniques:

Hidden Markov Models

Hidden Markov Models (HMMs) were among the earliest probabilistic models used for NER. HMMs model sequence data by assuming that the current state depends only on the previous state (the Markov property). NER with HMMs treats the words of a sentence as observations generated by an underlying hidden sequence of entity tags.

HMMs define probability distributions over state transitions and observations to jointly model the sequence and its labels; the Viterbi algorithm then decodes the most likely state sequence. A major limitation, however, is that each observation is assumed to depend only on its own hidden state, which makes it hard to incorporate rich, overlapping contextual features.
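
To make the decoding step concrete, here is a minimal Viterbi implementation for a toy two-tag HMM; the transition and emission probabilities are illustrative placeholders, not values learned from data:

```python
import numpy as np

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for an observation sequence."""
    n, m = len(obs), len(states)
    delta = np.full((n, m), -np.inf)       # best log-prob ending in state s at step t
    backptr = np.zeros((n, m), dtype=int)  # best predecessor state

    for s in range(m):
        delta[0, s] = np.log(start_p[s]) + np.log(emit_p[s][obs[0]])

    for t in range(1, n):
        for s in range(m):
            scores = delta[t - 1] + np.log(trans_p[:, s])
            backptr[t, s] = np.argmax(scores)
            delta[t, s] = scores[backptr[t, s]] + np.log(emit_p[s][obs[t]])

    # Backtrack from the best final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(n - 1, 0, -1):
        path.append(backptr[t, path[-1]])
    return [states[s] for s in reversed(path)]

states = ["O", "PER"]
start_p = np.array([0.8, 0.2])
trans_p = np.array([[0.9, 0.1],    # O   -> O, PER
                    [0.5, 0.5]])   # PER -> O, PER
emit_p = [{"Einstein": 0.1, "born": 0.9},   # P(word | O)
          {"Einstein": 0.9, "born": 0.1}]   # P(word | PER)

print(viterbi(["Einstein", "born"], states, start_p, trans_p, emit_p))
# -> ['PER', 'O']
```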

Conditional Random Fields

Conditional random fields (CRFs) are probabilistic undirected graphical models suited to labeling sequence data. Unlike HMMs, CRFs can model arbitrary, overlapping, and non-independent observation features. For NER, the observations are words and the entity tags are the states.

CRFs capture contextual cues, such as capitalization, prefixes, suffixes, and surrounding grammar, that indicate entity candidates in a sentence. They surpassed HMMs as the state-of-the-art ML approach before deep learning techniques emerged.
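
As a sketch of how such features are supplied in practice, the snippet below uses the sklearn-crfsuite library (assumed installed via `pip install sklearn-crfsuite`); each token becomes a dictionary of hand-crafted features, and the training set here is a deliberately tiny placeholder:

```python
import sklearn_crfsuite

def token_features(sent, i):
    """Contextual cues for one token: casing, affixes, and neighboring words."""
    word = sent[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isupper": word.isupper(),
        "suffix3": word[-3:],
        "prev.lower": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# Real corpora contain thousands of annotated sentences.
sents = [["Albert", "Einstein", "was", "born", "in", "Ulm"]]
labels = [["B-PER", "I-PER", "O", "O", "O", "B-LOC"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))
```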

Recurrent Neural Networks

Recurrent neural networks (RNNs) brought deep learning to sequence-modeling problems like NER. RNNs process input text incrementally, maintaining an internal state that implicitly captures past context; this allows them to model long-distance dependencies within sentences.

Two popular variants used for NER are long short-term memory (LSTM) networks and gated recurrent units (GRUs). Both learn contextual patterns that distinguish entities directly from training data, without hand-crafted features.
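
A minimal bidirectional LSTM tagger in PyTorch might look like the following; the vocabulary size, tag set, and layer dimensions are illustrative placeholders:

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Embed tokens, run a bidirectional LSTM, and score one tag per token."""
    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)  # 2x: both directions

    def forward(self, token_ids):            # (batch, seq_len)
        states, _ = self.lstm(self.embed(token_ids))
        return self.out(states)              # (batch, seq_len, num_tags)

# 9 tags, e.g. B-/I- for four entity types plus O.
model = BiLSTMTagger(vocab_size=10_000, num_tags=9)
logits = model(torch.randint(0, 10_000, (1, 6)))  # one sentence of 6 token ids
print(logits.shape)  # torch.Size([1, 6, 9])
```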

Convolutional Neural Networks

Convolutional neural networks (CNNs) offer an alternative approach: stacked convolutional and pooling layers extract hierarchical feature representations. CNN models apply filters over windows of words (n-grams) in a sentence to extract local features, with multiple filters detecting patterns such as capitalization, prefixes, and part-of-speech tags.
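
A sketch of the same idea in PyTorch: a 1-D convolution slides filters over embedded token windows, yielding one feature vector, and hence one tag score, per token (dimensions again illustrative):

```python
import torch
import torch.nn as nn

class CNNTagger(nn.Module):
    """Extract local n-gram features with a 1-D convolution, score tags per token."""
    def __init__(self, vocab_size, num_tags, embed_dim=100,
                 num_filters=64, window=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Padding keeps one output position per input token.
        self.conv = nn.Conv1d(embed_dim, num_filters,
                              kernel_size=window, padding=window // 2)
        self.out = nn.Linear(num_filters, num_tags)

    def forward(self, token_ids):                     # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)     # (batch, embed, seq)
        x = torch.relu(self.conv(x)).transpose(1, 2)  # (batch, seq, filters)
        return self.out(x)                            # (batch, seq, num_tags)

model = CNNTagger(vocab_size=10_000, num_tags=9)
print(model(torch.randint(0, 10_000, (1, 6))).shape)  # torch.Size([1, 6, 9])
```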

Transformer Networks

Self-attention-based transformer models like BERT have delivered cutting-edge results on NER tasks. Transformer encoders capture both local features, via multi-head self-attention, and global context, through the full-sequence representation.

Fine-tuning large pretrained models on domain-specific NER data has become the standard practice today across both academia and industry.
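
For instance, with the Hugging Face transformers library, applying an already fine-tuned BERT NER model takes a few lines; dslim/bert-base-NER is one publicly available checkpoint (downloaded on first use), used here purely as an example:

```python
from transformers import pipeline

# aggregation_strategy="simple" merges word-piece tokens back into entity spans.
ner = pipeline("token-classification", model="dslim/bert-base-NER",
               aggregation_strategy="simple")

for ent in ner("Albert Einstein was born in Ulm in 1879."):
    print(ent["word"], ent["entity_group"], round(ent["score"], 3))

# Typical output:
#   Albert Einstein PER 0.99...
#   Ulm LOC 0.99...
```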

Building an NER Model

The workflow for developing an NER model involves:

  1. Data Collection: Curate a dataset of text documents representative of the target domain. This is manually annotated with entity labels by human experts.
  2. Data Preprocessing: Text normalization, tokenization, vectorization.
  3. Model Training & Validation: Train a neural network/CRF using annotated datasets. Continuously validate on held-out data.
  4. Model Selection: Select the best model after validation, based on metrics such as precision, recall, and entity-level F1 score (see the evaluation sketch after this list).
  5. Integration & Deployment: Integrate the trained model with downstream apps and deploy it to production via REST APIs or Docker containers.
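
For step 4, NER models are usually scored at the entity level rather than the token level. A brief sketch using the seqeval library (assumed installed via `pip install seqeval`), which counts an entity as correct only if its full span and type match:

```python
from seqeval.metrics import classification_report, f1_score

# Gold and predicted tag sequences for two toy sentences (BIO scheme).
y_true = [["B-PER", "I-PER", "O", "O", "B-LOC"],
          ["O", "B-ORG", "O"]]
y_pred = [["B-PER", "I-PER", "O", "O", "O"],
          ["O", "B-ORG", "O"]]

print(f1_score(y_true, y_pred))             # micro-averaged entity-level F1
print(classification_report(y_true, y_pred))
```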

With cloud platforms like Google Vertex AI, Amazon SageMaker, and Azure ML, the entire workflow from data curation to deployment can be simplified; they provide hosted notebooks, automated ML, experiment tracking, and model management capabilities.

Challenges in Developing NER Systems

While state-of-the-art NER systems achieve near-human performance on standard benchmarks, some key challenges remain:

  • Domain Adaptation: Models trained on news perform poorly on technical or conversational text. Retraining using in-domain data is essential.
  • Data Scarcity: Obtaining large amounts of annotated text data is expensive and time-consuming.
  • Noisy Text: NER struggles with typos, abbreviations and incorrect grammar in informal writing.
  • Long-tail Entities: Rare, emerging entities not seen during training are challenging to recognize.

Active research is focused on making NER models more robust. Semi-supervised learning using unlabeled data, adversarial training, multitask learning, and attention mechanisms are some promising directions.

As NLP adoption in enterprise and consumer applications continues to grow, NER will remain a pivotal technology. With improving computational power and data availability, NER systems will become more accurate, flexible, and scalable.
