1. Introduction #
The Floatchat platform is equipped with its own proprietary NLU engine, designed to ensure accurate responses to user queries in chatbots built on the platform. While complex computations are involved in determining each response, we have made the platform user-friendly for chatbot building and training. The following sections provide insights into the NLP/NLU processes behind resolving each query.
2. NLP Pipeline #
The NLP pipeline consists of a series of synchronized steps to process queries and terms, ultimately providing the best response. Each component in the pipeline serves a specific objective, and the results are consolidated and reconciled to deliver the most suitable match. Let’s explore each component in detail.
2.1 Normalisation #
To reduce bias in the pattern recognition algorithm, tokens are transformed into a consistent lowercase format. This is also important as many users prefer lowercase text for chatting. Additionally, noisy characters such as punctuation can be removed to enhance downstream processing.
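A minimal sketch of this normalisation step in Python (the function name and the exact set of characters removed are illustrative, not Floatchat's implementation):

```python
import re

def normalise(text: str) -> str:
    """Lowercase the text and strip punctuation noise."""
    text = text.lower()
    # Drop punctuation, keeping word characters and whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    # Collapse any runs of whitespace left behind.
    return re.sub(r"\s+", " ", text).strip()

print(normalise("Hello, World!!  How's it going?"))
# -> hello world hows it going
```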
2.2 Tokenisation #
Input messages are divided into sentences, and sentences are further split into tokens/words. Tokens serve as the basic unit for subsequent processing.
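A toy version of this two-level split (the naive punctuation-based sentence boundary rule is illustrative; real tokenisers handle abbreviations and other edge cases):

```python
import re

def tokenise(message: str) -> list:
    """Split a message into sentences, then each sentence into tokens."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", message.strip())
    # Split each sentence into word tokens.
    return [re.findall(r"\w+", s) for s in sentences if s]

print(tokenise("What plans do you offer? I need 10GB data."))
# -> [['What', 'plans', 'do', 'you', 'offer'], ['I', 'need', '10GB', 'data']]
```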
2.3 Stop Words Removal #
Frequently occurring words like “the,” “and,” “a,” etc., known as stop words, are removed from input messages as they contribute minimally to understanding text. This helps reduce noise and improves accuracy. Custom stop words can also be added to enhance the NLP engine’s training.
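The step above can be sketched as a simple set filter (the stop-word lists here are tiny illustrative subsets, not the platform's actual lists):

```python
STOP_WORDS = {"the", "and", "a", "an", "is", "of", "to"}  # illustrative subset
CUSTOM_STOP_WORDS = {"please"}  # entries an administrator might add

def remove_stop_words(tokens):
    """Drop tokens found in the combined stop-word list."""
    stop = STOP_WORDS | CUSTOM_STOP_WORDS
    return [t for t in tokens if t.lower() not in stop]

print(remove_stop_words(["please", "show", "the", "prepaid", "plans"]))
# -> ['show', 'prepaid', 'plans']
```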
2.4 Spell Check #
Floatchat includes a spell-check capability to correct misspelled words in user messages. The spell-check models continuously learn new words from the training data added by the Bot Administrator. The system tracks word frequencies and uses the Levenshtein edit-distance algorithm to identify misspelled words; when a user query contains one, it suggests the closest known alternative based on that distance. The spell-check dictionary is automatically updated with each new statement added to the system. For example, a user message such as “recharge offrs for prepiad with 10GB data and unlimited national calls” is corrected to “recharge offers for prepaid with 10GB data and unlimited national calls” before further processing.
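The Levenshtein distance at the heart of this step can be sketched as follows (the vocabulary, helper names, and distance cutoff are illustrative, not Floatchat's implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def correct(word, vocabulary, max_distance=2):
    """Suggest the closest known word within the allowed edit distance."""
    best = min(vocabulary, key=lambda v: levenshtein(word, v))
    return best if levenshtein(word, best) <= max_distance else word

vocab = {"recharge", "offers", "prepaid", "unlimited"}
print(correct("prepiad", vocab))  # -> prepaid
```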
2.5 Stemming/Lemmatization #
Each word is reduced to its root form by removing affixes. The approach varies by language: in some cases lemmatization is performed to obtain the dictionary form of the word, while in others stemming is used. Although different in approach, both techniques distill terms to a base form for improved semantic understanding. For example, “liking” and “liked” are both reduced to the root “like,” facilitating better comprehension.
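A crude suffix-stripping stemmer illustrates the idea (a production engine would use something like Porter stemming or a dictionary-based lemmatiser; note that this toy stemmer maps "liking" and "liked" to the shared stem "lik", whereas a lemmatiser would return the dictionary form "like"):

```python
def simple_stem(word: str) -> str:
    """Toy stemmer: strip a common suffix if enough of the word remains."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print(simple_stem("liking"), simple_stem("liked"), simple_stem("plans"))
# -> lik lik plan
```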
2.6 Conversational Context #
Floatchat’s NLU capabilities enable the chatbot to maintain conversational context by tracking entities during interactions with users. This makes the conversation smoother and more efficient, as users do not need to repeat the entity in subsequent queries within the same context. The NLU engine remembers the entities used in a session by saving them as conversation context history. Detailed information on conversational context is provided in a separate section.
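The idea of remembering entities across turns can be sketched as a small session store (class and method names are hypothetical; Floatchat's internal context mechanism is not public):

```python
class ConversationContext:
    """Track entities mentioned in a session so follow-up queries can omit them."""

    def __init__(self):
        self.entities = {}

    def resolve(self, name, value=None):
        # Use the value from the current query if present and remember it;
        # otherwise fall back to the value saved earlier in the session.
        if value is not None:
            self.entities[name] = value
            return value
        return self.entities.get(name)

ctx = ConversationContext()
ctx.resolve("plan_type", "prepaid")   # first query mentions the entity
print(ctx.resolve("plan_type"))       # follow-up omits it -> prepaid
```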
2.7 Named Entity Recognition #
Entities play a crucial role in identifying and extracting valuable data from natural language input. While intents help understand the intention behind a user input, entities are used to extract specific pieces of information mentioned by users. Any important data required from a user’s request is considered an entity. The platform includes built-in entities like Day/Date/Time, Quantity, Country, and more. Custom entities specific to a use case can be added, such as “unlimited national calls,” “unlimited data plan,” “fixed call plan” for telecom plans. The Bot Administrator specifies the expected entities in user queries, and the extraction engine extracts these entities accordingly.
For example, given the user query “What are prepaid plans with 10GB data and unlimited national calls?” the entity extraction engine can identify “10GB” as the data amount and “unlimited national calls” as the plan type.
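This extraction can be sketched with simple patterns (the pattern table and entity names are illustrative; the actual extraction engine is more sophisticated than regular expressions):

```python
import re

# Illustrative patterns for the telecom example above.
ENTITY_PATTERNS = {
    "data_amount": re.compile(r"\b(\d+\s?GB)\b", re.IGNORECASE),
    "plan_type": re.compile(
        r"\b(unlimited national calls|unlimited data plan|fixed call plan)\b",
        re.IGNORECASE),
}

def extract_entities(query: str) -> dict:
    """Return the first match for each expected entity found in the query."""
    found = {}
    for name, pattern in ENTITY_PATTERNS.items():
        m = pattern.search(query)
        if m:
            found[name] = m.group(1)
    return found

print(extract_entities(
    "What are prepaid plans with 10GB data and unlimited national calls?"))
# -> {'data_amount': '10GB', 'plan_type': 'unlimited national calls'}
```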
2.8 Parts of Speech Determination #
Tokens/words are tagged with their parts of speech, such as noun, verb, adjective, and adverb. These tags are used to classify the query and assign appropriate weights to different parts of speech in the final representation. This influences the responses generated by the NLP engine, resulting in more relevant matches.
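The weighting idea can be sketched as follows (the tag set and weight values are purely illustrative; the engine's actual weighting scheme is internal):

```python
# Hypothetical per-tag weights for building the final query representation.
TAG_WEIGHTS = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.6, "ADV": 0.4, "OTHER": 0.1}

def weighted_tokens(tagged):
    """Attach a weight to each (token, tag) pair."""
    return [(tok, tag, TAG_WEIGHTS.get(tag, TAG_WEIGHTS["OTHER"]))
            for tok, tag in tagged]

tagged = [("recharge", "NOUN"), ("offers", "NOUN"), ("cheap", "ADJ")]
print(weighted_tokens(tagged))
# -> [('recharge', 'NOUN', 1.0), ('offers', 'NOUN', 1.0), ('cheap', 'ADJ', 0.6)]
```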
2.9 Synonyms #
Synonyms are alternative words that refer to the same object or action. They are particularly relevant in domain-specific contexts and can be used for cases of common misspellings, abbreviations, and similar variations. Multiple synonyms can be added, and the system automatically identifies matching synonyms and adds them to the list. Irrelevant synonyms can be removed as needed.
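A minimal sketch of synonym expansion (the synonym table is an illustrative example of the kind of entries an administrator might maintain):

```python
# Illustrative synonym table; entries can be added or pruned as needed.
SYNONYMS = {
    "recharge": {"top-up", "topup", "refill"},
    "plan": {"pack", "scheme"},
}

def expand_with_synonyms(tokens):
    """Expand each token with its registered synonyms for matching."""
    expanded = []
    for tok in tokens:
        expanded.append(tok)
        expanded.extend(sorted(SYNONYMS.get(tok, ())))
    return expanded

print(expand_with_synonyms(["recharge", "plan", "10GB"]))
# -> ['recharge', 'refill', 'top-up', 'topup', 'plan', 'pack', 'scheme', '10GB']
```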
2.10 FAQ Disambiguation #
The Related Match feature enables the chatbot to provide not only the best response from the training data set but also other options considered closely related to the top match. The number of options shown to the user depends on how many trained responses match the query and the distribution pattern of their scores. The system also determines the most well-formed statement or phrase from the set of variations and presents it as the alternative response.
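One way to picture the score-distribution logic: keep alternatives only when their scores fall close enough to the top match (the gap and limit thresholds here are hypothetical, not the platform's tuning):

```python
def related_matches(scored, gap=0.1, limit=3):
    """Return the top match plus alternatives scoring within `gap` of it."""
    ranked = sorted(scored, key=lambda x: x[1], reverse=True)
    best_score = ranked[0][1]
    return [r for r in ranked if best_score - r[1] <= gap][:limit]

scores = [("How do I recharge?", 0.92),
          ("What recharge offers exist?", 0.88),
          ("How do I port my number?", 0.40)]
print(related_matches(scores))
# -> [('How do I recharge?', 0.92), ('What recharge offers exist?', 0.88)]
```

The low-scoring third candidate is excluded, so the user is only shown options that are genuinely close to the best answer.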
2.11 Semantic Match #
Semantic matching involves understanding the text semantically and identifying the best response based on that understanding. FAQs and documents uploaded into the bot are broken down into logical tokens and semantically matched to find the most suitable response. Floatchat’s NLP/NLU pipeline automatically ranks and identifies the best response for a user query from multiple possibilities.
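As a simplified stand-in for semantic matching, the sketch below ranks candidate responses by cosine similarity over bag-of-words vectors (a real engine would use richer semantic representations; names and thresholds are illustrative):

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def best_response(query, faqs):
    """Rank FAQ entries against the query and return the closest match."""
    q = Counter(query.lower().split())
    return max(faqs, key=lambda f: cosine(q, Counter(f.lower().split())))

faqs = ["prepaid recharge offers with data",
        "how to port your number"]
print(best_response("recharge offers for prepaid", faqs))
# -> prepaid recharge offers with data
```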
If you encounter any issues, please don’t hesitate to reach out to firstname.lastname@example.org.