An in-depth look at the chatbots the tech giants care about

Lei Feng's network note: this article, by Zhang, explains in detail 1) the three problems a chatbot has to solve, and 2) the models used to solve them.


"Chatbot" has recently become a hot word and a hot application. The news media are hyping the bot concept, the tech giants are pouring huge resources into research and development, and new bot-related papers on arXiv have become the norm. Hype is hype and PR is PR, but the embarrassing fact is that it is really difficult to find a good bot on the market. By domain, bots divide into open-domain bots and task-oriented bots. An open-domain bot aims very wide, more like a platform that can do anything: whatever you need, it can handle, which amounts to true AI. A task-oriented bot focuses on doing one thing well, such as booking tickets, making reservations, or handling passport applications.

When it comes to open-domain bots, what most people have encountered are entertainment bots whose answers make little sense, like the "little yellow chicken" that was active on the big social networking sites years ago. Many bot companies on the market today claim to have mastered bot technology and to solve it with deep learning, but they are of the same kind: they solve no practical problem, can exchange a couple of sentences with you, and often give irrelevant, laughable answers.

As for task-oriented bots, the most common ones on the market are customer service robots. Banks and e-commerce companies alike do not want staff to answer the same user questions over and over, so they put a customer service robot in front. Leaving aside how well it works, developing a bot for a specific task takes a lot of time and requires heavy maintenance afterwards. Because so many hand-crafted features are used, the horizontal scalability of the whole bot framework is relatively poor: each new scenario basically needs a newly developed bot, and the labor cost is too high.

The ideal of the bot is generous, and the scenarios the big companies paint are beautiful, but the bots that actually exist pour a bucket of cold water on all of it. The higher the expectations, the greater the disappointment. If the media keep touting bots as though the whole world will run on them tomorrow, that does not help bot development. Killing with flattery (pengsha) only produces a bubble, and after it bursts everything is back where it started.

A powerful, open-domain bot is hard to achieve in the short term. But it is a rational attitude to lower expectations and treat the bot not as a revolution at the technology level but as an innovation at the interaction level. With the bot as an entry point, people may no longer need to carry a terminal everywhere; they only need to find a piece of networked hardware that can recognize their identity, such as a mirror, to accomplish many tasks: booking tickets, buying things, and so on. The bot is then an entry point for actions, with a black box behind it that performs the various tasks. We do not have to see the whole process or know how it works; through simple language interaction we can complete complex tasks. The terminal only gives feedback and receives input, while the processing runs in the cloud, the bot cloud.

The key to all of this is solving the task-oriented bot, replacing traditional hand-crafted features and templates with more data-driven solutions.

Description |

Building a bot is a composite problem that involves the following three main problems:

1、 Response generation (selection)

Response generation is the last step of a dialogue and forms the output. In summary, there are four kinds of solution:

Solution 1: generate the response directly from the context. There have been a great many recent papers on this; since the seq2seq+attention framework swept through many NLP tasks, the benchmark for dialogue generation has been refreshed again and again. Here dialogue generation is defined as a typical language-modeling problem, predicting words based on the context. Because it involves sentence generation, evaluation is a difficult problem.

Solution 2: some papers do not define dialogue generation as a language-modeling problem but as next utterance selection, a multiple-choice problem: given a context and a list of candidate utterances, select one from the list as the response. This type of problem is much less difficult and is also very easy to evaluate, but preparing the data set takes some effort, and it carries over poorly to practical applications.

Solution 3: rule-based or template-based. The final response is produced by filling in a template: most of the text is given, and only some specific values need to be filled in. This type of solution suits task-oriented bots, but the many hand-crafted features and templates make it difficult to port to a different task.

Solution 4: query-based or example-based. The response comes from a database called the knowledge base, which contains a large number of rich examples. Given the user's query, find the closest example and return the corresponding response as the output. This type of solution is well suited to entertainment and comedy bots; the core technique is to find more data to enrich the knowledge base and to clean it. But because the responses are taken from what other people once said, they may be funny, yet most will be irrelevant.
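Solutions 3 and 4 above are simple enough to sketch in a few lines. The following is a minimal illustration, not any particular product's implementation: the templates, the tiny knowledge base, and the bag-of-words cosine similarity used to find the "closest example" are all assumptions made for this example.

```python
import math
from collections import Counter
from string import Template

# --- Solution 3: template-based. Hypothetical templates for a ticket bot. ---
TEMPLATES = {
    "ask_date": Template("What date would you like to travel to $destination?"),
    "confirm": Template("Your ticket from $origin to $destination on $date is booked."),
}


def render(action, slots):
    """Most of the text is fixed; only the slot values are filled in."""
    return TEMPLATES[action].substitute(slots)


# --- Solution 4: example-based. A tiny, made-up knowledge base. ---
KNOWLEDGE_BASE = [
    ("how is the weather today", "Sunny with a light breeze."),
    ("tell me a joke", "Why did the bot cross the road? To reach the other server."),
    ("what is your name", "People call me Bot."),
]


def bag_of_words(text):
    """Represent a sentence as word counts (an assumed, very crude encoding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query):
    """Return the response attached to the stored example closest to the query."""
    q = bag_of_words(query)
    _, response = max(KNOWLEDGE_BASE, key=lambda ex: cosine(q, bag_of_words(ex[0])))
    return response


template_reply = render("ask_date", {"destination": "Shanghai"})
retrieved_reply = retrieve("what is the weather like today")
```

The sketch also makes the weaknesses visible: every new task needs new entries in `TEMPLATES`, and `retrieve` always returns something, even when no stored example is really relevant.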

2、 Dialog state tracking (DST)

Some papers call DST the belief tracker. This component is in fact the core of the bot: its role is to understand and capture the user's intention or goal. Only when you really know what the user needs can you take the right action or give the right response. There is a competition on this topic, the Dialog State Tracking Challenge. In general, a range of states is given, and the task is to predict from the context which state the user is in and what the user needs, for example querying the weather or looking up train tickets.
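To make the idea of predicting a state from the context concrete, here is a minimal sketch. It assumes two hypothetical states and a hand-written keyword table standing in for a learned tracker; a real belief tracker would learn its evidence model from data.

```python
# Hypothetical states and keyword evidence; a trained model would replace this table.
STATES = ["query_weather", "book_train_ticket"]
KEYWORDS = {
    "query_weather": {"weather", "rain", "sunny", "temperature"},
    "book_train_ticket": {"train", "ticket", "book", "seat"},
}


def update_belief(belief, utterance, smoothing=1.0):
    """Reweight each state by how many of its keywords the utterance contains."""
    tokens = set(utterance.lower().split())
    scores = {
        state: belief[state] * (smoothing + len(tokens & KEYWORDS[state]))
        for state in STATES
    }
    total = sum(scores.values())
    return {state: score / total for state, score in scores.items()}


def track(utterances):
    """Start from a uniform belief and update it after every user turn."""
    belief = {state: 1.0 / len(STATES) for state in STATES}
    for utterance in utterances:
        belief = update_belief(belief, utterance)
    return belief


belief = track(["i want to book a train ticket", "is there a seat tomorrow"])
predicted_state = max(belief, key=belief.get)
```

The belief is kept as a probability distribution rather than a single label, so later turns can strengthen or overturn an earlier guess.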

3、 User modeling

A business-facing bot deals with real users. If it is only a simple FAQ bot that answers common questions, user modeling may not be needed. But for more complex and delicate business, users need to be modeled: for the same question, the bot's response should differ from user to user, for obvious reasons. User modeling involves more than basic user information and explicit user feedback; more important are the user's conversation history and other implicit feedback. It is like recommender systems before they became popular: everyone sold things in a conventional way, but some clever people analyzed user behavior, not only explicit acts such as likes but also the "clues" users left inadvertently, to learn what users were potentially interested in. That is what a recommender system does. Modeling the user makes for a personalized bot, in which every generated response carries the user's characteristics.

Corpus |

Large corpora are used to train open-domain dialogue models, and the data generally come from social networking sites. For task-oriented bots, customer data are very small in scale, which is one of the main reasons it is hard to apply data-driven solutions directly to task-oriented bots.

[1] is a survey of corpora for training bots; interested readers can consult it.

As [13] shows, there is indeed more corpus data in English; the Sina Weibo corpus was released by Huawei's Noah's Ark Lab [12]. Data from Twitter and similar microblogs are less "conversational in nature" than data from chat rooms such as the Ubuntu chat logs, which are better suited to training response models because they are more natural and unpolluted. [5] also used a large corpus, with data from Baidu.

Model |

There are a great many research papers on bots. This is a very active research area with many sub-directions; below, some models are introduced according to the research question they address.

Seq2seq model

The most popular solution today is seq2seq+attention. The user query is fed into the encoder, which outputs a vector representing the entire query; this vector then serves as the condition for the decoder. The decoder is essentially a language model that builds the response step by step. [2] is such a scheme: Google trained a model of this kind with a huge number of parameters and obtained a decent bot.

A typical seq2seq model has one problem: it tends to generate "huh"-style responses, that is, very safe, grammatical but meaningless responses such as "I don't know!" The reason is that traditional seq2seq decoding uses MLE (Maximum Likelihood Estimation) as the objective function, which produces the most grammatical words rather than the most useful ones. Such safe sentences appear in large numbers in the training corpus, so after training the model inevitably keeps producing them. Drawing on experience from speech recognition, [3] uses MMI (Maximum Mutual Information) as the objective function in decoding, which improves the diversity of the responses.
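The difference between the two objectives can be illustrated by reranking candidates. This is a minimal sketch, assuming an MMI-style score of the form log p(T|S) − λ·log p(T), where S is the query and T a candidate response; the exact formulation in [3] may differ, and the probabilities below are invented toy numbers rather than model outputs.

```python
import math


def mmi_score(log_p_t_given_s, log_p_t, lam=0.5):
    """Likelihood of the response given the query, minus a penalty for generic responses."""
    return log_p_t_given_s - lam * log_p_t


def rerank(candidates, lam=0.5):
    """Sort candidates from best to worst under the MMI-style score."""
    return sorted(
        candidates,
        key=lambda c: mmi_score(c["log_p_t_given_s"], c["log_p_t"], lam),
        reverse=True,
    )


# Toy numbers: the safe reply is slightly more likely given the query,
# but it is also far more likely under the plain language model p(T).
candidates = [
    {"text": "I don't know.", "log_p_t_given_s": math.log(0.30), "log_p_t": math.log(0.20)},
    {"text": "It opens at nine tomorrow.", "log_p_t_given_s": math.log(0.25), "log_p_t": math.log(0.01)},
]

mle_best = max(candidates, key=lambda c: c["log_p_t_given_s"])["text"]
mmi_best = rerank(candidates)[0]["text"]
```

With `lam=0` the score reduces to plain likelihood and the safe reply wins again, which is why the weight on the penalty term matters.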

[4] argues that language models such as RNNLM do poorly at generating high-quality utterances because they do not handle the hidden features, or random noise, in an utterance well, and so perform only moderately at generating both the next token (the short-term goal) and future tokens (the long-term goal).

When generating each utterance, the model uses four parts in order from input to output: the encoder RNN, the context RNN, the latent variable, and the decoder RNN. The latent variable here is a little like LSI in information retrieval: "latent" means we do not know exactly what it is, but it may represent a topic or a sentiment; it is a reduced-dimension representation.

[5] proposes a method called content introducing for generating short responses.

Step 1: given the query, predict a keyword as the topic of the response. The keyword is a noun. It is not meant to capture complex meaning or grammar; rather, based on each word of the query, the term with the highest PMI (Pointwise Mutual Information) is predicted as the keyword.

Step 2: the model in [5] is called Sequence To Backward and Forward Sequences. First comes the backward step: given a query, the encoder produces a context; the decoder is given the keyword as its first word and then decodes, generating the part of the response that precedes the keyword. Next comes the forward step, which is a typical seq2seq: the encoder again represents the query as a context, and the decoder is given the backward sequence and the keyword as the first half and continues decoding the second half. Briefly, the whole process is:

step 1 query + keyword => backward sequence

step 2 query + keyword + backward sequence(reverse) => forward sequence

step 3 response = backward (reverse) sequence + keyword + forward sequence
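The pipeline above can be sketched end to end. This is a minimal illustration under stated assumptions: the four (query, response) pairs, the candidate noun set, and the two toy decoders are invented stand-ins for the trained components in [5], and word pairs that never co-occur are simply given a PMI contribution of 0 to keep the scores finite.

```python
import math
from collections import Counter
from itertools import product

# Made-up (query, response) pairs used only to estimate co-occurrence statistics.
PAIRS = [
    ("i love cats", "cats need fish"),
    ("my cats sleep", "fish is tasty"),
    ("i love rain", "bring an umbrella"),
    ("rain again today", "umbrella time"),
]
CANDIDATE_NOUNS = {"fish", "umbrella"}  # hypothetical keyword vocabulary


def build_counts(pairs):
    """Count in how many pairs each query word, response word, and word pair occurs."""
    q_count, r_count, qr_count = Counter(), Counter(), Counter()
    for query, response in pairs:
        q_words, r_words = set(query.split()), set(response.split())
        q_count.update(q_words)
        r_count.update(r_words)
        qr_count.update(product(q_words, r_words))
    return q_count, r_count, qr_count


def pmi(q_word, r_word, counts, n):
    """PMI = log p(q, r) / (p(q) p(r)); unseen pairs contribute 0 (a simplification)."""
    q_count, r_count, qr_count = counts
    joint = qr_count[(q_word, r_word)]
    if joint == 0:
        return 0.0
    return math.log(joint * n / (q_count[q_word] * r_count[r_word]))


def predict_keyword(query, pairs=PAIRS, nouns=CANDIDATE_NOUNS):
    """Step 1 of the method: pick the noun with the highest total PMI against the query."""
    counts, n = build_counts(pairs), len(pairs)
    return max(sorted(nouns), key=lambda k: sum(pmi(w, k, counts, n) for w in query.split()))


def generate_response(query, keyword, backward_decoder, forward_decoder):
    """Assemble the response from a backward pass and a forward pass around the keyword."""
    backward = backward_decoder(query, [keyword])        # decoded right to left
    left = list(reversed(backward))                      # restore reading order
    forward = forward_decoder(query, left + [keyword])   # continue after the keyword
    return left + [keyword] + forward


def toy_backward(query, prefix):
    """Stand-in for a trained backward decoder; emits words in reverse order."""
    return ["need", "cats"]


def toy_forward(query, prefix):
    """Stand-in for a trained forward decoder."""
    return ["every", "day"]


query = "i have cats"
keyword = predict_keyword(query)
response = " ".join(generate_response(query.split(), keyword, toy_backward, toy_forward))
```

Because the keyword is fixed before decoding, it is guaranteed to appear in the response, and it can land in the middle of the sentence rather than only at the start.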

User modeling model

[6] addresses the problem of inconsistent responses across multiple rounds of dialogue. It takes user identity (background information, user profile, age, and so on) into account in the model and builds a personalized seq2seq model that generates responses in different styles for different users, and for the same user talking to different people.

The model in [6] is called the Speaker Model. It is a typical seq2seq model, except that a speaker embedding, similar to a word embedding, is added in the decoder to model the user. Because user information is hard to model explicitly, an embedding approach is used, and a speaker vector is learned during training. The left side of the original figure shows speaker vectors on a two-dimensional plane: users with similar background information lie very close together, just as with word vectors.

Reinforcement learning model

Reinforcement learning has a long history in solving interactive problems, but with the hype around AlphaGo, DeepMind brought reinforcement learning back onto the stage, combining it with deep learning to solve some harder problems.

Reinforcement learning uses a long-term reward as the objective function, so the trained model can predict higher-quality responses. [7] proposes a model framework with the following capabilities:

1. It integrates reward functions customized by the developer, in order to reach the desired goals.

2. After a response is generated, it can quantify the effect of that response on subsequent turns.

Two bots talk to each other. Given an initial input message, bot1 generates 5 candidate responses, and the dialogue proceeds from there. Because every input yields 5 responses, the number of responses grows exponentially as the turns increase, so in each round of dialogue 5 are selected by sampling as that round's responses.
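The sampling step is what keeps the simulation tractable. Below is a minimal sketch of that bookkeeping only, assuming a hypothetical `respond(message, k)` function in place of a trained seq2seq bot and uniform random sampling in place of whatever selection rule [7] actually uses.

```python
import random


def toy_respond(message, k):
    """Stand-in for a bot: returns k distinct candidate replies to a message."""
    return [f"{message}>{i}" for i in range(k)]


def simulate(first_message, respond, turns=3, width=5, seed=0):
    """Expand every kept dialogue by `width` replies, then sample `width` dialogues to keep."""
    rng = random.Random(seed)
    frontier = [[first_message]]
    for _ in range(turns):
        expanded = [
            dialogue + [reply]
            for dialogue in frontier
            for reply in respond(dialogue[-1], width)
        ]
        # Without this step the frontier would hold width ** turns dialogues.
        frontier = rng.sample(expanded, min(width, len(expanded)))
    return frontier


dialogues = simulate("hello", toy_respond, turns=3, width=5)
unpruned_size = 5 ** 3  # what full expansion would produce after three turns
```

After three turns the frontier still holds 5 dialogues instead of 125, and each kept dialogue can then be scored with the reward function.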

A good seq2seq model trained on a large data set serves as the initial value; reinforcement learning is then used to improve the model's ability to realize the custom reward functions, in order to achieve the expected results.

The model in [7] can sustain more rounds of dialogue without falling into an infinite loop prematurely, and the dialogue it generates is also quite diverse.

Task-oriented seq2seq model

Existing task-oriented bots mostly use rule-based, template-based or example-based methods, or a combination of them; data-driven solutions are extremely rare. [8] and [9] try to apply deep learning techniques to individual parts of the bot and come up with practical schemes.

[8] starts from a familiar scenario, how an experienced customer service agent trains a new one, which divides into four phases:

1. Tell the new agent which "controls" are available, such as how to look up a customer's information and how to verify a customer's identity.

2. The new agent learns by imitating good examples from the experienced agent.

3. The new agent starts trying to serve customers, and the experienced agent promptly corrects any mistakes.

4. The experienced agent steps away, and the new agent serves customers alone, continuing to learn and accumulate experience.

The framework of the model in [8] is designed according to the phases above:

The developer provides a set of candidate actions, including response templates and some API functions, for the bot to call.

Experts provide a set of example dialogues, which an RNN learns from.

A simulated user generates random queries, the bot responds, and the expert corrects it.

The bot goes online, talks with real customers, and improves its service quality through feedback.

The complete workflow is described by a diagram in the original article, which also lays out the specific steps.

Training uses a portion of high-quality data for supervised learning (SL), and reinforcement learning (RL) to optimize the model and obtain higher-quality results.

[9] balances the advantages and disadvantages of the two popular approaches and proposes a practical seq2seq solution of real reference value.

A total of five components:

1、 Intent Network

This part can be understood as the encoder of seq2seq: it encodes the user input into a vector.

2、 Belief Trackers

Also known as Dialogue State Tracking (DST), this is the core component of a task-oriented bot. The Belief Trackers here have the following roles:

They map the many forms of natural language onto a finite set of slot-value pairs, which are used to query the database.

They track the state of the bot, to avoid learning from data that carry no information.

They use a weight-tying strategy, which greatly reduces the need for training data.

They make it easy to extend the system with new components.

3、 Database Operator

The input to the database query comes from the output of the Belief Trackers, that is, the probability distribution over each slot. The value with the highest probability is taken as the DB input, and the query returns the matching entries.

4、 Policy Network

This component acts like glue, binding together the other three components above. Its input is the outputs of those three components, and its output is a vector.

5、 Generation Network

The last component is the generation model, which is essentially a language model. Its input is the output of the Policy Network and its output is a generated response, which can be returned to the user after some processing. The processing here is to restore the slots in the response to their true values. This matches the step in [8] that restores specific values to entities.
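Two of the five components, the Database Operator and the post-processing after the Generation Network, are plain data manipulation and easy to sketch. The following is a minimal illustration, assuming a made-up restaurant table, made-up slot names, and a hypothetical `[slot:name]` placeholder syntax; none of these come from [9].

```python
import re

# Hypothetical database and slot names, for illustration only.
RESTAURANTS = [
    {"name": "Golden Wok", "food": "chinese", "area": "north"},
    {"name": "Pasta Bar", "food": "italian", "area": "centre"},
    {"name": "Lotus House", "food": "chinese", "area": "centre"},
]


def most_likely_values(belief):
    """For every slot, take the value with the highest probability."""
    return {slot: max(dist, key=dist.get) for slot, dist in belief.items()}


def query_db(belief, db=RESTAURANTS):
    """Database Operator: use the most likely slot values as query constraints."""
    constraints = most_likely_values(belief)
    return [
        row for row in db
        if all(row.get(slot) == value for slot, value in constraints.items())
    ]


def restore_slots(generated, entity):
    """Post-processing: replace slot placeholders in the response with real values."""
    def fill(match):
        # Leave the placeholder untouched if the entity has no such field.
        return str(entity.get(match.group(1), match.group(0)))
    return re.sub(r"\[slot:(\w+)\]", fill, generated)


# Output of the belief trackers: one distribution per slot (toy numbers).
belief = {
    "food": {"chinese": 0.7, "italian": 0.3},
    "area": {"north": 0.2, "centre": 0.8},
}
matches = query_db(belief)
reply = restore_slots("[slot:name] serves [slot:food] food in the [slot:area].", matches[0])
```

Generating placeholders instead of concrete values means the language model never has to learn restaurant names; the concrete values only enter at this last step.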

Solving a task-oriented bot entirely end to end is impossible; seq2seq solutions have to be used inside a framework or system. [8] and [9] offer a lot of inspiration here.

Knowledge Sources based model

Pure seq2seq can solve many problems, but for a specific task, adding relevant knowledge sources on top of seq2seq gives much better results. The knowledge can be unstructured text, such as the Ubuntu manpages in [10]; structured business data, such as the database in [9]; or a knowledge graph extracted from source data and business data.

The authors of [10] define the bot task as next utterance classification, a bit like question answering: given a context and a list of candidate responses, use the context to choose the correct response from the list. The contribution of the paper is to introduce task-relevant external knowledge in addition to the context, and this knowledge base is unstructured.

The model is composed of three RNN encoders: one encodes the context, one encodes the response, and one encodes the knowledge. Their outputs are then combined to make a prediction and choose the most appropriate response. The model is called the knowledge encoder. Because the data set is the Ubuntu technical support data set, the external resource used is the Ubuntu manpages.

Context sensitive model

The model in [11] is relatively simple, but the problem it addresses matters a great deal: modeling history information is a great help to a bot in practical engineering applications and can determine whether your bot actually works. The authors represent the history context with a bag-of-words model rather than the RNN we usually use; the context and the user query are then passed through a simple FNN to get an output.

Evaluation |

Evaluating bot responses is difficult. Automatic evaluation methods from machine translation, such as BLEU, can be borrowed, but they do not work very well. Almost every paper pays people to do manual evaluation and designs a scoring scheme for it; human evaluation is more persuasive. This is especially true for practical applications: a bot is good only when users say it is good, not when someone holds up their own biased metric, compares against a few methods or other companies' bots, and declares themselves the winner.

Thinking |

After reading the papers and talking with engineers who build bot applications, I summarize my thoughts as follows:

1. Should you build a bot? A popular argument is that there is no usable bot on the market, that solving the bot problem needs a lot of technical progress and may take a very long time, so building a business on it now is ridiculous. My personal view is that using bots to solve specific tasks, combining current advanced technology and building some framework-level tools, is not so far off. It is not easy, but it is very meaningful: only by solving the bot problem in vertical domains does it become possible to solve the open-domain bot problem. And precisely because it is not easy, the bar is raised, real opportunities arise, and some great technology companies will be born.

2. Open-domain or task-oriented? If it were me, I would choose the latter, because the former is only a dream, a distant one that needs more technical breakthroughs. Task-oriented bots are more concrete and more practical: they offer solutions for specific businesses, and many companies are already doing this. Although no general or extensible solution has appeared yet, this is the trend, and it is the opportunity for a new generation of bot companies.

3. Why are task-oriented bots hard, and in which direction should the effort go? End-to-end is an idealized model: a deep learning model "captures" features and "fits" functions from large amounts of training data. It can achieve good results and is convenient to use, but the embarrassing part is that massive data are not available for a specific task, and once the data are small, pure end-to-end becomes of little value. In real-world scenarios, however, many enterprises have some data and a need for bots, so the mature solution today is to design features, templates and rules for the specific business. When the customer's business changes, the existing bot system has to be maintained continuously, which is very time-consuming. Real scenarios often involve a lot of structured business data, so generating the response purely and directly from the context by brute force is not feasible. [8] and [9] give very enlightening solutions: use end-to-end locally rather than for the whole, and combine it with technologies such as Information Extraction and Knowledge Graph to achieve a highly usable framework. This should be the direction for task-oriented bot development.

4. What should response generation depend on? Whether a response is good depends on the following features: (1) The user query, that is, what the user asks in this round of dialogue. Understanding the user's intent precisely is crucial. (2) User modeling, which includes basic information about the user and, more importantly, mining the user's conversation history logs. This work is hard but also high-level, and it is a way for a technology company to prove its technical strength. Log mining is common now, but not everyone does it well, and the logs here are not structured metrics but unstructured text, which is even harder to mine. Another point, mentioned in papers, is the user's emotion: sentiment analysis is a well-studied NLP task, and the user's mood bears directly on whether a sale succeeds. With enough skill, enough factors can be considered and the analysis of the user becomes clear enough. Hanging the raw history on the model is not a good idea, because the history keeps growing and makes it hard for the model to capture information. A better approach might be to build something like a user profile, distilling the history into a vector representation, or to use a knowledge graph to represent a user. A bot with this ability would be, to put it grandly, a personalized bot. (3) Knowledge, that is, external knowledge sources. When a specific business is involved, business data are a kind of knowledge, and how to model that knowledge so that the generated dialogue is more professional and precise is also a very important problem. The bot is a composite problem: it is hard not only at the system level, but hardest of all at the modeling level.

5. I have always felt that one should not take extreme views of life or of problems; the world is not black and white but takes continuous values in between. One cannot say a bot must be either an open-domain giant or a bot with no specific function. Nor should one, seeing only that existing bots are immature and that the bots of fantasy are out of reach, talk this field down and sneer at others for getting investment. Such fights are senseless. What is really meaningful is to dig deep into this field, identify the pain points and difficulties, break them one by one, and keep pushing the field forward, rather than standing in the street like an onlooker, which is boring! Before their breakthroughs, many fields looked as if dawn would never come, yet haven't many problems once thought unsolvable become, within a few years, crowded red oceans? A general-purpose bot will remain a difficult thing for a long time, but a highly usable, extensible bot solution is something to look forward to. There is no need to be overconfident or to belittle ourselves; just get down to work.

Lei Feng's network note: this article is reprinted with authorization from ResysChina; to reprint it, please contact the original author.



Originally published on August 11, 2016.