How AI-Powered Tailored Text Generation Can Help Copywriters Deal with Increasing Demand for Text Across Multiple Platforms
Content writers and copywriters who write for the enterprise are under increasing pressure to create large volumes of engaging content for many channels. Never before have people consumed content on so many screens and in so many ways: mobile, desktop, and experiential platforms are all fuelled by content, and someone has to create and adapt it for each of them.
All of this can be overwhelming for writers. Every piece of content includes core messages — line items that need to be communicated in some way, via a newsletter, email, message, push notification, etc., or often all of the above; all at different lengths, serving different audience attention spans, and with varying user requirements. It was this overwhelming challenge facing content creators that got us thinking about how we could use AI to lighten the load.
There are multiple ways AI can automate and assist with parts of this workflow, notably through the automatic generation of summaries for different formats and platforms. When given an input text, AI can be used to create a summary and, with the right guidance from engineers, can also be tailored to unique needs, like making the summary simpler, or having it touch on some aspects of a text but not others. First, though, let’s get one thing out of the way, for the writers out there.
The first question I often get from copywriters is whether this is a step towards replacing their jobs. The answer is no.
The reality with AI is that it’s not going to take over the world any time soon, but it is going to make some jobs easier by automating the mundane tasks that take up a lot of time. Automated text generation, in the form of summaries, makes writers’ jobs easier by letting them quickly diversify and personalize their texts for different individuals and platforms. This means writers can spend more time doing what they love: being creative.
Pointer-Generator Framework for summary generation
The Pointer-Generator Framework (see references for more details) uses a Sequence-to-Sequence network that models the summarization problem by taking the input text as a sequence of words and using that input to generate the output summary. First, each word is embedded into a vector space capturing what is known about that word and its meaning. The embedded sequence is then fed into an encoder module, which encodes it into a hidden state; a decoder module then decodes that state, word by word, into the summary.
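The embed-encode-decode loop can be sketched in a few lines. This toy NumPy version uses random weights and a made-up six-word vocabulary, so it produces no meaningful summary; it only shows the shape of the computation a trained network performs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative vocabulary and dimensions (assumptions, not from the paper).
vocab = ["<pad>", "<start>", "the", "cat", "sat", "summary"]
word_to_id = {w: i for i, w in enumerate(vocab)}
embed_dim, hidden_dim = 4, 4

# Embedding matrix: each row is the vector for one vocabulary word.
embeddings = rng.normal(size=(len(vocab), embed_dim))

def encode(token_ids, W):
    """Toy RNN encoder: fold the embedded word sequence into one hidden state."""
    h = np.zeros(hidden_dim)
    for t in token_ids:
        h = np.tanh(W @ h + embeddings[t])
    return h

def decode_step(h, W_out):
    """One decoder step: project the hidden state onto a vocabulary distribution."""
    logits = W_out @ h
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

W_enc = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
W_out = rng.normal(size=(len(vocab), hidden_dim))

h = encode([word_to_id[w] for w in ["the", "cat", "sat"]], W_enc)
probs = decode_step(h, W_out)  # distribution over the next summary word
```

In a real Pointer-Generator model the encoder and decoder are trained LSTMs and decoding repeats step by step, feeding each generated word back in; the sketch collapses that to a single step.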
When we summarize an article, we read the whole text to understand the overall meaning. This is what the encoder is doing, and then the decoder uses this information and writes the summary.
Attention Distribution to focus on specific parts of a text and guide networks with words they don’t know
Recent neural networks have benefited a lot from what’s called Attention Distribution, working on top of the process above. As humans, we don’t usually read a whole passage once and then summarize it. Usually, we read it, digest the information, then focus on specific parts of a text that really matter and generate the summary from those parts. Attention Distribution does this — it tells the neural network what to focus on as it’s generating the summary.
The decoder can then incrementally focus on specific parts of a text to generate the summary. The decoder then generates a probability distribution over the vocabulary space that we have, and that determines the next word needed to be generated.
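A single attention-plus-decoding step can be sketched as follows. The encoder states, decoder state, and output projection here are random stand-ins for a trained network's values; the point is how the attention weights and the vocabulary distribution are computed.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)

# Stand-ins: encoder states for a 5-word input (hidden size 4)
# and the decoder's current state.
encoder_states = rng.normal(size=(5, 4))
decoder_state = rng.normal(size=4)

# Attention distribution: one weight per input word, summing to 1 —
# this is what tells the network which parts of the text to focus on.
scores = encoder_states @ decoder_state
attention = softmax(scores)

# Context vector: attention-weighted mix of the encoder states,
# i.e. "what the decoder is looking at" for this step.
context = attention @ encoder_states

# Vocabulary distribution for the next word, conditioned on the
# decoder state plus the context (projection W is an assumption).
vocab_size = 10
W = rng.normal(size=(vocab_size, 8))
p_vocab = softmax(W @ np.concatenate([decoder_state, context]))
```

The word with the highest probability in `p_vocab` is the decoder's next candidate; repeating the step with an updated decoder state produces the full summary.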
Another area where Attention Distribution helps is when the network encounters a word it hasn’t seen before. In this case, the final distribution takes not only the decoder’s output (the probability of the next word to be generated), but also a second input that uses the Attention Distribution to decide whether the network should generate a word from its vocabulary or simply copy a word directly from the input text. This allows the output to contain an entity that isn’t present in the vocabulary: the network copies it from the source into the generated summary.
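The generate-versus-copy mixture can be illustrated numerically. The probabilities and the generation switch `p_gen` below are invented, but the mixing formula is the one the Pointer-Generator uses: P(w) = p_gen · P_vocab(w) + (1 − p_gen) · (attention mass on w).

```python
import numpy as np

vocab = ["the", "company", "reported", "profits", "<unk>"]
source_words = ["acme", "reported", "profits"]   # "acme" is out-of-vocabulary

p_vocab = np.array([0.3, 0.2, 0.25, 0.2, 0.05])  # decoder's vocab distribution
attention = np.array([0.6, 0.3, 0.1])            # attention over the source words
p_gen = 0.4                                      # probability of generating vs. copying

# Extended vocabulary = fixed vocab + OOV words found in this source text.
extended = vocab + [w for w in source_words if w not in vocab]

final = np.zeros(len(extended))
final[:len(vocab)] = p_gen * p_vocab             # generation part
for word, a in zip(source_words, attention):
    final[extended.index(word)] += (1 - p_gen) * a   # copy part

print(extended[int(final.argmax())])  # → acme
```

Because most of the attention sits on "acme", the copy term dominates and the out-of-vocabulary entity makes it into the summary, exactly the behavior the framework is designed for.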
How we built on these frameworks to solve user problems with text generation
The main problem with the above process is that it generates a single summary from a text. The nature of this one summary depends on the data the network has been trained on, and this can be limiting for copywriters who want to generate multiple variants with a similar flavor but different characteristics (for different platforms or audiences, for example).
Tailored generation to get the kinds of summaries you want
When it comes to generating certain texts with certain characteristics, it is often hard to define what those characteristics are and how to tailor them in the network.
Two different aspects can define the tailoring: the content itself and the user’s requirements. Say, for example, you have a long article covering multiple topics, and a single summary won’t do such a large text justice.
Similarly, in creating an executive summary for a document, you may want to focus on specific entities (like profits, for example). This would include focusing on different parts of the document and generating the summary accordingly. Since Attention Distribution determines how to generate the next word, you can obtain a topic-based summary of a text (say politics or business in an article about both), by modifying the Attention to focus on the specific parts of the document that talk about this.
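A minimal sketch of that attention modification, assuming a fixed multiplicative boost factor and known topic-word lists (both are illustrative choices, not the exact mechanism used in training):

```python
import numpy as np

def boost_attention(attention, tokens, topic_words, boost=2.0):
    """Scale attention weight on topic words, then renormalize to sum to 1."""
    weights = attention.copy()
    for i, tok in enumerate(tokens):
        if tok in topic_words:
            weights[i] *= boost
    return weights / weights.sum()

tokens = ["the", "senate", "vote", "lifted", "tech", "stocks"]
attention = np.full(len(tokens), 1 / len(tokens))   # uniform to start

# The same input text yields differently focused attention per topic.
business = boost_attention(attention, tokens, {"tech", "stocks"})
politics = boost_attention(attention, tokens, {"senate", "vote"})
```

Feeding the boosted distribution into the decoding step steers word choice toward the parts of the document that discuss the requested topic.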
Token-based modelling for tailored text generation
One of the simplest ways to achieve tailoring in text generation is to add an additional flag to the article, indicating the desired topic for the summary. This tells the network implicitly that, if this was the topic, this is how the summary should be.
The network is very dependent on data here. Looking specifically at how to boost the Attention Distribution, we determined one way to go about it would be to use a specific mapping of topic words to boost the Attention Distribution. This could work, but the output summary could start producing some incoherent language, since it’s just focusing on the topic words. A better technique is to create data that can speak to the topic: in other words, an artificial dataset with sufficient diversity that can teach the network what it means to create topic-oriented summaries about a certain subject.
To do this, we feed the system input articles with one dominant topic (to avoid confusing the network), with summaries for each, and from here we can add an additional topic flag accordingly. With this topic flag and the Attention boost, we saw that the model performed very well with some of these standard datasets. The generated summaries featured the desired topic in at least 75 percent of cases.
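Constructing such a flagged training example is straightforward. The special-token format below is illustrative, not the exact one used in training; the idea is simply that the topic flag is prepended to the input so the network can condition on it.

```python
def make_example(article_tokens, summary_tokens, topic):
    """Build one (source, target) training pair with a topic flag token
    prepended to the article. Token format is a hypothetical choice."""
    flag = f"<topic_{topic}>"
    return [flag] + article_tokens, summary_tokens

src, tgt = make_example(
    ["profits", "rose", "as", "the", "senate", "debated"],
    ["profits", "rose"],
    topic="business",
)
print(src[0])  # → <topic_business>
```

At inference time, switching the flag on the same article asks the trained network for a differently themed summary.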
We essentially need to teach the network what is important, with explicit indicators, to make sure we get what we want out of these networks. The data does talk for itself, driving the text generation, but we can do our boosting via the Attention Distribution.
Tailoring text generation to styles of expressions
In many cases, we may want to generate the same summary with different linguistic expressions and characteristics. But when it comes to the use of expressions, they may not be present in the input articles the network is looking at.
Say you want to generate a summary with simpler words from a text filled with complex jargon. To do this, we can tune the decoder properties so that the decoder increases the probability of simpler words.
You can use a Bayesian process to define a quality or preference score for certain words in a target vocabulary. For example, if you want to generate a simple summary, you can teach the network how simple a particular word is. You take a standard, very simple input text and have the network look at the frequency of each word. That frequency then becomes a proxy for how simple the word is, and can be used to determine the use of that word in the summary of your desired text.
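A toy version of this frequency-as-simplicity idea, using a stand-in "simple" corpus and an assumed weighting knob `lam` (a real system would use a large simplified-text corpus and a calibrated prior):

```python
import numpy as np
from collections import Counter

# Stand-in for a corpus of simple text; word frequency here acts as
# a proxy for how simple a word is.
simple_corpus = "the cat sat on the mat you can use the mat".split()
counts = Counter(simple_corpus)
total = sum(counts.values())

def simplicity(word):
    """Smoothed log-frequency: words common in simple text score higher."""
    return np.log((counts[word] + 1) / (total + 1))

def reweight(logits, words, lam=1.0):
    """Shift decoder logits toward simpler words; lam is an assumed knob."""
    shifted = logits + lam * np.array([simplicity(w) for w in words])
    e = np.exp(shifted - shifted.max())
    return e / e.sum()

words = ["utilize", "use"]
logits = np.array([1.2, 1.0])        # decoder slightly prefers "utilize"
probs = reweight(logits, words)
print(words[int(probs.argmax())])    # → use (the frequency prior flips it)
```

Because the reweighting happens inside the decoder's probability computation, the simpler word is chosen in context rather than substituted after the fact.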
The key takeaway, in the results we’ve seen, is that you can improve the simplicity of a summary this way, with the network replacing complex words with simpler words. Doing this contextually at the time of generation, versus after generation is complete, always yields better results.
Document and sentence levels
If you need to generate a summary that is formal, it’s not just about the choice of words, but also about the way each sentence is structured. Fiddling with decoder properties isn’t enough to account for this; we need a more complex way to teach the network how to generate text in this style.
We want to add the ways we evaluate expressions, so the network learns how we generate these different stylistic choices as humans. We do this via Reinforcement Learning (RL). The network learns to generate the next word given the current context, which is analogous to a policy in the RL framework. In our RL framework, we allow the network to generate a sentence/passage and, based on the generation, we provide a reward to the network. For example, if we want to generate readable summaries, we give a higher reward for higher readability of a generated summary. Over time, the network learns to generate readable summaries, since the rewards incentivize the network to optimize in that direction.
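A minimal REINFORCE sketch of this reward loop, reduced to a single-word "policy" over two candidates. The readability reward is an assumption (the plain word scores 1, the jargon word 0); a real system scores a whole generated summary with a readability metric.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)

words = ["leverage", "use"]
reward = {"leverage": 0.0, "use": 1.0}   # assumed readability rewards
logits = np.zeros(2)                     # the policy's parameters
lr = 0.5

for _ in range(200):
    probs = softmax(logits)
    a = rng.choice(2, p=probs)           # sample a word from the policy
    grad = -probs                        # d log p(a) / d logits ...
    grad[a] += 1.0                       # ... = one_hot(a) - probs
    logits += lr * reward[words[a]] * grad   # REINFORCE: reward-weighted update

final_probs = softmax(logits)            # mass has shifted toward "use"
```

Because only the rewarded choice reinforces itself, the policy's probability mass drifts toward the high-readability word over the training loop, which is the same incentive structure that steers a full summarizer toward readable output.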
We also used this technique to evaluate more formal summaries, in which we generate more formal words and give feedback on how formal or informal a particular generation is. Reinforced rewards allow the network to understand the nitty-gritty of what formality is, and in the end the network gets better and better at generating formal summaries when asked.
Generations are getting better and better
Text generation is getting better and better, paving the way for a more effective way of working for content writers and copywriters in an increasingly demanding marketplace. Our technologies help tailor content-centric and stylistic variants that satisfy different user needs, helping writers handle personalization at scale.
Pointer-generator framework: See, Abigail, Peter J. Liu, and Christopher D. Manning. “Get To The Point: Summarization with Pointer-Generator Networks.” Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2017.
Topic based summarization: Krishna, Kundan, and Balaji Vasan Srinivasan. “Generating Topic-Oriented Summaries Using Neural Attention.” Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.
Simplicity tailored summarization: Krishna, K., Murhekar, A., Sharma, S., & Srinivasan, B. V. (2018, August). Vocabulary Tailored Summary Generation. In Proceedings of the 27th International Conference on Computational Linguistics (pp. 795–805).
Readability tailored + RL: Chawla, K., Singh, H., Pramanik, A., Kumar, M., & Srinivasan, B. V. “Abstractive Text Summarization Tailored to Target Characteristics.” 20th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), April 2019.