The Impact of Automation on Enterprise Content

Natural Language Generation and Natural Language Processing are going to be the third stage of Automation. Are you ready?

Robot Quill, an installation consisting of an industrial robot which writes calligraphy. Creative Commons Attribution 2.0 Generic License, Wikimedia

The machines are coming. Not only to automate content — that has been in play with Enterprise Content Management (ECM) for many decades — but to generate it as Natural Language Generation (NLG).

Let’s begin with a simple version of a Turing test. One of these passages was written by a seasoned sports correspondent of a national newspaper, the other by a machine in a few seconds. Can you spot the machine-written text?

Sample 1

Having jumped 13 places in a year, leaping from fourteenth to first, Leicester City are easily the most improved side in the league and Jamie Vardy’s role in their staggering rise cannot be overstated. The second top scorer in the league with 24 goals, Vardy has scored 35.29% of Leicester’s 68 goals. Only Harry Kane and Odion Ighalo were a bigger source of goals for their team, with Kane scoring 25 of third-placed Tottenham Hotspur’s 69 goals (36.23%) and Ighalo scoring 15 of 13th-placed Watford’s 40 goals (37.5%).
That underlines Leicester’s overall effectiveness. Although they conceded as many goals as second-placed Arsenal, and one more than Tottenham, they have been more consistent. They were first at Christmas, while Arsenal were second and Tottenham were fourth. ‘It’s a magical season,’ Claudio Ranieri, Leicester’s manager, says, justifiably so, given that a summer expenditure of £26.7m on transfers made them the eighth lowest spenders.

Sample 2

It was a season for the ages for Leicester City as they lifted the Premier League Trophy and were crowned champions of England. Leicester City featured one of the league’s most skillful attacks, netting 68 goals. Jamie Vardy led the way with an incredible 24 goals. In addition to their offensive prowess, Leicester City possessed one of the strongest defenses in England.
Shipping only 36 goals all season, their defense was able to frustrate even the most potent of attacks. Hoping to finish in the top ten after a fourteenth place finish last season, Leicester City splashed out 26.70 million in the summer transfer period. Leicester City sat in first place at Christmas after an incredible start to the season, and they continued to impress the second half of the season. After taking a few moments to reflect on the season, the Leicester City manager weighed in with, ‘It’s a magical season.’

Sample 1 was written by Jacob Steinberg, football correspondent for The Guardian, and Sample 2 was written by software called Wordsmith.

As it happens, sports and financial services have led the march on content automation with NLG engines. Once the data is fed into a spreadsheet, whether match and individual scores or annual or quarterly reports, whole articles, investment research, and fund fact-sheets can be created using today’s commercial technology. And this content is device-appropriate and interactive. In the case of financial reporting, it supplies investment managers with content that enables split-second investment decisions.

Content automation has been growing in banking and insurance too, where standard operating procedures must be made available in a regulated environment across languages and geographies.

Stages of Enterprise Content Automation

Why does an enterprise need content? For the simple reason that better content makes for better decisions by external and internal customers alike. Nearly everyone has come across poorly presented content and recognizes the reach of its impact, whether it is encountered before purchase, during use, or while troubleshooting. Poor content can mean that consumers do not buy the product or, if they do, do not recommend it to others.

Content quality depends on people and teams with three principal attributes: great English, a good grasp of the technical domain, and the understanding of tools and technologies to present content. As any C-level executive or hiring manager will tell you, this combination of skills isn’t easy to find. Content quality has always included the dimensions of style, accuracy, consistency, and ease of comprehension. In the digital age, the timeliness of content — getting it published and delivered faster — has been increasingly flagged as high-priority. In the mobile age, quality concerns now include ideas such as “device-appropriate” and “interactive” which are becoming the new minimum requirements to satisfy content consumers.

Content Automation allows enterprises to meet their deadlines while matching the quality and pace of content production that today’s business world requires. Beyond solving today’s problems, content automation can also make a company more agile, including the ability to create new information products and communications dynamically and to quickly support new generations of devices and formats such as eyewear displays, smart watches, and more.

As AI becomes stronger, non-data-led content recognition and generation will explode. Already, the capability exists to create product descriptions for e-commerce and datasheets for print and PDF. Product companies that make easily componentized or versioned products have been using content automation for technical support and technical documentation for at least a decade. Until now, however, there has been no capability to parse the actual content; as it becomes available, we will see a surge in actual content generation.

Content Automation comes into being in three stages.

Content Automation 1.0: Enterprise Content Management

The problem with the traditional process of writing content in independent organizations is the huge amount of rework and time involved. The quality is low, and the content is not publication-ready.

Tightly-coupled Content and Design: Content is often locked to one media type because the author and designer commit content to design very early in the process.

Low Reuse: Documents are typically disconnected, often leading to the same content being recreated and translated for multiple documents and mediums. With traditional content creation tools, content is reused by authors and designers by copying and pasting content between documents and media, increasing the opportunity for errors and inconsistencies.

Updates and Collaboration: Reviews and approvals involve Word documents and PDFs being emailed back and forth, which is time-consuming, error-prone and expensive. Updates to content must be made manually across multiple documents and media, requiring further rounds of review and approval.

No Metadata: Content has traditionally been very document-centric, and doesn’t contain metadata, which makes the reuse of content for different audiences a manual and lengthy process.

The first generation of Content Automation helps global organizations streamline their content processes, and enables them to deliver business-critical content with precision — typically with the use of an Enterprise Content Management system. This automation of process and single instance of content (single-sourcing) found two uses:

  1. Billing Statements have taken advantage of this automation to improve quality and timeliness. It is typically limited to converting and publishing relational data from a database as a PDF and, increasingly, as Web and Mobile HTML.
  2. Technical Documentation teams use this to improve efficiency for publishing to multiple formats such as a printed user manual, one or more customer help systems, and sometimes custom applications such as aircraft maintenance systems. One of the biggest challenges in Technical Documentation is that target formats are always changing and expanding.

Content Automation immediately shows key benefits:

Productivity: Subject matter experts such as financial and legal analysts, product managers, and government officials who contribute authored content are 30–70 per cent more productive. They no longer have to waste time manually “formatting” the content, and they gain the ability to reuse existing components of content.

Reduced Content Maintenance and Increased Agility: Content componentization and managed reuse remove the need to copy/paste or rewrite content that already exists. Rather than store content as monolithic documents, a scalable content automation system enables authors to create, manage, and deploy text, data, and media components as “single source of truth” assets. For example, if a publication requires a legal disclaimer, that disclaimer is stored once and used by reference in multiple publications. If changes are required, the disclaimer is edited once and all references to it are automatically updated as well. Using a component feels similar to copy/paste, but without the associated problem of manually updating hundreds of different documents where the text was pasted.
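
The disclaimer example can be sketched in a few lines of Python. This is only an illustration of reuse by reference, not how any particular ECM product stores content: the component store, the `{{component-id}}` reference syntax, the publication names, and the disclaimer wording are all assumptions made up for the example.

```python
import re

# Hypothetical component store: each reusable fragment is stored exactly once.
components = {
    "legal-disclaimer": "Past performance is not a guarantee of future results.",
}

# Publications hold references of the form {{component-id}}, not pasted copies.
publications = {
    "fund-factsheet": "Fund overview for the quarter.\n{{legal-disclaimer}}",
    "annual-report": "Annual summary.\n{{legal-disclaimer}}",
}

REFERENCE = re.compile(r"\{\{([a-z0-9-]+)\}\}")


def render(publication_id: str) -> str:
    """Resolve every component reference at publish time."""
    body = publications[publication_id]
    return REFERENCE.sub(lambda match: components[match.group(1)], body)


# Editing the component once changes every publication that references it.
components["legal-disclaimer"] = (
    "Capital is at risk. Past performance is not a guarantee of future results."
)

for publication_id in publications:
    print(render(publication_id))
    print("---")
```

Because the publications never contain the disclaimer text itself, there is nothing to hunt down and re-paste when the wording changes.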

Collaboration: The ability for cross-department teams to work in parallel on complex and/or large publications by leveraging componentization and automation creates better results faster.

Quality: When managed reuse is deployed and content maintenance costs are reduced, information quality increases dramatically: accuracy is improved; consistency is dramatically improved; and time-to-market is reduced significantly. Further, content automation can also generate omni-channel outputs without manual intervention. The resulting publications therefore have consistent style and branding, and can include interactivity features such as slideshows, pop-ups, animated text, and more.

Time-To-Market: Content Automation can reduce the cycle time of a content production workflow from months to days. A custom electronic component manufacturer that used to take 4 weeks of human effort to create a 40-page product data sheet for each customer request for a specific configuration can now do so in hours, and can produce output for print, web, and mobile seamlessly.

Content Automation 2.0: Natural Language Generation on data underlay

Content Automation 1.0 relies on custom, hand-crafted (and often programmer-intensive) automation, so its cost is high and its adaptability low. Also, for single-sourcing to work, the initial content must still be written by a team with a grasp of English, technology, and presentation. The InfoTrends™ Content Automation Research Survey (2016) illustrates this well:

50% of respondents say increasing customer satisfaction is the cornerstone of their content strategy for the next 12 months
76% of respondents say their stakeholders want more mobile & interactive content
30% of respondents say their current ECM is difficult to configure for their specific requirements
25% of respondents say their ECM doesn’t support automated content reuse and updating
50% of respondents say PDFs are difficult to review and annotate
70% of respondents find email an inefficient way to review and approve content

The next stage of Content Automation addresses two aspects:

· Generation of actual language output based on underlying data context, and

· Separation of the content from its presentation

The initial example of the Leicester City reporting falls into this stage of automation, where the machine generates the text from data and a larger pool of contextual data. Already Google Rankings is an AI program, and Google Analytics uses Narrative Science tech to present dense analytics in a readable format; the resultant reports provide context in an accessible way.
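
To make "language output based on underlying data" concrete, here is a minimal, rule-and-template sketch in Python. It is not how Wordsmith or Narrative Science work internally, which this article does not describe; the `SeasonRecord` fields, the sentence templates, and the function names are assumptions chosen for illustration. The figures are the ones quoted in the two samples above.

```python
from dataclasses import dataclass


@dataclass
class SeasonRecord:
    """One row of season data, as it might sit in a spreadsheet."""
    team: str
    final_position: int
    previous_position: int
    goals_for: int
    goals_against: int
    top_scorer: str
    top_scorer_goals: int


def ordinal(n: int) -> str:
    """Turn 1 into '1st', 14 into '14th', and so on."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def season_summary(r: SeasonRecord) -> str:
    """Choose sentences based on the data, then fill them with the data."""
    sentences = []
    if r.final_position == 1:
        sentences.append(f"{r.team} were crowned champions.")
    else:
        sentences.append(f"{r.team} finished {ordinal(r.final_position)}.")

    places_gained = r.previous_position - r.final_position
    if places_gained > 0:
        sentences.append(
            f"That is a climb of {places_gained} places from "
            f"{ordinal(r.previous_position)} last season."
        )

    share = 100 * r.top_scorer_goals / r.goals_for
    sentences.append(
        f"{r.top_scorer} scored {r.top_scorer_goals} of the team's "
        f"{r.goals_for} goals ({share:.2f}%)."
    )
    sentences.append(f"The defense conceded {r.goals_against} goals all season.")
    return " ".join(sentences)


leicester = SeasonRecord(
    team="Leicester City",
    final_position=1,
    previous_position=14,
    goals_for=68,
    goals_against=36,
    top_scorer="Jamie Vardy",
    top_scorer_goals=24,
)
print(season_summary(leicester))
```

Even this toy version shows the two ingredients at work: the data decides which sentences are worth writing (a climb is only mentioned if there was one), and the templates turn the numbers into readable copy.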

But penetration in business content is still limited. While early adopters see the clear benefit of single-sourcing, reusing, and republishing content, producing branded, design-rich, and interactive content with an overlay of analytics still requires a new and different business language to be created, along with the creative. The key to that is Language Processing.

Content Automation 3.0: Machine parses, understands and generates content

Kris Hammond, Chief Scientist at Narrative Science and a Professor of Computer Science at Northwestern University, estimates that content written by algorithms will make up 90% of journalistic reporting by 2030! This suggests that in the long term, automated content is going to have to get vastly more interesting as well as more personalized.

AI can research swathes of data far quicker than a person, and compile relevant information and present it in suitable ways. So the author comes in to analyze the automated report and add insight, context and flair to the piece. This is not so different from the way national papers currently work. Local press agencies source stories and send copy to the national dailies. The newspaper then uses in-house writers to meet house guidelines.

AI relies on two things: smart algorithms and data points to create the context base on which to parse and understand context. Thanks to the Internet of Things, data will be available from cars, CCTV, social media, the internet, live video, people’s homes and much more.

How should you go about Automating Content?

If you want to start on Content Automation, first prepare the groundwork. It starts with creating a Content Automation strategy.

Laying down the strategy should be simple: figure out the structure, pour your content in, automatically extract content as needed, and publish it everywhere. Once that is achieved, create engines that generate content to fill in. Therein lies the complexity.
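
The "structure once, publish everywhere" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not a recommended architecture: the `doc` dictionary layout, its field names, and the two renderers are hypothetical.

```python
import html

# Hypothetical structured content: plain data, with no layout decisions baked in.
doc = {
    "title": "Resetting the device",
    "summary": "Use this procedure if the device stops responding.",
    "steps": [
        "Hold the power button for 10 seconds.",
        "Release & wait for the light.",
    ],
}


def to_text(d: dict) -> str:
    """Render the content as plain text, for example for an email or SMS channel."""
    lines = [d["title"], "", d["summary"], ""]
    lines += [f"{i}. {step}" for i, step in enumerate(d["steps"], start=1)]
    return "\n".join(lines)


def to_html(d: dict) -> str:
    """Render the same content as HTML, escaping text for the web channel."""
    items = "".join(f"<li>{html.escape(step)}</li>" for step in d["steps"])
    return (
        f"<h1>{html.escape(d['title'])}</h1>"
        f"<p>{html.escape(d['summary'])}</p>"
        f"<ol>{items}</ol>"
    )


print(to_text(doc))
print(to_html(doc))
```

Adding a new channel then means writing one more renderer, while the content itself stays untouched.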

Even before we get to the automation piece, we need to recognize that information exists in multiple areas and it differs in content, style, tone, and message. Customers don’t know which one is correct, most up-to-date, or comprehensive. This can be confusing and lead to poor customer experience.

Pre-automation stage: First, create the framework to provide the right content to the right person at the right time. Few brands know how to do this well. If you want to create an exceptional experience, you need to figure this out.

Content Automation: Second, we need processes and workflows to manage all this content. While quality content is always a priority, we also need to figure out how to automate what we can so that our efforts scale.

Automating content generation: Once this is achieved, we figure out how to generate content; make it distributable as marketing content, learning content, and technical content; and make it ubiquitous across distribution channels.

TWB_ and Content Automation

Ten years ago, TWB_ pioneered the offshore creation of technology content, which was hitherto created by companies either internally or with ‘consultants’ hired from vendors. TWB_ changed that by creating domain depth that mirrored the customer’s own technology capability, with deep SME teams for a variety of industries including Information Technology, Defense & Aerospace, Engineering, and Life Sciences. TWB_ then integrated this technology capability with high-quality content creation teams that could create product, learning, and marketing content across these industries, with the rigor of the software engineering process.

As TWB_ started scaling up, it was natural to look for efficiencies in developing content and helping customers break through the stovepipes in which their content lay. That need, coupled with TWB_’s deep understanding of content technologies, became the base of the first level of automation we could provide our customers.

Content Automation 1.0 available since 2010

TWB_ released India’s first cross-ECM integration platform, called the “TWB_ Center of Excellence in Technical Communication” (TWB_ COE), in 2010; it is a copyrighted and trademarked process. The TWB_ COE allowed the integration of different ECMs, technologies, and programming to give enterprise customers a unified view of content and to automate content flow while integrating existing ECMs. It also allowed enterprise customers to measure and manage the ‘quality of content’ at each content node. TWB_ Consulting and Research teams proved that single-sourcing, managing quality of content, and automation of information flow could save enterprises up to 80% in costs and time. The platform brought together (a) content reuse, by having information available where required, when required, without duplication of effort, and (b) integration of existing information and content management architectures to deliver content automation.

Legacy, ECM, and open-source integrations include TWB_ partners such as Microsoft®, Author-it®, Madcap®, and Adobe® suites, various open-source ECM, CMS, and LMS solutions, as well as native XML/DITA implementations.

Here are two examples of automating content with the TWB_ COE platform:

1. Content Automation 1.0 for the world’s largest software company: The customer, which develops its own editors and workflow platforms, needed collaboration, workflow, and versioning capability for internal use by its distributed teams, while keeping a familiar interface. The data, however, is not XML and is proprietary to the company. In the TWB_ COE solution, workflow came from SharePoint®, versioning and single-sourcing came from Madcap®, and the content work included manual conversion of some legacy data to XML and migration of SGML content to XML, bringing XML into the proprietary system.

2. TWB_ COE for Europe’s leading insurer: The customer serviced 13 markets in as many languages in Europe with different regulatory requirements, and twice as many markets in the Americas, Asia, and Africa. The base legal documentation was managed manually, and different versions were maintained by combining (a) one of ’n’ insurance products for the market, (b) regulatory compliance(s) for the market, and (c) localization of this content from English. The flow of information was via e-mail. The TWB_ COE content automation solution combined workflow and versioning from Author-it®, information fields from the SAP HANA® databases, and some content migrated to XML.

Content Automation 3.0 available for pilots

TWB_ is currently piloting Pāṇini which uses NLP/Machine learning algorithms and supervised learning techniques to analyze documents for scope, legal, financial and regulatory compliance and provide meaningful insights, and redline documents for user intervention.

TWB_ is where creativity meets technology. We’re the strategic + creative + content agency for technology brands. We work with several Fortune 500 brands, including Microsoft, Lenovo, Intel, Cisco, Oracle, SAP, and Samsung. TWB_ makes technology stand out. shift@twb.in
