Different types of Text Generation using LLMs

Uttaran Tribedi · Published in think AI · 8 min read · Sep 24, 2024
A classification of the key tasks in text generation and the challenges associated with each.

There are five prominent areas of text generation, namely open-ended text generation, summarization, translation, paraphrasing, and question answering. Let's dive into each of them one by one:

1. Open-ended text generation:

Open-Ended Text Generation (OETG) creates new text by building on a given prompt, aiming for a final output that is both coherent and natural. Unlike other text generation tasks, OETG offers greater flexibility, especially in terms of variable output length. To achieve fluent text, language models are trained to assign high probability to natural word sequences. Fluency alone, however, is not enough: human language relies not only on grammar and syntax but also heavily on semantic and pragmatic elements, which a probability objective does not directly capture.
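To make this concrete, here is a minimal sketch of open-ended generation using the Hugging Face transformers library, with GPT-2 standing in for any autoregressive language model (the prompt and the sampling settings are illustrative assumptions, not prescriptions):

```python
# Minimal open-ended generation sketch: continue a prompt with sampling.
from transformers import pipeline, set_seed

set_seed(42)  # make the sampled continuation reproducible
generator = pipeline("text-generation", model="gpt2")

prompt = "The old lighthouse keeper noticed something strange about the fog:"
outputs = generator(
    prompt,
    max_new_tokens=60,  # variable output length is a hallmark of OETG
    do_sample=True,     # sample instead of greedy decoding for more natural text
    top_p=0.92,         # nucleus sampling trims the unreliable tail of the distribution
    temperature=0.8,    # values below 1 sharpen the distribution toward likely words
)
print(outputs[0]["generated_text"])
```

Sampling parameters like top_p and temperature trade fluency against diversity, a trade-off that matters more in OETG than in tightly constrained tasks.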

OETG can be broadly divided into three main sub-tasks: open-domain OETG, story generation, and dialogue generation.

a. Open-domain

Open-domain OETG refers to models that are not restricted by a specific topic or structure, allowing them to generate diverse and highly variable text across numerous scenarios. Modern language models like GPT and LLaMA can handle multiple domains and formats, thanks to pre-training on vast, varied datasets such as web text and knowledge bases. This broad pre-training enables them to generalize across different topics and tasks.

However, a significant challenge is that many of the most advanced models are developed by large tech companies, which often keep key details, such as datasets and hyperparameters, private. This lack of transparency makes it difficult to replicate or build upon these models, though some companies do adopt open-source practices.

b. Story generation

A story generation model crafts narratives that appear coherent to the reader by describing sequential events and maintaining consistent characters within the plot. To this end, many methods integrate additional modules for enhancing narrative consistency. For instance, Fan et al. first generate a structured prompt (a story premise) with a convolutional language model and then condition the story on that premise, improving coherence through this text-to-text approach.
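As a rough illustration of this two-stage idea, the sketch below first generates a premise and then conditions the story on it; it substitutes GPT-2 for Fan et al.'s convolutional model, so treat it as a schematic rather than a reproduction:

```python
# Two-stage story generation: premise first, then a story conditioned on it.
from transformers import pipeline, set_seed

set_seed(0)
generator = pipeline("text-generation", model="gpt2")

# Stage 1: generate a short structured prompt (the story premise).
premise = generator("Story premise:", max_new_tokens=20,
                    do_sample=True)[0]["generated_text"]

# Stage 2: condition the narrative on the premise so events and characters
# stay anchored to a single plan, which is what improves coherence.
story = generator(premise + "\n\nStory:", max_new_tokens=120,
                  do_sample=True, top_p=0.9)
print(story[0]["generated_text"])
```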

One of the key challenges in story generation is producing long, coherent narratives with evolving characters.

c. Dialogue generation

Dialogue generation refers to simulating conversations between two or more parties. While two-party setups are common in AI assistants, multi-party dialogues, like those seen in movies, present unique challenges. Standard models like BERT and GPT, which are pre-trained on general text (e.g., Wikipedia), aren't optimized for dialogue, since conversations are one-to-many: a single context can have many equally valid responses.

Most conversational data used for training is synthetic; the AMI corpus, for example, contains roughly 100 hours of simulated meeting dialogue. However, dialogue generation faces challenges with maintaining consistency, semantics, and interactivity, especially in open-domain scenarios. Long context, character personalities, sentiment, and dialogue policies all influence these difficulties. While two-party dialogue is well researched, multi-party dialogue remains less explored, both because of its complexity and because much of it occurs in confidential settings like business meetings, which limits the data available.
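For a feel of two-party dialogue generation in practice, here is a short sketch using DialoGPT, a GPT-2 variant fine-tuned on conversational data (the user turns are invented for illustration):

```python
# Two-party dialogue sketch: each user turn is appended to the running
# history, and the model generates the next response in context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

history = None
for user_turn in ["Hi, how are you?", "Any plans for the weekend?"]:
    new_ids = tokenizer.encode(user_turn + tokenizer.eos_token, return_tensors="pt")
    input_ids = torch.cat([history, new_ids], dim=-1) if history is not None else new_ids
    history = model.generate(input_ids, max_new_tokens=40,
                             pad_token_id=tokenizer.eos_token_id)
    reply = tokenizer.decode(history[:, input_ids.shape[-1]:][0],
                             skip_special_tokens=True)
    print("Bot:", reply)
```

Note how the whole history is re-fed at every turn: the long-context and consistency problems mentioned above grow directly out of this design.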

2. Summarization:

Summarization refers to creating a condensed version of a longer text using one or more references. These references are usually more than twice the length of the summary itself. Summarization can be split into two types: extractive and abstractive. Extractive summarization selects and stitches together key segments from the source text, while abstractive summarization generates new sentences, often with minimal overlap in wording compared to the original.

Extractive methods rely on statistical techniques to find important text spans and are relatively simple to implement, such as by ranking sentences or fine-tuning language models. On the other hand, abstractive methods use semantic features, like word embeddings, to interpret the text’s meaning and create summaries. Hybrid approaches that combine both extractive and abstractive techniques are being explored to reduce the computational burden of handling long documents with purely abstractive methods.
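The contrast between the two styles is easy to see in code. Below, a simple TF-IDF sentence ranking stands in for heavier BERT-based extractive rankers, while a pre-trained BART model handles the abstractive side (the sample document is made up):

```python
# Extractive vs. abstractive summarization on a toy document.
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import pipeline

document = (
    "The city council approved the new transit plan on Tuesday. "
    "The plan adds three bus lines and extends subway hours. "
    "Critics argued the budget estimates were too optimistic. "
    "Construction is expected to begin next spring."
)
sentences = document.split(". ")

# Extractive: score sentences by total TF-IDF weight and keep the top one.
scores = TfidfVectorizer().fit_transform(sentences).sum(axis=1).A1
print("Extractive:", sentences[scores.argmax()])

# Abstractive: a sequence-to-sequence model writes new sentences.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
print("Abstractive:", summarizer(document, max_length=30, min_length=10)[0]["summary_text"])
```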

The most popular summarization sub-tasks are single-document summarization, multi-document summarization, and dialogue summarization.

a. Single-document summarization (SDS)

This focuses on creating a concise and accurate summary from one document, such as a news article. Techniques like extractive summarization utilize models like BERT, while methods such as pointer-generator networks copy key tokens from the source text to generate more relevant summaries. These networks allow the model to choose between generating new words or selecting from the source, enhancing the handling of out-of-vocabulary terms. Datasets in SDS primarily come from the news domain. However, SDS faces challenges with handling long documents due to the computational demands of transformers, though specialized models like Longformer help mitigate these issues. Another difficulty lies in maintaining faithfulness to the original text, as models can sometimes hallucinate or introduce irrelevant information.
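The copy-or-generate choice at the heart of a pointer-generator network is easiest to see in its final output distribution. The toy numpy sketch below mixes a vocabulary distribution with an attention-based copy distribution; every number in it is invented for illustration, not taken from a trained model:

```python
import numpy as np

# Toy vocabulary and a source sentence containing an out-of-vocabulary word.
vocab = ["the", "cat", "sat", "on", "mat", "<unk>"]
source_tokens = ["the", "tabby", "sat"]  # "tabby" is not in the vocabulary

# Hypothetical decoder outputs at one time step:
p_vocab = np.array([0.30, 0.25, 0.20, 0.10, 0.10, 0.05])  # softmax over vocab
attention = np.array([0.2, 0.7, 0.1])                      # over source tokens
p_gen = 0.6                                                # generate-vs-copy gate

# Extended vocabulary: in-vocab words plus source-only words like "tabby".
extended = vocab + [t for t in source_tokens if t not in vocab]
p_final = np.zeros(len(extended))
p_final[: len(vocab)] = p_gen * p_vocab          # generation path
for a, tok in zip(attention, source_tokens):     # copy path
    p_final[extended.index(tok)] += (1 - p_gen) * a

print(dict(zip(extended, p_final.round(3))))
# "tabby" receives (1 - p_gen) * 0.7 even though it is out-of-vocabulary.
```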

b. Multi-document summarization (MDS)

This involves generating a summary from a collection of related documents. Architectural strategies include ensemble networks, hierarchical approaches, and graph neural networks, all of which handle cross-document relationships and attention mechanisms. For example, the Multi-News dataset includes clusters of documents to summarize. MDS inherits many challenges from SDS but faces additional difficulties, such as resolving conflicts or redundancies across documents and managing overlapping or complementary information. The diversity of datasets is also limited due to the high costs of creating quality MDS datasets.

c. Dialogue summarization

This distills the key points from multi-turn conversations, such as meetings, and is especially useful in scenarios where manual note-taking is time-consuming. Most datasets for dialogue summarization, such as the AMI corpus, rely on synthetic data, which can be problematic. The lack of real conversation datasets, often due to confidentiality issues, can lead to performance drops when models trained on synthetic data encounter the complexities of natural conversations, including coreference errors or difficulties in maintaining temporal coherence.

3. Translation

Translation, or machine translation, involves converting text from one language to another, such as translating French to English. While translation can apply to various formats like text-to-code, the most common task is text-to-text translation. The key challenge is ensuring that the translated text preserves the original meaning without semantic shifts. Most translation research focuses on high-resource languages like English and Chinese, where training data is abundant, which makes it difficult to train models for low-resource languages. To address this, techniques like back-translation, which turns monolingual target-language text into synthetic parallel data, help improve performance by aligning the training data more closely with the test domain.
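A minimal back-translation loop might look like the sketch below, which uses two off-the-shelf MarianMT models to turn monolingual English text into synthetic French-English pairs (the sentence and language pair are arbitrary choices):

```python
# Back-translation sketch: create synthetic parallel data from monolingual text.
from transformers import pipeline

en_to_fr = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
fr_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

monolingual_en = ["The harvest was delayed by heavy rain."]
synthetic_fr = [t["translation_text"] for t in en_to_fr(monolingual_en)]
round_trip_en = [t["translation_text"] for t in fr_to_en(synthetic_fr)]

# (synthetic_fr, monolingual_en) pairs can now augment fr->en training data.
print(list(zip(synthetic_fr, round_trip_en)))
```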

The most common translation sub-tasks are sentence-level and document-level translation.

a. Sentence-level translation

This involves converting a single sentence from one language to another. Recent techniques use encoder-decoder Transformer models or prompt pre-trained language models to perform this task, leveraging large databases of translation examples. Popular datasets, like those from the WMT General Machine Translation task, cover a variety of languages such as Chinese, German, and Japanese across different domains, including news and e-commerce. While sentence-level translation provides efficient results, it can face difficulties due to a lack of context, especially with ambiguous words or pronouns. Despite these limitations, the approach remains effective for shorter text sequences.

b. Document-level translation

This focuses on translating multiple sentences or paragraphs by considering the relationships between them, allowing for better handling of ambiguities compared to sentence-level translation. This approach uses the broader context within a document, ranging from small paragraphs to entire books. However, document-level translation faces challenges due to the scarcity of high-quality datasets, many of which are paid and limited in size and language coverage.

4. Paraphrasing

Paraphrasing involves generating text that conveys nearly the same meaning as the original but uses different words or structures. Unlike in translation, the length of a paraphrased text isn't a key factor: it can be shorter, longer, or the same length as the source. Paraphrasing can involve word substitutions, grammatical changes, or altered word order. A major challenge is evaluating how well the generated text captures the original meaning, as common metrics like BLEU, which focus on word overlap, often correlate poorly with human judgments. Multiple reference texts could improve both training and evaluation, but most datasets provide only one reference per source.
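The BLEU problem is easy to demonstrate. In the sketch below (sentences invented for illustration), a near-copy with the wrong meaning outscores a faithful paraphrase with little word overlap:

```python
# Word-overlap metrics vs. meaning: BLEU rewards surface similarity.
import sacrebleu

reference = ["The committee postponed the decision until next month."]

near_copy  = "The committee postponed the decision until next week."   # wrong meaning
paraphrase = "A ruling was pushed back by the panel for thirty days."  # right meaning

print(sacrebleu.sentence_bleu(near_copy, reference).score)   # high, despite the error
print(sacrebleu.sentence_bleu(paraphrase, reference).score)  # near zero, despite fidelity
```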

The two major sub-fields of paraphrasing are uncontrolled paraphrasing and controlled paraphrasing.

a. Uncontrolled paraphrasing

This involves rewording a text without focusing on specific types of changes or maintaining strict guidelines. Most models for this task use an encoder-decoder architecture, which first understands the meaning of the original text and then generates a paraphrase. Datasets like TURL and Quora Question Pairs (QQP) can be used for this type of task. One challenge here is the lack of diverse reference paraphrases to compare generated outputs, as existing datasets usually provide only one reference for each source text, limiting evaluation quality.

b. Controlled paraphrasing

In this case, the text is rephrased with specific changes in mind, such as modifying sentence structure, tone, or grammar. Researchers have developed models that generate paraphrases according to set rules, like changing polarity or preserving the original syntax. Some works categorize paraphrase types based on morphology, syntax, and discourse features. Models like BART and PEGASUS are used to produce paraphrases that adhere to these constraints.

5. Question Answering

Question answering (QA) involves taking a question as input and providing a concise answer or a list of potential answers based on available knowledge. This knowledge can come from internal sources like the system’s training data or external sources such as documents or knowledge bases. Internal knowledge includes information within the text, such as keywords and linguistic patterns, while external knowledge comes from databases or other external texts. While some QA systems focus on specific domains like math, most research targets open-domain QA, where the system can handle a wide range of topics. One challenge in QA, particularly for open-domain systems, is accurately evaluating answers when there is no clear reference answer in the source text. Researchers are exploring methods like Chain-of-Thought Prompting to improve reasoning in QA systems.
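Chain-of-Thought prompting is, at its core, careful prompt construction. The sketch below shows the idea with a one-shot worked example prepended to the question; GPT-2 is only a placeholder here, since CoT gains materialize with much larger models:

```python
# Chain-of-Thought prompting: prepend a worked example so the model
# imitates step-by-step reasoning before stating its answer.
from transformers import pipeline

llm = pipeline("text-generation", model="gpt2")  # placeholder model

cot_prompt = (
    "Q: A shop has 3 boxes of 12 apples and sells 15. How many are left?\n"
    "A: Let's think step by step. 3 boxes of 12 apples is 3 * 12 = 36. "
    "After selling 15, 36 - 15 = 21 apples are left. The answer is 21.\n"
    "Q: A library has 4 shelves of 25 books and lends out 30. How many remain?\n"
    "A: Let's think step by step."
)
print(llm(cot_prompt, max_new_tokens=60)[0]["generated_text"])
```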

Let's elaborate on the two major types of QA sub-tasks: internal knowledge-grounded QA and external knowledge-grounded QA.

a. Internal knowledge

In internal knowledge-based QA, the system relies solely on information learned during training, without consulting any external resources. Pre-trained language models (LMs) memorize facts from their large training corpora, allowing them to answer questions from this internal knowledge alone. For more specific tasks, these models are fine-tuned on specialized datasets. One of the main challenges with internal knowledge QA is that models may "hallucinate," confidently providing answers that aren't factually correct.
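Closed-book QA reduces to prompting the model and trusting its parameters, which is exactly where the hallucination risk lives. A minimal sketch (again with GPT-2 as a stand-in; a far larger model is needed for reliable answers):

```python
# Closed-book QA: no passage is provided, so the answer must come from
# whatever the model memorized during pre-training.
from transformers import pipeline

llm = pipeline("text-generation", model="gpt2")  # placeholder model
prompt = "Question: What is the capital of France?\nAnswer:"
print(llm(prompt, max_new_tokens=10, do_sample=False)[0]["generated_text"])
# Nothing constrains this output to be true; that is the hallucination risk.
```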

b. External knowledge

In external knowledge QA, the system consults additional resources at inference time, such as a knowledge base, a graph, or external documents; when the resource is a text passage, the task is often referred to as "reading comprehension". The system uses these external texts to generate an answer. Popular datasets like SQuAD help train these systems by providing questions paired with specific text passages.
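By contrast, an extractive reading-comprehension model fine-tuned on SQuAD pulls its answer directly out of the supplied passage, as in this sketch (the context sentence is invented):

```python
# Reading comprehension: the answer is a span of the provided context,
# not a guess from the model's parameters.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = (
    "The transit plan, approved by the city council on Tuesday, "
    "adds three bus lines and extends subway hours until midnight."
)
result = qa(question="What does the transit plan add?", context=context)
print(result["answer"], f"(confidence {result['score']:.2f})")
```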

Reference:

Becker, J., Wahle, J., Gipp, B., & Ruas, T. (2024). Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges. arXiv preprint arXiv:2405.15604.

If you liked this post or thought it helped you in any way, don’t forget to follow this page for more such useful content!
