Automated Generation of News Titles for Search Engine Optimization

SZDM Data Science
Süddeutsche Zeitung Digitale Medien
6 min read · Mar 15, 2023

(This blog post is based on the master's thesis of Severin Schmid at LMU Munich. Severin has been a working student at SZDM for several years and has supported us across various digital products and topics, such as our paywall, sz-magazin.de, and data science.)

Süddeutsche Zeitung (SZ) uses Search Engine Optimization (SEO) techniques to attract users to sueddeutsche.de. While there are many aspects of a website which can be optimized to boost search ranking, e.g., site reliability, response time and timely content, we wanted to explore the potential of natural language processing (NLP) to automatically generate search-engine-optimized headlines for news articles.

This work is heavily inspired by Axel Springer’s ideas outlined in their medium post and respective paper DeepTitle — Leveraging BERT to generate Search Engine Optimized Headlines. They build upon BertSum to implement an encoder-decoder architecture, which has a pre-trained BERT model as encoder. The decoder, on the other hand, is trained from scratch in order to generate German titles. Finally, the most relevant keywords in an article’s text are identified and then boosted during headline generation via tweaking the beam search algorithm.

In the following, we describe our approach which is based on similar ideas but follows different implementation choices.

Data

We created a German news corpus of around 170,000 articles published on the SZ website between 2018 and 2022, each with its text and SEO title. As a first step, we plotted a word cloud that captures how often words appear in the titles.

Words and their frequencies captured in a word cloud — the bigger a word the more often it appears in titles.

The word cloud points to an important aspect of our SEO titles: the most common words are names of people, places, organizations, and other entities. In fact, we found that titles contain between one and two such named entities on average, and the most important ones usually appear at the beginning of a title.

Average number of entities per title, compared across six of SZ’s larger departments.

Here are example headlines that show a common structure we observe in the data:

  • Freising: “Vorhang auf” für klassische Musik in der Stadtbibliothek
  • München: Wohnungsinhaber erwischt Einbrecher auf dem Balkon

Looking at the lengths of the articles’ texts and SEO titles in our corpus helps us decide which model to use and how to configure the maximum output length and other parameters.

Distribution of lengths of SEO headlines.
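To make this concrete, here is a minimal sketch of how such a length analysis can inform the `max_length` setting. The two titles are just the examples shown later in this post; in practice this runs over the full corpus, and the percentile choice is an assumption:

```python
# Sketch: derive a maximum output length from the title length distribution.
titles = [
    'Freising: "Vorhang auf" für klassische Musik in der Stadtbibliothek',
    "München: Wohnungsinhaber erwischt Einbrecher auf dem Balkon",
]

# Word counts per title, sorted for percentile lookup.
lengths = sorted(len(t.split()) for t in titles)

def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list."""
    idx = min(len(sorted_values) - 1, round(q / 100 * (len(sorted_values) - 1)))
    return sorted_values[idx]

# Use e.g. the 95th percentile plus a small safety margin as max_length.
max_len = percentile(lengths, 95) + 2
```

With a real corpus one would of course count model tokens rather than whitespace-split words, since the generation limit applies to tokens.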

Language Modeling

Developing NLP applications has never been more accessible, thanks to the transformer architecture and libraries such as huggingface.co. There is an entire ecosystem of transformer-based language models for all kinds of generative tasks. This post omits the details and instead refers to the links above and to the book Natural Language Processing with Transformers.

We experimented with different combinations of input data and various pre-trained models, in particular T5, Bert2Bert, and BART. The models were fine-tuned on a subset of our article-headline pairs containing ~10k articles from each of our seven largest departments. After some experiments, we settled on BART for the remainder of this work. We also tried fine-tuning the BART embeddings on the masked language modeling task; however, while the perplexity on the dataset decreased as intended, the results on the downstream title generation task got slightly worse.

+-----------------------+----------+
| model | ROUGE-2 |
+-----------------------+----------+
| BART untuned | 1.19 |
| BART fine-tuned | 25.93 |
| T5 fine-tuned | 17.69 |
| Bert2Bert fine-tuned | 20.56 |
+-----------------------+----------+

To evaluate the quality of our automatically generated titles, we used the ROUGE score. Developed specifically for summarization, it is a standard metric for tasks like ours: it measures how similar a generated summary is to a reference summary by computing word or word-sequence overlap between the two. Note that it has limitations, such as its inability to account for synonyms or for multiple equally valid solutions.
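To illustrate what ROUGE-2 actually computes, here is a minimal, unofficial sketch of the bigram-overlap F1 score (the official implementation adds tokenization, stemming options, and other details):

```python
from collections import Counter

def rouge2_f1(reference: str, candidate: str) -> float:
    """Toy ROUGE-2: F1 over clipped bigram overlap on lowercased tokens."""
    def bigrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + 2]) for i in range(len(toks) - 1))

    ref, cand = bigrams(reference), bigrams(candidate)
    overlap = sum((ref & cand).values())  # clipped bigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge2_f1("der papst in rom", "der papst in berlin")` yields 2/3: two of the three bigrams on each side match.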

ℹ️ Technical detail: We ran most trainings and evaluations on an Amazon SageMaker notebook instance of type ml.p3.2xlarge with 8 vCPUs and 61 GB of memory. Smaller experiments were done on a 2020 MacBook Pro with an M1 chip and 16 GB of RAM.

Boosting Entities

We identified relevant entities in the articles’ texts using named entity recognition with Stanford’s NLP package Stanza, and then obtained search volume scores and related topics from the Google Trends API via its Python interface pytrends. Keywords are then weighted according to their respective search volume, so that we generate titles optimized for what people are actually searching for.
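How exactly the search volumes are turned into per-token scores is an implementation detail we do not spell out in full; the following is a hypothetical sketch under the assumption that each entity's Google Trends interest score (0–100) is normalized by the peak score and assigned to that entity's token ids. The function name, the token ids, and the interest values are made up for illustration:

```python
# Hypothetical helper: map Google Trends interest scores per entity to a
# token-id -> score dict, as consumed by a keyword-boosting logits processor.
def build_scores_map(trend_scores: dict, entity_token_ids: dict) -> dict:
    """Normalize interest scores to (0, 1] and assign them to token ids."""
    peak = max(trend_scores.values())
    scores_map = {}
    for entity, score in trend_scores.items():
        for token_id in entity_token_ids.get(entity, []):
            scores_map[token_id] = score / peak
    return scores_map

trend_scores = {"Vatikan": 80, "Kardinal": 20}       # made-up interest scores
entity_token_ids = {"Vatikan": [1001], "Kardinal": [2002, 2003]}  # made-up ids
scores_map = build_scores_map(trend_scores, entity_token_ids)
```

In practice the token ids would come from the BART tokenizer, and an entity spanning several subword tokens would contribute all of them.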

For the implementation of the keyword weighting, we used and modified Hugging Face’s transformers library. The custom token scores are injected into the decoding process through a custom LogitsProcessor, which is invoked whenever model.generate() is called. Custom LogitsProcessor classes already exist, for example for suppressing bad words or, conversely, forcing a list of tokens to appear in the output. Since we aim at only slightly favoring the identified keywords, we implement our own LogitsProcessor class as follows:

import torch
from transformers import LogitsProcessor

### LogitsProcessor for favoring specified keywords during inference
class SEOLogitsProcessor(LogitsProcessor):
    def __init__(self, scores_map: dict, temperature: float):
        self.temperature = temperature  # set to 0.9 in our experiments
        # tokenizer is the globally loaded BART tokenizer
        self.mask = torch.ones(len(tokenizer.vocab))
        self.seo_words_ids = list(scores_map.keys())
        for k, v in scores_map.items():
            v = max(v, 0.0001)  # avoid ZeroDivisionError
            self.mask[k] = (10 / v) * temperature

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor) -> torch.FloatTensor:
        if self.temperature == 1:
            return scores
        # decay: the keyword boost weakens with every decoding step
        for k in self.seo_words_ids:
            self.mask[k] *= 1.1
        return scores * self.mask

At every decoding step, the current output sequence is passed to the __call__ method of the SEOLogitsProcessor along with the scores tensor. The latter holds, for every token in the vocabulary, its logit score for being the next token in the sequence. We multiply these scores by a vector that contains search-volume-based weights for the favored tokens and the neutral value 1 for all other tokens. Through the temperature and a per-step decay factor of 1.1, keywords are favored more at the beginning of the sequence than at the end.
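To build intuition for why a multiplicative mask can favor tokens: decoder logits are typically negative, so multiplying a favored token's logit by a factor below 1 pulls it toward zero and raises its probability after the softmax. A toy calculation with made-up logits, independent of the actual model:

```python
import math

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Three vocabulary entries; index 0 is a favored keyword.
logits = [-2.0, -1.0, -1.5]       # made-up raw decoder logits
mask = [0.5, 1.0, 1.0]            # factor < 1 pulls a negative logit toward 0

before = softmax(logits)
after = softmax([l * m for l, m in zip(logits, mask)])
```

Here `after[0] > before[0]`: the favored token gains probability mass. As the per-step decay multiplies the mask entry back toward (and past) 1, this advantage fades over the course of the sequence.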

The following table shows the results when favoring the identified keywords in the headline generation process:

+-------------+------------------+-----------------+---------+
| experiment | entities / title | keywords recall | ROUGE-2 |
+-------------+------------------+-----------------+---------+
| reference | 1.29 | 6.36 | -- |
| model | 1.76 | 9.53 | 26.65 |
| model+KW | 1.79 | 10.89 | 22.73 |
| model+KW+RQ | 2.06 | 11.24 | 26.78 |
+-------------+------------------+-----------------+---------+

Reference refers to the original SEO titles written by our editors; model is the SEO title generated by our fine-tuned BART model without any SEO tweaks. For model+KW, we additionally provided the model with a list of keywords as discussed above. Finally, model+KW+RQ extends the list of keywords with related queries extracted via pytrends. Here are a few examples of generated titles that illustrate the impact of boosting keywords:

model: Kardinal Müller greift Papst an
model+KW: Vatikan - Kardinal Müller greift den Papst an

model: Bürgermeisterin Franziska Giffey im Interview
model+KW: Berlin: Bürgermeisterin Giffey im Interview

The results show that our model works as intended: it favors keywords without compromising the quality of the generated headlines, and it even includes more entities on average than the original SEO titles.

In conclusion, we were able to show that article headlines can be generated automatically while favoring relevant keywords for SEO purposes. However, when manually inspecting the generated headlines, we occasionally observe errors, including factual ones. We therefore recommend using the described system as additional guidance for human SEO editors rather than as a fully automated solution.
