Extracting Tasks from Emails: First Challenges

Our main goal in Courier is very simple: let users spend less time reading emails. To that end, we are trying to help users easily spot meaningful information in the content of their emails, such as tasks.

A task can be understood as a request that places an obligation on a recipient. Such requests may ask for information, for a specific action to be performed, or for a meeting to be scheduled, to name a few examples. Task statements occur in emails in the context of textual conversations, and as with many other Natural Language Processing problems, identifying such utterances is not trivial.

In order to extract tasks from emails, we decided to train a Machine Learning classifier capable of detecting whether a sentence of an email is a task candidate or not. Training a system of this type requires labelled data, that is, examples of task and non-task sentences that the system can use as input to learn from. And this was exactly one of the first challenges we faced: we had to manually annotate sentences that we considered tasks or non-tasks, but it was hard for us, the annotators, to agree on whether some sentences were placing a request or not.
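To make the setup concrete, here is a minimal sketch of such a binary sentence classifier using scikit-learn. This is not our actual pipeline: the six inline sentences, their labels, and the model choice are assumptions for illustration only; a real system needs thousands of annotated examples.

```python
# Minimal sketch of a task / non-task sentence classifier.
# The tiny inline dataset is made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "Please send me the report by Friday",
    "Could you review the attached document?",
    "Please let me know the name of the contact",
    "Thanks for your help yesterday",
    "I enjoyed our meeting last week",
    "The weather has been great lately",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = task, 0 = non-task

# Bag-of-words features (unigrams and bigrams) feeding a linear model.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(sentences, labels)

print(clf.predict(["Please call John to set up a meeting"]))
```

With so little data the predictions are of course unreliable; the point is only the shape of the problem: sentences in, a task/non-task decision out.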

Lampert et al.’s [1, 2, 3] studies are a good starting point for understanding the complexity of automatically extracting tasks from emails. These studies explore the levels of human agreement when manually annotating requests, and describe some interesting borderline cases that should be taken into consideration before labelling data.

In this post we describe some of the borderline cases that we also encountered during the development of Courier.

Borderline Cases


Task requests tend to be ambiguous in the degree to which they place a direct or indirect obligation on a recipient. Take the following sentences as examples:

  • I need your report by the end of the week
  • I would like to get the report by the end of the week
  • We need to finish the report by the end of the week

We can argue that the first sentence is the most direct and explicit request, and the second could be a polite, though still somewhat indirect, variation of the first, while the obligation in the third cannot be fully determined at the sentence level without more context: is the recipient of the email part of this “we”?

Another kind of ambiguity arises at the syntax level. Many requests follow certain syntactic patterns; for example, they are often phrased in the imperative form, which is characterized by the omission of the subject. However, some sentences that omit the subject are not imperatives, leading to further potential ambiguity. Both examples below follow the same syntactic pattern, but only the first sentence is a request.

  • Hope you can send me the report by the end of the week
  • Hope you didn’t spend too much additional time on this
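A deliberately naive sketch shows why surface syntax alone cannot settle this: a heuristic that flags subject-omitting sentences as imperative-like fires on both examples above. The verb list below is a hand-picked assumption for illustration, not a real linguistic resource.

```python
import re

# Naive surface heuristic: treat a sentence as imperative-like if its
# first word is in a small hand-picked verb list (illustrative only).
LEADING_VERBS = {"hope", "please", "send", "call", "review", "let", "find"}

def looks_imperative(sentence: str) -> bool:
    words = re.findall(r"[a-z']+", sentence.lower())
    return bool(words) and words[0] in LEADING_VERBS

# Both sentences match the pattern, but only the first is a request.
print(looks_imperative("Hope you can send me the report by the end of the week"))  # True
print(looks_imperative("Hope you didn't spend too much additional time on this"))  # True
print(looks_imperative("The report is attached"))  # False
```

Distinguishing the two true cases requires semantics and context, which is exactly what makes the annotation (and classification) hard.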


Some cases involve conditional clauses of the form “if this, then that”. Sometimes they introduce a task, while in other cases they do not.

  • If you could send me your report soon, that would be appreciated
  • If you need more information, please call John

Here the first sentence uses a conditional clause to introduce the request, which can also be seen as a polite form. The second example, however, is just a hypothetical case that does not place any direct or indirect obligation on the email recipient.

Requests for Inaction

Another common case is sentences that ask for inaction, i.e., that require the recipient to avoid doing something. In these cases the recipient of the email should not feel any obligation to perform an action. However, it is common to find statements that use a negated form and are indeed requesting something:

  • Please do not reply to this e-mail
  • Please do not forget to send me your report

“Let me know…” pleasantries

Perhaps one of the most common borderline cases is polite requests that use the pattern “let me know…”:

  • Please let me know if you have any questions

Sentences of this type are more of a polite farewell than a genuine invitation to ask questions. However, the same form can be used to compel the interlocutor to perform an action:

  • Please let me know if you are going to attend
  • Please let me know the name of the contact to send the report

Third-Party Requests

Another interesting case is when a request places an obligation on a third party, since a textual conversation within an email can involve more than one recipient:

  • John, please call our accountant to set up a meeting
  • My assistant, John Doe, will call you to set up a meeting

In this scenario, we can only infer that these are clear requests if we know that the third party involved is also a recipient of the email.

Attachment Review Requests

Finally, many common statements in emails simply inform the recipient of the presence of an attachment, but in a manner that can be interpreted as a request: “Please find attached a copy of the report.” In such cases it is hard to determine the degree of obligation without more context:

  • Please see the attached invitation
  • Attached please find my resume for your review

Labelling Data

Taking into consideration the borderline cases described above, we defined our own annotation guidelines to help us decide whether a candidate sentence was a task or not. Remember that this manually annotated data is intended to serve as input to train a ML classifier. Making sure that annotations are as clean and correct as possible is crucial in ML, because any model trained on wrongly labelled data would replicate the mistakes of the human annotators. Defining what is and is not a task, and reaching a clear agreement on it before labelling data, allowed us to make annotation decisions with a higher degree of reliability.

We compiled a corpus composed of 2000 sentences from the Enron corpus [4] and our own email inboxes. These sentences were previously automatically classified as Command/Request or Desire/Need by our Speech Act Classifier. Following the guidelines mentioned above, each sentence of the corpus was manually labelled as task or non-task by three annotators.

Inter-annotator agreement over a corpus of 2000 sentences

As a result, all three annotators agreed on 1445 cases, which means that we have an observed agreement of 72.3% over the input corpus.
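This figure can be reproduced directly from the counts reported above:

```python
# Observed agreement: fraction of sentences on which all three
# annotators assigned the same label (counts from the corpus above).
total_sentences = 2000
full_agreement = 1445

observed_agreement = full_agreement / total_sentences
print(observed_agreement)  # → 0.7225, i.e. roughly 72.3%
```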

To put our results in perspective, it is important to mention that Lampert et al. [1] report a Kappa agreement of 0.78 between three annotators labelling sentences as Request-for-Action, which they consider a good agreement result. They also report a Kappa agreement of 0.60 for deciding the illocutionary strength of each Request-for-Action, i.e., whether a sentence places a strong, medium or weak request. They consider this strength agreement tentative.
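The paper does not appear in this post with a specified kappa variant; for three or more annotators, Fleiss’ kappa is a common choice, and the standard formula is short enough to implement directly. The toy agreement table below is made up for illustration (each row gives, per item, how many of three annotators chose each of the two labels).

```python
from typing import List

def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    annotators who assigned item i to category j. Every row must sum
    to the same number of annotators n."""
    N = len(counts)           # number of items
    n = sum(counts[0])        # annotators per item
    k = len(counts[0])        # number of categories

    # Overall proportion of assignments falling in each category.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Per-item observed agreement.
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]

    P_bar = sum(P) / N                # mean observed agreement
    P_e = sum(pj * pj for pj in p)    # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)

# Toy table: three annotators, binary task/non-task labels, four items.
table = [[3, 0], [0, 3], [3, 0], [2, 1]]
print(round(fleiss_kappa(table), 3))  # → 0.625
```

Unlike raw observed agreement, kappa discounts the agreement that three annotators would reach by chance, which is why it is the figure usually reported in annotation studies.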


We can conclude that our inter-annotator results show a reasonable level of agreement given the complexity of the problem, including all the linguistic, stylistic, pragmatic and politeness variations that can be used to express requests in conversational emails.

With this post we wanted to give you a brief overview of the challenges we faced while developing a system to automatically extract tasks from emails. In a subsequent post we will talk about the ML model we trained using our labelled dataset, as well as some of the post-processing heuristics we implemented to boost the classification performance. Meanwhile, have you tried Courier?


[1] Lampert, A., Paris, C. and Dale, R. (2007). Can Requests-for-Action and Commitments-to-Act be Reliably Identified in Email Messages? In Proceedings of the 12th Australasian Document Computing Symposium, pp. 48–55, Melbourne, Australia.

[2] Lampert, A., Dale, R. and Paris, C. (2008a). The Nature of Requests and Commitments in Email Messages, In Proceedings of EMAIL-08: the AAAI Workshop on Enhanced Messaging, pp. 42–47, Chicago, USA.

[3] Lampert, A., Dale, R. and Paris, C. (2008b). Requests and Commitments in Email are More Complex Than You Think: Eight Reasons to be Cautious, In Proceedings of Australasian Language Technology Association Workshop, pp. 64–72. Hobart, Australia.

[4] Klimt, B. and Yang, Y. (2004). The Enron Corpus: A New Dataset for Email Classification Research. In Proceedings of the 15th European Conference on Machine Learning, pp. 217–226.