Otaku: The Anime Chatbot — Conversational AI

Aishvarya Jaishankar
Mar 1, 2023


An Interactive NLP-based AI Experience in Linguistic Creativity | Python & NLTK.

Otaku is an interactive text-based dialogue system that can simulate human conversations through Natural Language Processing (NLP) by performing tasks like small talk, retrieving information about anime, and engaging the anime enthusiasts through its personality!

Academic Project: Human-AI Interaction, University of Nottingham

Duration: September — November 2022

Platform: Developed on Jupyter Notebook and VSCode

Introduction

In recent years, conversational AI chatbots that use natural language have changed the way people interact with systems to access information, and have improved task efficiency. In this case study, I use Natural Language Processing techniques to develop the chatbot ‘Otaku’, with the aim of automating tasks, reducing search time, and improving the user experience. Computational linguistics is the broader field that encompasses NLP and conversational AI, which in turn focus on specific tasks such as answering questions or engaging in casual talk.

“The inspiration behind building a conversational AI bot with a focus on anime was to explore the fundamental concepts of computational linguistics with a blend of linguistic creativity”

This chatbot can later be integrated into websites or apps to provide support and assistance to people interested in anime & manga!

Prelude to the Creation of Chatbot on Anime

Otaku is an AI dialogue system that retrieves information about anime. It can greet the user, ask for and store the user’s name through identity management, make basic small talk, and answer questions about anime. The bot is named ‘Otaku’ after the Japanese term for people with a deep, consuming interest, particularly in anime and manga, a word that has come to symbolize the essence of that fan culture (Akio Nakamori, 1983).

“Miyazaki Hayao’s anime stood at the center of otaku culture and anime fandom in the 1980s”

The motivation behind creating a chatbot focused on anime was to explore the concept of ‘Linguistic Creativity’: using language as a medium of expression, so that the conversational AI can create new meaning and still be understood by the user (Veale, 2011).

Jupyter Notebook

According to Cohn (2012), comics such as manga, and anime, their animated counterpart in Japanese culture, reflect human expression, with a linguistic focus on speech and gesture. This motivates using computational linguistic methods to explore the structures and patterns of language and so better understand the structure of anime as an extended comic. Another aspect of anime revolves around conceptual metaphors for emotions, expressed through linguistic creativity. Because creativity in language is always a matter of construal, creative text retrieval is essential if responses are to convey a non-literal meaning beyond the usual sense (Veale, 2011). Figurative language is thus used to convey deeper meanings in applications of linguistic creativity such as anime.

Otaku learns from the datasets it is trained on, retrieves creative language, and communicates through a rich, meaningful relationship between user queries and response texts. This piqued my interest in building a chatbot around anime as a form of expressive learning and interaction, making Otaku available to all anime enthusiasts!

Natural Language Processing

Natural language processing is an interdisciplinary domain that merges linguistics, computer science, and artificial intelligence. It includes two subfields, Natural Language Understanding (NLU) and Natural Language Generation (NLG). NLU focuses on machines understanding natural-language text written by humans and finding insights through analysis. NLG is the generation of natural language by machines in response to a human query. This versatile dialogue-based chatbot was created using Python, NLTK, and scikit-learn for the NLP pipeline, similarity functions, and text classification.

Chatbot ‘Otaku’ Implementation Strategy

Information retrieval in the anime chatbot sits at the intersection of Linguistic Creativity and Natural Language Generation (Cohn, 2012). Otaku is trained on anime data gathered by web scraping relevant websites, which supplies pertinent responses, appropriate semantic words, and extensive information for NLG tasks. The flowchart below depicts the conversational flow of interaction for Otaku as executed in VS Code and Jupyter Notebook.

User Flow Diagram of Chatbot Otaku (Anime), created in MIRO

Proposed ‘Otaku’ Chatbot System

The Otaku chatbot is built around four main concepts: text pre-processing, language modeling, text classification, and similarity matching. It provides the features below, which correspond to its objective of learning and talking about anime!

Greeting Block

When the user inputs a greeting, Otaku chooses from a list of responses and randomly outputs a greeting back to the user indicating a positive start to the conversation.
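A minimal sketch of such a greeting block is shown here. The keyword set, the response list, and the function name `greet` are illustrative assumptions, since the original lists are not shown in this article.

```python
import random
import string

# Hypothetical keyword and response lists (assumptions, not the original data).
GREETING_INPUTS = {"hello", "hi", "hey", "konnichiwa"}
GREETING_RESPONSES = ["Hello!", "Hi there!", "Konnichiwa!"]


def greet(user_input):
    """Return a randomly chosen greeting if the input contains a greeting word, else None."""
    # Lowercase and strip punctuation so that "Hi," still matches "hi".
    cleaned = user_input.lower().translate(str.maketrans("", "", string.punctuation))
    if any(word in GREETING_INPUTS for word in cleaned.split()):
        return random.choice(GREETING_RESPONSES)
    return None
```

Calling `greet("Hi, Otaku!")` returns one of the three responses at random, while an input without a greeting word returns `None` so that another block can handle it.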

Name Management Block

The chatbot asks for the user's name at the beginning, memorizes it, and uses it to address the user throughout the conversation for a personalized experience. Although the chatbot is built primarily to improve its knowledge and functionality around anime, research indicates that a chatbot with a personality offers a stable pattern to users’ perceptions and adds consistency to the user experience with the interface. Incorporating identity management makes the chatbot's personality more relatable to users throughout the conversation (Smestad and Volden, 2019).

If the user's intent is to change the name, the previously stored name is overwritten with the new one. To recall the stored name, the user can ask Otaku “What is my name?”. To change the name, inputs containing strings like ‘Call me’, ‘My name is’, or ‘change my name to’ serve as indicators of that intent.
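The name handling described above could be sketched as follows. The trigger phrases come from the article; the regular expressions, the `NameManager` class, and the reply wording are assumptions made for illustration.

```python
import re

# Trigger phrases are from the article; the patterns themselves are assumptions.
SET_NAME = re.compile(
    r"(?:call me|my name is|change my name to)\s+([A-Za-z][\w'-]*)", re.IGNORECASE
)
ASK_NAME = re.compile(r"what is my name", re.IGNORECASE)


class NameManager:
    """Keeps the most recent name the user has given."""

    def __init__(self):
        self.name = None

    def handle(self, user_input):
        """Return a reply if the input is about the user's name, else None."""
        match = SET_NAME.search(user_input)
        if match:
            # Overwrite any previously stored name with the new one.
            self.name = match.group(1).capitalize()
            return f"Nice to meet you, {self.name}!"
        if ASK_NAME.search(user_input):
            if self.name is None:
                return "I don't know your name yet."
            return f"Your name is {self.name}."
        return None
```

Keeping the stored name on a small object rather than in a global variable makes it easy to reset between conversations.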

Otaku does Turn-Taking Conversations on Small Talk

Small talk focuses on short conversations and works through keyword matching. It is based on turn-taking over a set of question-answer pairs: when the user asks a basic small-talk query, Otaku returns the answer to the stored question that most closely matches it. The data pairs include a mix of casual and factual questions. Because the questions and answers are already short, no further preprocessing is needed. The small-talk questions are also converted into a vector matrix, and matching is done by computing cosine similarity.
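As a rough illustration of this matching, the sketch below compares plain word-count vectors with cosine similarity. The question-answer pairs and the `threshold` fallback are assumptions, since the article does not show its small-talk data or say what happens when nothing matches.

```python
import math
import re
from collections import Counter

# Hypothetical question-answer pairs (assumption, not the original data).
SMALL_TALK = {
    "how are you": "I'm doing great, thanks for asking!",
    "what is your name": "I'm Otaku, your anime companion.",
    "what can you do": "I can chat and answer questions about anime.",
}


def to_vector(text):
    """Turn text into a bag-of-words count vector (word -> count)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def small_talk(query, threshold=0.5):
    """Return the answer whose question is most similar to the query, or None."""
    query_vec = to_vector(query)
    best_question = max(SMALL_TALK, key=lambda q: cosine(query_vec, to_vector(q)))
    if cosine(query_vec, to_vector(best_question)) < threshold:
        return None  # assumed fallback: no stored question is similar enough
    return SMALL_TALK[best_question]
```

Returning `None` below the threshold lets the main loop pass the query on to another block instead of forcing a poor match.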

Conversational Information Retrieval Block

Otaku acts as a search engine that retrieves answers to broad questions by matching keywords against the dataset. If the user's query matches the question-answer intent, the contents of the dataset and the query are lowercased, tokenized into smaller units of text, and stripped of punctuation. I used lemmatization rather than stemming for word standardization, because it reduces each word to its dictionary form (lemma), which makes pre-processing more accurate and consistent. The pipeline also uses part-of-speech (POS) tagging to label each word with its part of speech.
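The pre-processing steps above can be sketched in a few lines. To keep the example runnable without downloading any corpora, a tiny hand-written lookup table (`LEMMAS`, an assumption) stands in for a dictionary-based lemmatizer such as the one NLTK provides, and POS tagging is left out.

```python
import re

# Toy lemma table standing in for a real dictionary-based lemmatizer (assumption).
LEMMAS = {
    "characters": "character",
    "episodes": "episode",
    "studios": "studio",
    "fought": "fight",
}


def preprocess(text):
    """Lowercase, tokenize, drop punctuation, then lemmatize each token."""
    # Matching only letters and digits tokenizes and removes punctuation in one step.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Words missing from the toy table are kept unchanged.
    return [LEMMAS.get(token, token) for token in tokens]
```

For example, `preprocess("Who are the main characters in Naruto?")` lowercases the text, drops the question mark, and maps "characters" to "character". Unlike a stemmer, which would chop suffixes mechanically, a lemmatizer can also map irregular forms such as "fought" to "fight".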

Next, stop words, the very frequent words that carry little meaning, are filtered out of the content. The bag-of-words model builds vectors only from the presence and counts of known words and weights them by raw frequency, so I chose TF-IDF as the weighting function instead: it makes rare words more prominent, down-weights common words, and thereby normalizes the counts. The idea is that the inverse document frequency takes a logarithm, so a word that appears in every document gets an IDF score of log(1) = 0. Since Count Vectorizer only counts word occurrences in the bag-of-words manner, I populated the vector matrices with the values generated by the TF-IDF vectorizer, which capture both the frequency and the importance of each word.
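For reference, the textbook form of this weighting is written out below. The article does not state which exact TF-IDF variant was used, so this is the standard unsmoothed definition rather than a description of the project's code.

```latex
\[
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t),
\qquad
\mathrm{idf}(t) = \log \frac{N}{\mathrm{df}(t)}
\]
```

Here tf(t, d) is the number of times term t occurs in document d, N is the number of documents, and df(t) is the number of documents that contain t. With N = 4, a word found in all four documents gets idf = log(4/4) = 0, while a word found in only one gets idf = log(4/1) ≈ 1.386 (natural logarithm). Library implementations may differ slightly: scikit-learn's TF-IDF vectorizer applies smoothing by default, computing ln((1 + N) / (1 + df(t))) + 1, so its weights never reach exactly 0.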

Lastly, cosine similarity, a metric commonly used in information retrieval, measures the similarity between each dataset entry and the query. The cosine of the angle between the two vectors is calculated, and the answer whose question in the anime dataset has the highest similarity to the query is returned to the user as the response.
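The retrieval step can be illustrated end to end with a small, self-contained sketch. It uses only the standard library and the unsmoothed log(N / df) weighting instead of a library vectorizer, and the three question-answer pairs are a hypothetical stand-in for the scraped anime dataset.

```python
import math
import re
from collections import Counter

# Hypothetical mini dataset (assumption); the scraped data is not shown in the article.
QA_PAIRS = [
    ("Who created Naruto?", "Naruto was created by Masashi Kishimoto."),
    ("Who directed Spirited Away?", "Spirited Away was directed by Hayao Miyazaki."),
    ("Which studio made Spirited Away?", "Spirited Away was produced by Studio Ghibli."),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(documents):
    """Unsmoothed IDF: log(N / df) for every term in the collection."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(tokenize(doc)))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Weight raw term counts by IDF; terms unseen in the dataset are ignored."""
    counts = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def cosine(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query):
    """Return the answer whose question has the highest cosine similarity to the query."""
    questions = [question for question, _ in QA_PAIRS]
    idf = build_idf(questions)
    query_vec = tfidf_vector(query, idf)
    scores = [cosine(query_vec, tfidf_vector(q, idf)) for q in questions]
    best = max(range(len(scores)), key=scores.__getitem__)
    return QA_PAIRS[best][1]
```

Because "who", "spirited", and "away" each occur in two of the three questions, they receive a lower weight than distinctive words such as "naruto" or "studio", so the distinctive words dominate the match.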

The Main Block — Intent Matching

This is the main loop of the Anime chatbot that takes in the user’s input, predicts the intent of the user’s query, and calls for the respective function block.
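The overall shape of such a loop might look like the sketch below. The article does not describe how the intent is predicted, so simple keyword rules stand in for the classifier here, and the handlers are placeholders for the blocks described earlier; all names are assumptions.

```python
# Keyword rules standing in for the intent classifier (assumption).
def predict_intent(user_input):
    text = user_input.lower()
    name_phrases = ("call me", "my name is", "change my name to", "what is my name")
    if any(phrase in text for phrase in name_phrases):
        return "name"
    if any(word in text.split() for word in ("hello", "hi", "hey")):
        return "greeting"
    if any(phrase in text for phrase in ("how are you", "what can you do")):
        return "small_talk"
    return "anime_qa"  # default: treat everything else as an anime question


# Placeholder handlers; each would call the corresponding block of the chatbot.
HANDLERS = {
    "greeting": lambda text: "Hello!",
    "name": lambda text: "Got it, I'll remember your name.",
    "small_talk": lambda text: "I'm doing great!",
    "anime_qa": lambda text: "Let me look that up in my anime notes.",
}


def respond(user_input):
    """Predict the intent, then dispatch to the matching handler."""
    return HANDLERS[predict_intent(user_input)](user_input)


def chat():
    """Main loop: read input, answer, and stop when the user types 'bye'."""
    while True:
        user_input = input("You: ")
        if user_input.strip().lower() == "bye":
            print("Otaku: Sayonara!")
            break
        print("Otaku:", respond(user_input))
```

Routing through a dictionary of handlers keeps the loop itself short, and a new feature only needs a new intent label and a new handler. Calling `chat()` starts the interactive session.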

Conclusion

The development of Otaku gave me an opportunity to explore two facets of Natural Language Processing: NLU, which happens during turn-taking conversations, and NLG, which takes place when relevant responses to the user’s questions are retrieved. The major challenge in creating the experience was keeping the conversation flow as natural and human-like as possible.

The Prototype

References

  1. Tony Veale. 2011. Creative Language Retrieval: A Robust Hybrid of Information Retrieval and Linguistic Creativity. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 278–287, Portland, Oregon, USA. Association for Computational Linguistics.
  2. Cohn, N. (2012). Comics, Linguistics, and Visual Language: The Past and Future of a Field. In: Bramlett, F. (eds) Linguistics and the Study of Comics. Palgrave Macmillan, London. https://doi.org/10.1057/9781137004109_5
  3. Jordanous, A. (2019). Evaluating Evaluation: Assessing Progress and Practices in Computational Creativity Research. In: Veale, T., Cardoso, F. (eds) Computational Creativity. Computational Synthesis and Creative Systems. Springer, Cham. https://doi.org/10.1007/978-3-319-43610-4_10
  4. Caliskan A, Bryson JJ, Narayanan A. Semantics derived automatically from language corpora contain human-like biases. Science. 2017 Apr 14;356(6334):183–186. doi: 10.1126/science.aal4230. PMID: 28408601.
  5. Smestad, T.L., Volden, F. (2019). Chatbot Personalities Matters. In: , et al. Internet Science. INSCI 2018. Lecture Notes in Computer Science(), vol 11551. Springer, Cham. https://doi.org/10.1007/978-3-030-17705-8_15
  6. Ruane, E., Farrell, S., Ventresque, A. (2021). User Perception of Text-Based Chatbot Personality. In: , et al. Chatbot Research and Design. CONVERSATIONS 2020. Lecture Notes in Computer Science(), vol 12604. Springer, Cham. https://doi.org/10.1007/978-3-030-68288-0_3
  7. Faur, C., Clavel, C., Pesty, S. and Martin, J.C., 2013, September. PERSEED: A self-based model of personality for virtual agents inspired by socio-cognitive theories. In 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (pp. 467–472). IEEE.


Aishvarya Jaishankar

I am an HCI graduate student at the University of Nottingham, passionate about designing interfaces and crafting digital product experiences that solve real-world problems.