Build an AI Assistant with Wolfram Alpha and Wikipedia in Python
Wolfram Alpha is a computational knowledge engine that evaluates what the user asks rather than just matching keywords. Ask it a question like “What is the current weather in London” or “Who is the president of the United States of America” and Wolfram Alpha evaluates the question and responds with an answer like “15 degrees centigrade” or “Donald Trump”.
Wikipedia, by contrast, does not compute or evaluate the question; its search looks up the keywords in the query. For example, Wikipedia cannot answer questions like “What is the current weather in London” or “Who is the president of the United States of America”, but it can search for keywords like “Donald Trump” or “London”.
In this tutorial, these two platforms (Wikipedia & Wolfram Alpha) are combined to build an intelligent assistant using the Python programming language.
Things we need
- Make sure you have Python installed
If you prefer using a virtual environment, you can find a tutorial here on how to create one
Get Wolfram Alpha App ID
You can register on the Developer’s Portal to create an AppID. (Note: the AppID shown in this tutorial will be deleted, so create your own.)
Application Workflow
The user’s input is passed to Wolfram Alpha for processing. If a result is obtained, it is returned to the user. If no result is obtained, Wolfram Alpha’s interpretation of the input is used as the keyword(s) for a Wikipedia query.
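The workflow above can be sketched as a small function that receives the two lookups as parameters. This is only an illustration: `answer`, `fake_wolfram`, and `fake_wikipedia` are hypothetical names, and the two stand-in lookups return canned values instead of calling either service.

```python
def answer(query, ask_wolfram, ask_wikipedia):
    """Try Wolfram Alpha first; fall back to Wikipedia with the interpreted keyword.

    ask_wolfram(query) is assumed to return (result, interpretation), where
    result is None when Wolfram Alpha has no direct answer.
    """
    result, interpretation = ask_wolfram(query)
    if result is not None:
        return result
    # No direct answer: use Wolfram's interpretation as the Wikipedia keyword.
    return ask_wikipedia(interpretation)


# Stand-in lookups with canned data, for illustration only.
def fake_wolfram(query):
    if query == "2 + 2":
        return "4", "2 + 2"
    return None, "Barack Obama"


def fake_wikipedia(keyword):
    return "Summary for " + keyword


print(answer("2 + 2", fake_wolfram, fake_wikipedia))
print(answer("obama", fake_wolfram, fake_wikipedia))
```

Passing the lookups in as arguments keeps the fallback logic separate from the network calls, which makes it easy to try out without an AppID.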
Let’s start coding
Let’s begin by installing all the required Python packages using pip
pip install wolframalpha
pip install wikipedia
pip install requests
- Create a Python file and open it in a code editor of your choice
- Import the packages installed above
import wolframalpha
import wikipedia
import requests
Implementing Wikipedia Search
Let’s create a function “search_wiki” that takes the keyword as a parameter
# Search Wikipedia for a keyword and print the summary of the best match
def search_wiki(keyword=''):
    # running the query
    searchResults = wikipedia.search(keyword)
    # If there is no result, print no result
    if not searchResults:
        print("No result from Wikipedia")
        return
    # Fetch the page for the first search result
    try:
        page = wikipedia.page(searchResults[0])
    except wikipedia.DisambiguationError as err:
        # Ambiguous title: select the first item in the list of options
        page = wikipedia.page(err.options[0])
    # page.title and page.summary are already str in Python 3, so no encoding is needed
    wikiTitle = page.title
    wikiSummary = page.summary
    # printing the result
    print(wikiSummary)
The wikipedia.DisambiguationError is raised when the title matches several pages, as shown below. In that case, the first option (at index 0) is selected.
wikipedia.DisambiguationError:
“Trump” may refer to:
Donald Trump
Trump (card games)
…
Tromp (disambiguation)
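Picking index 0 works for the list above. A slightly more defensive variant, sketched here, skips entries that are themselves disambiguation pages. `pick_option` is a hypothetical helper, not part of the wikipedia package, and the list is a shortened copy of the sample output.

```python
def pick_option(options):
    """Return the first option that is not itself a disambiguation page."""
    for option in options:
        if not option.lower().endswith("(disambiguation)"):
            return option
    # Every entry is a disambiguation page, or the list is empty.
    return None


options = ["Donald Trump", "Trump (card games)", "Tromp (disambiguation)"]
print(pick_option(options))
```

Inside the except block, a helper like this could be applied to `err.options` before calling `wikipedia.page`.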
Implementing Wolfram Alpha Search
Create an instance of the Wolfram Alpha client by passing the AppID to its class constructor
appId = 'APER4E-58XJGHAVAK'
client = wolframalpha.Client(appId)
The image below shows a sample response returned by Wolfram Alpha. The important keys are “@success”, “@numpods” and “pod”:
- “@success”: Indicates whether Wolfram Alpha was able to resolve the query
- “@numpods”: The number of pods (results) returned
- “pod”: A list containing the different results. Each pod can also contain “subpods”
- The first element of the pod list, “pod[0]”, is the query interpretation. Its first subpod has a key “plaintext” containing the interpreted query
- The second element of the pod list, “pod[1]”, is the response that has the highest confidence value (weight). Similarly, it has a subpod with a key “plaintext” containing the answer
Note: “pod[1]” is treated as the result only if its “primary” key is “true” or its “title” contains “Result” or “Definition”
You can read more about the “pods” and “subpods” here
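To make the keys above concrete, here is a small sketch that runs against a hand-written dictionary shaped like the response described. The sample data is made up for illustration (it is not real Wolfram Alpha output), and `is_answer_pod` is a hypothetical helper name.

```python
# Hand-written sample shaped like the response described above (not real output).
sample = {
    "@success": "true",
    "@numpods": "2",
    "pod": [
        {"@title": "Input interpretation",
         "subpod": {"plaintext": "2 + 2"}},
        {"@title": "Result", "@primary": "true",
         "subpod": {"plaintext": "4"}},
    ],
}


def is_answer_pod(pod):
    """True if the pod is marked primary or titled like a result/definition."""
    title = pod.get("@title", "").lower()
    return (pod.get("@primary", "false") == "true"
            or "result" in title
            or "definition" in title)


interpretation = sample["pod"][0]["subpod"]["plaintext"]
answer_pod = sample["pod"][1]
print(interpretation)
print(is_answer_pod(answer_pod))
```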
So, let’s create a method “search” and pass the “search text” as a parameter.
def search(text=''):
    res = client.query(text)
    # Wolfram cannot resolve the question
    if res['@success'] == 'false':
        print('Question cannot be resolved')
    # Wolfram was able to resolve the question
    else:
        result = ''
        # pod[0] is the question
        pod0 = res['pod'][0]
        # pod[1] may contain the answer
        pod1 = res['pod'][1]
        # checking if pod1 has primary=true or title=result|definition
        title = pod1['@title'].lower()
        if 'definition' in title or 'result' in title or pod1.get('@primary', 'false') == 'true':
            # extracting result from pod1
            result = resolveListOrDict(pod1['subpod'])
            print(result)
        else:
            # extracting wolfram question interpretation from pod0
            question = resolveListOrDict(pod0['subpod'])
            # removing unnecessary parentheses
            question = removeBrackets(question)
            # searching for response from wikipedia
            search_wiki(question)
Extracting Item from Pod — Resolving List or Dictionary Issue
If the pod has several subpods, “subpod” is a list, so we select its first element and return the value of the key “plaintext”. Otherwise “subpod” is a single dictionary, and we return the value of its key “plaintext” directly.
def resolveListOrDict(variable):
    if isinstance(variable, list):
        return variable[0]['plaintext']
    else:
        return variable['plaintext']
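As a quick check of both cases, the function is repeated below so the snippet runs on its own, and it is called with a single subpod and with a list of subpods. The sample values are made up for illustration.

```python
def resolveListOrDict(variable):
    """Return 'plaintext' from one subpod, or from the first subpod in a list."""
    if isinstance(variable, list):
        return variable[0]["plaintext"]
    return variable["plaintext"]


single = {"plaintext": "4"}                               # one subpod: a dict
several = [{"plaintext": "x = 2"}, {"plaintext": "x = -2"}]  # several subpods: a list

print(resolveListOrDict(single))
print(resolveListOrDict(several))
```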
Remove Parentheses (Brackets)
Here, we split the text at the opening bracket, keep the first part, and trim the leftover space, e.g. “Barack Obama (Politician)” will return “Barack Obama”
def removeBrackets(variable):
    return variable.split('(')[0].strip()
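To see what the split produces, here is a small standalone check. Splitting on “(” keeps the space that precedes the bracket, so calling `.strip()` on the result is what yields a clean keyword. `clean_keyword` is a hypothetical name used only for this sketch.

```python
text = "Barack Obama (Politician)"

# Plain split: the space before the bracket is still attached.
head = text.split("(")[0]
print(repr(head))


def clean_keyword(value):
    """Drop everything from the first '(' onward and trim surrounding whitespace."""
    return value.split("(")[0].strip()


print(repr(clean_keyword(text)))
print(repr(clean_keyword("London")))  # text without brackets is returned unchanged
```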
Enhancing the Search Result with Primary Image
The search result is more useful with a primary image attached. For example, searching for “Albert Einstein” would return both the text and his image. To get the primary image for a query from Wikipedia, we call a REST endpoint, passing the keyword as the “titles” parameter:
https://en.wikipedia.org/w/api.php?action=query&titles=Nigeria&format=json&piprop=original&prop=pageimages
The “pages” dictionary in the response may contain zero or more items. Usually, the first item holds the primary image.
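The sketch below builds the same URL from its parameters and reads the image URL out of a hand-written dictionary shaped like the JSON reply. The page id and image address in `sample` are made up, and `first_image_url` is a hypothetical helper name; no request is sent.

```python
from urllib.parse import urlencode

params = {
    "action": "query",
    "titles": "Nigeria",
    "format": "json",
    "piprop": "original",
    "prop": "pageimages",
}
url = "https://en.wikipedia.org/w/api.php?" + urlencode(params)
print(url)

# Hand-written sample shaped like the JSON reply; page id and image URL are made up.
sample = {
    "query": {
        "pages": {
            "12345": {
                "title": "Nigeria",
                "original": {"source": "https://example.org/nigeria.png"},
            }
        }
    }
}


def first_image_url(payload):
    """Return the 'original' image URL of the first page, or None if absent."""
    pages = payload.get("query", {}).get("pages", {})
    for page in pages.values():
        return page.get("original", {}).get("source")
    return None


print(first_image_url(sample))
```

Using `.get` with defaults means an empty “pages” dictionary, or a page without an image, gives `None` instead of raising a `KeyError`.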
def primaryImage(title=''):
    url = 'https://en.wikipedia.org/w/api.php'
    data = {'action': 'query', 'prop': 'pageimages', 'format': 'json', 'piprop': 'original', 'titles': title}
    try:
        res = requests.get(url, params=data)
        pages = res.json()['query']['pages']
        # dict.keys() is not indexable in Python 3, so take the first key via an iterator
        key = next(iter(pages))
        imageUrl = pages[key]['original']['source']
        print(imageUrl)
    except Exception as err:
        print('Exception while finding image:= ' + str(err))