Language Processing

Violet Whitney
Data Mining the City
2 min read · Nov 8, 2017

http://www.jillhubley.com/project/nyclanguages/

Counting the number of word occurrences in YouTube videos:

Using the Text Analyzer with Lyrics about New York City. Lyrics were taken from this source.

Bringing in text files.

def setup():
    # Open the lyrics file, print its full contents, then close it.
    f = open('Lyrics-about-NYC.txt', 'r')
    print(f.read())
    f.close()

def draw():
    pass
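
Lyrics often contain accented characters or curly quotes, so it can help to state the file's encoding explicitly. Here is a minimal sketch, assuming the file is saved as UTF-8. read_text is a hypothetical helper name, not part of the original sketch, and io.open is used because it accepts an encoding argument in both Python 2.7 and Python 3:

```python
import io

def read_text(path, encoding="utf-8"):
    """Return the whole file as one string, closing the file afterwards."""
    # The with-block closes the file even if reading raises an error.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()
```

Inside setup(), print(read_text('Lyrics-about-NYC.txt')) would then stand in for the separate open and read calls.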

Counting word occurrences in Python:

def setup():
    # Each counter tallies the lines that contain its phrase.
    count = 0
    count2 = 0
    count3 = 0
    f = open('Lyrics-about-NYC.txt', 'r')
    myFileLines = f.readlines()
    for line in myFileLines:
        if " he " in line:
            count = count + 1
        if " she " in line:
            count2 = count2 + 1
        if "New York City" in line:
            count3 = count3 + 1

    print("he: " + str(count))
    print("she: " + str(count2))
    print("New York City: " + str(count3))
    f.close()

def draw():
    pass
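
The check `" he " in line` counts lines rather than occurrences. It also misses a word at the start or end of a line, next to punctuation, or written with a capital letter. Here is a minimal sketch of one alternative, assuming plain Python with the standard re module. count_phrase is a hypothetical helper name and the sample sentence is made up for illustration:

```python
import re

def count_phrase(text, phrase):
    """Count case-insensitive, whole-word occurrences of phrase in text."""
    # \b marks a word boundary, so "he" does not match inside "she" or "the".
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

sample = "He said she loves New York City. She said he loves new york city too."
for phrase in ["he", "she", "New York City"]:
    print(phrase + ": " + str(count_phrase(sample, phrase)))
```

To apply it to the lyrics file, pass the string returned by f.read() as text.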

1. Bring in your own text file and count the number of occurrences of a spatially significant word.

Natural Language Processing

Natural language processing is inherently biased because its models are trained on historical text, which is itself biased. It is therefore important to remember that these tools are subjective and can be incorrect.

Measuring Quantity

Quickly analyze the number of times something is said:
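
One way to get a quick overview is to tally every word at once and list the most frequent ones. Here is a minimal sketch, assuming the standard-library collections.Counter. top_words is a hypothetical helper name and the sample text is made up for illustration:

```python
import re
from collections import Counter

def top_words(text, n=5):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase first so "City" and "city" are tallied together.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)

sample = "the city never sleeps and the city never stops"
for word, count in top_words(sample, 3):
    print(word + ": " + str(count))
```

Very common words such as "the" and "and" tend to dominate a tally like this, so filtering them out first may give a more useful picture of a text.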

NLPTKs:

There are a number of great open-source natural language processing toolkits (NLPTKs), some of which are completely free and come with lots of tutorials. I highly recommend diving into these if you want to focus your project on language. However, these toolkits require installing separate applications, and the installation varies widely across computers. Because of that complication, we will not cover installing these resources in the course. The following NLPTKs and tutorials should guide you most of the way there:

Python NLTK
spaCy
Google NLPTK

Tutorials:

NLP Concepts with spaCy by Allison Parrish
Word Vectors by Allison Parrish
Text Analysis, Michelle McSweeney

Violet Whitney
Data Mining the City

Researching Spatial & Embodied Computing @Columbia University, U Penn and U Mich