Language Processing

Violet Whitney

Published in

Data Mining the City

2 min read · Nov 8, 2017

http://www.jillhubley.com/project/nyclanguages/

Urban Language Ecologies

The Urban Language Ecologies Projects explore the role that language plays in shaping urban space. Language is…

www.c4sr.columbia.edu

Counting the number of word occurrences in YouTube videos:

Spatializing Syria's YouTube War

violetwhitney.com

Using the Text Analyzer with Lyrics about New York City. Lyrics were taken from this source.

Bringing in text files.

def setup():
    # Open the lyrics file and print its full contents
    f=open('Lyrics-about-NYC.txt','r')
    print(f.read())
    f.close()

def draw():
    pass

Counting word occurrences in Python:

def setup():
    # One counter per search term
    count=0
    count2=0
    count3=0

    f=open('Lyrics-about-NYC.txt','r')
    myFileLines=f.readlines()
    # Each counter goes up once for every line that contains its term
    for line in myFileLines:
        if " he " in line:
            count=count+1
        if " she " in line:
            count2 = count2+1
        if "New York City" in line:
            count3 = count3+1

    print("he: " + str(count))
    print("she: " + str(count2))
    print("New York City: " + str(count3))
    f.close()

def draw():
    pass
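
The check `" he " in line` adds one per line that contains the term, so a line that repeats a word still counts once, and a word at the start of a line or next to punctuation is missed. A minimal sketch of one way to count every occurrence, written in plain Python rather than as a Processing sketch. The helper name `count_occurrences` and the sample text are made up for illustration.

```python
import re

def count_occurrences(text, phrase):
    """Count every whole-word, case-insensitive occurrence of phrase in text."""
    # \b marks a word boundary, so "he" does not match inside "she" or "the".
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Made-up sample text standing in for the contents of a lyrics file.
sample = ("He said she loves New York City.\n"
          "She said he misses New York City, the city he left.")

for phrase in ["he", "she", "New York City"]:
    print(phrase + ": " + str(count_occurrences(sample, phrase)))
```

To try it on a real file, pass the result of `f.read()` as `text` in place of `sample`.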

1. Bring in your own text file and count the number of occurrences of a spatially significant word.

Natural Language Processing

Natural language processing is inherently biased because its models are trained on historical text, which is itself biased. It's important to remember that these tools are subjective and that their results can be wrong.

Measuring Quantity

Quickly analyze the number of times something is said:

Text analysis, wordcount, keyword density analyzer, prominence analysis

text analysis

textalyser.net
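
The kind of word count and keyword density that such an analyzer reports can also be approximated in a few lines of plain Python. This is a rough sketch, not how textalyser.net works: the helper name `word_frequencies`, the simple definition of a "word", and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Return a Counter mapping each lowercase word to how often it appears."""
    # Treat runs of letters and apostrophes as words; anything else separates them.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Made-up sample text for illustration.
sample = "The city is loud. The city is bright. I love the city at night."

freqs = word_frequencies(sample)
total = sum(freqs.values())

# Print the three most frequent words with their share of all words (density).
for word, n in freqs.most_common(3):
    density = 100.0 * n / total
    print("{}: {} ({:.1f}%)".format(word, n, density))
```

Common words such as "the" will dominate counts like these, so it usually helps to filter them out before looking for spatially significant terms.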

NLPTKs:

There are a number of great open-source natural language processing toolkits, some of which are completely free and come with lots of tutorials. I highly recommend diving into them if you want to focus your project on language. However, these toolkits require installing separate applications, and the installation varies widely across computers. Because of that complication, we will not cover installing these resources in the course. The following NLPTKs and tutorials should guide you most of the way there:

Python NLTK
spaCy
Google NLPTK

Tutorials:

NLP Concepts with spaCy by Allison Parrish
Word Vectors by Allison Parrish
Text Analysis, Michelle McSweeney