SpaCy Library Cheatsheet

Farnaz Ghassemi Toudeshki
May 30, 2024

SpaCy

  • Library for natural language processing (NLP)
  • Ships with pre-trained statistical models and word vectors
  • Convolutional neural network models for tagging, parsing, and named
    entity recognition
  • Integrates well with deep learning libraries

1- Sentence Segmentation

import spacy

# Load the small English model.
# The 'en' shortcut was removed in spaCy v3; install the model first with:
#   python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')
text = "Twenty-two years after the original Jurassic Park failed, the new park, also known as Jurassic World, is open for business. After years of studying genetics, the scientists on the park genetically engineer a new breed of dinosaur, the Indominus Rex."

# Run the spaCy pipeline
sp_text = nlp(text)

# Segment into sentences (each sentence is a Span)
for sentence in sp_text.sents:
    print(sentence)
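
When downloading a trained model is not an option, sentence splitting can also be sketched with a blank pipeline and spaCy's rule-based sentencizer component. This is a minimal sketch, assuming spaCy v3 or later (where add_pipe accepts a component name). Unlike the model-based segmentation above, it only splits on sentence-final punctuation, so it may be less accurate on messy text.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only), so no model download is needed.
nlp = spacy.blank("en")
# The built-in "sentencizer" marks sentence starts after sentence-final punctuation.
nlp.add_pipe("sentencizer")

text = (
    "Twenty-two years after the original Jurassic Park failed, the new park, "
    "also known as Jurassic World, is open for business. After years of "
    "studying genetics, the scientists on the park genetically engineer a "
    "new breed of dinosaur, the Indominus Rex."
)

doc = nlp(text)

# Collect the sentences as plain strings
sentences = [sent.text for sent in doc.sents]
for sentence in sentences:
    print(sentence)
```
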

2- Tokenizing

import spacy

# Load the small English model.
# The 'en' shortcut was removed in spaCy v3; install the model first with:
#   python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')
text = "Twenty-two years after the original Jurassic Park failed, the new park, also known as Jurassic World, is open for business. After years of studying genetics, the scientists on the park genetically engineer a new breed of dinosaur, the Indominus Rex."

# Run the spaCy pipeline
sp_text = nlp(text)

# Get tokens (iterating over a Doc yields Token objects)
for word in sp_text:
    print(word.text)
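
Each token carries more than its text. The following is a minimal sketch, assuming a blank English pipeline (tokenizer only, no model download) and a shortened sample sentence chosen for illustration. It prints a few lexical attributes and filters out punctuation.

```python
import spacy

# Assumption: a blank pipeline is enough here, because tokenization and
# lexical attributes such as is_alpha and is_punct do not need a trained model.
nlp = spacy.blank("en")
doc = nlp("The new park, also known as Jurassic World, is open for business.")

# Print each token's index, text, and two lexical flags
for token in doc:
    print(token.i, token.text, token.is_alpha, token.is_punct)

# Keep only the non-punctuation tokens
words = [token.text for token in doc if not token.is_punct]
print(words)
```
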
