Training a Swedish NER model for Stanford CoreNLP, part 2

Andreas Klintberg
Nov 2, 2015

Getting started, the properties file

trainFile = output_clean_training.txt
serializeTo = se-ner-model.ser.gz
map = word=0,answer=1

useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
useDisjunctive=true
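
The `map = word=0,answer=1` line tells the classifier that column 0 of the training file holds the token and column 1 holds its label. Before kicking off a long training run it can be worth checking that every line of the file really has that shape. Below is a minimal sketch in Python, assuming the file is tab-separated with one token per line and blank lines between sentences; the function name `check_training_lines` and the sample labels (`LOC`, `PER`, `O`) are made up for illustration and are not taken from the actual training data.

```python
from typing import Iterable, List, Tuple


def check_training_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for lines that are not 'word<TAB>label'."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are assumed to separate sentences
        columns = line.split("\t")
        if len(columns) != 2:
            problems.append(
                (number, f"expected 2 tab-separated columns, found {len(columns)}")
            )
        elif not columns[0] or not columns[1]:
            problems.append((number, "empty word or label"))
    return problems


# Illustrative sample only: the last line uses spaces instead of a tab.
sample = ["Stockholm\tLOC\n", "är\tO\n", "\n", "Andreas Klintberg PER\n"]
print(check_training_lines(sample))
```

To run it on the real file, something like `check_training_lines(open("output_clean_training.txt", encoding="utf-8"))` should do; an empty list means every non-blank line has exactly two columns.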

Training time!

java -mx8g -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop swedish-ner.prop

Testing the model

java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier se-ner-model.ser.gz -testFile output_clean_test.txt
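
If you redirect the standard output of the test run to a file, you can do your own bookkeeping on it. The sketch below assumes that output has three tab-separated columns per token (word, gold label, guessed label) and that `O` is the background label. It computes token-level precision, recall and F1 per label, which is a simplification: the classifier's own summary scores whole entities, so the numbers will not match exactly. The function name `token_scores` and the sample lines are made up for illustration and are not results from the model.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def token_scores(
    lines: Iterable[str], background: str = "O"
) -> Dict[str, Tuple[float, float, float]]:
    """Per-label (precision, recall, F1), counted token by token."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for raw in lines:
        columns = raw.rstrip("\r\n").split("\t")
        if len(columns) != 3:
            continue  # skip blank lines and anything that is not word/gold/guess
        _, gold, guess = columns
        if gold == guess:
            if gold != background:
                tp[gold] += 1
        else:
            if guess != background:
                fp[guess] += 1  # predicted a label that was not there
            if gold != background:
                fn[gold] += 1  # missed a label that was there
    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Illustrative lines only, not output from the Swedish model.
sample = [
    "Stockholm\tLOC\tLOC",
    "Andreas\tPER\tO",
    "Klintberg\tPER\tPER",
    "är\tO\tO",
    "Meltwater\tORG\tLOC",
]
for label, (p, r, f1) in token_scores(sample).items():
    print(f"{label}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
```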

Results
