Package shorttext 1.0.0 released

Stephen Ho
1 min read · Jul 14, 2018

The Python package shorttext 1.0.0 has been released. This package provides functions and classes that facilitate text preprocessing, topic modeling, machine learning, the use of various deep neural network architectures, and the computation of certain metrics. It smooths the building of text mining pipelines.

The package runs under Python 2.7, 3.5, and 3.6.

To install, type the following in the command line:

pip install -U shorttext

You might need to prefix the command with sudo to run it as an administrator. The package provides the following:

  • example data (including subject keywords and NIH RePORT);
  • text preprocessing;
  • pre-trained word-embedding support;
  • gensim topic models (LDA, LSI, Random Projections) and autoencoder;
  • topic model representation supported for supervised learning using scikit-learn;
  • cosine distance classification;
  • neural network classification (including ConvNet and C-LSTM);
  • maximum entropy classification;
  • metrics of phrase differences, including the soft Jaccard score (using Damerau-Levenshtein distance) and Word Mover’s distance (WMD);
  • character-level sequence-to-sequence (seq2seq) learning; and
  • spell correction.
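
To make the soft Jaccard score from the list above more concrete, here is a minimal, self-contained sketch in plain Python. It is an illustration of the idea only, not the package's own code: the function names `osa_distance`, `similarity`, and `soft_jaccard` are hypothetical, the edit distance is the restricted (optimal string alignment) variant of Damerau-Levenshtein, and the greedy pairing of tokens is an assumption made for this example.

```python
from itertools import product


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # swap of two adjacent characters
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a, b):
    """Map the edit distance to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / float(longest)


def soft_jaccard(tokens1, tokens2):
    """Jaccard score in which near-identical tokens count as partial matches."""
    set1, set2 = sorted(set(tokens1)), sorted(set(tokens2))
    # Rank all token pairs from most to least similar.
    scored = sorted(((similarity(t1, t2), t1, t2)
                     for t1, t2 in product(set1, set2)),
                    key=lambda item: item[0], reverse=True)
    used1, used2 = set(), set()
    soft_intersection = 0.0
    # Greedily pair each token with its best still-unused partner.
    for sim, t1, t2 in scored:
        if t1 not in used1 and t2 not in used2:
            soft_intersection += sim
            used1.add(t1)
            used2.add(t2)
    union = len(set1) + len(set2) - soft_intersection
    return soft_intersection / union if union else 1.0
```

With this sketch, two identical token lists score 1.0, two lists with no characters in common score 0.0, and a pair such as `["book"]` and `["back"]` lands in between because the two words are two edits apart out of four characters.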

# Links

The PyPI page: https://pypi.org/project/shorttext/

GitHub: https://github.com/stephenhky/PyShortTextCategorization

Documentation: http://shorttext.rtfd.io

Written by Stephen Ho

Applied Quantitative Researcher