Package shorttext 1.0.0 released

Stephen Ho
Jul 14, 2018

The Python package shorttext 1.0.0 has been released. This package provides functions and classes that facilitate text preprocessing, topic modeling, machine learning, the use of various deep neural network architectures, and the computation of certain metrics. It streamlines the building of text mining pipelines.

The package runs under Python 2.7, 3.5, and 3.6.

To install, run the following on the command line:

pip install -U shorttext

You might need to prefix the command with sudo to run it with administrator privileges. The package provides functions and classes for the following:

  • example data (including subject keywords and NIH RePORT);
  • text preprocessing;
  • pre-trained word-embedding support;
  • gensim topic models (LDA, LSI, Random Projections) and autoencoder;
  • topic model representation supported for supervised learning using scikit-learn;
  • cosine distance classification;
  • neural network classification (including ConvNet and C-LSTM);
  • maximum entropy classification;
  • metrics of phrase differences, including soft Jaccard score (using Damerau-Levenshtein distance) and Word Mover’s distance (WMD);
  • character-level sequence-to-sequence (seq2seq) learning; and
  • spell correction.
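To give a feel for one of the metrics in the list, here is a minimal pure-Python sketch of a soft Jaccard score built on the Damerau-Levenshtein distance. It is an illustration only, not the code inside shorttext: the function names (`damerau_levenshtein`, `similarity`, `soft_jaccard`), the restricted "optimal string alignment" variant of the distance, and the greedy token pairing are all assumptions made for this example.

```python
def damerau_levenshtein(a, b):
    """Edit distance counting insertions, deletions, substitutions and
    adjacent transpositions (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


def similarity(a, b):
    """Map the edit distance to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - damerau_levenshtein(a, b) / float(longest)


def soft_jaccard(tokens1, tokens2):
    """Jaccard-like score where near-matching tokens count partially.

    Assumption for this sketch: tokens are paired greedily, most similar
    first, and each token is used at most once.
    """
    pairs = sorted(
        ((similarity(s, t), i, j)
         for i, s in enumerate(tokens1)
         for j, t in enumerate(tokens2)),
        reverse=True,
    )
    used1, used2, overlap = set(), set(), 0.0
    for score, i, j in pairs:
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            overlap += score
    union = len(tokens1) + len(tokens2) - overlap
    return overlap / union if union else 1.0


if __name__ == "__main__":
    print(soft_jaccard(["diesel", "engine"], ["deisel", "engine"]))
```

With this scoring, a misspelled token such as "deisel" still contributes most of a match against "diesel", which is why this kind of metric is handy for comparing short, noisy phrases.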

# Links

The PyPI page: https://pypi.org/project/shorttext/

GitHub: https://github.com/stephenhky/PyShortTextCategorization

Documentation: http://shorttext.rtfd.io
