The Python package shorttext 1.0.0 has been released. This package provides functions and classes that facilitate text preprocessing, topic modeling, machine learning (including various deep neural network architectures), and the computation of certain metrics. It streamlines text-mining pipelines.
The package runs under Python 2.7, 3.5, and 3.6.
To install, type the following in the command line:
pip install -U shorttext
You might need to put sudo in front of the command to install it as admin.
The package provides functions and classes to do the following:
- example data provided (including subject keywords and NIH RePORT);
- text preprocessing;
- pre-trained word-embedding support;
- gensim topic models (LDA, LSI, Random Projections) and autoencoder;
- topic model representation supported for supervised learning;
- cosine distance classification;
- neural network classification (including ConvNet and C-LSTM);
- maximum entropy classification;
- metrics of phrase differences, including soft Jaccard score (using Damerau-Levenshtein distance) and Word Mover’s distance (WMD);
- character-level sequence-to-sequence (seq2seq) learning; and
- spell correction.
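The soft Jaccard score mentioned in the list above can be illustrated with a small sketch. It assumes a greedy one-to-one matching of tokens, where each matched pair contributes a similarity of 1 - d/max(len) derived from the Damerau-Levenshtein distance d (here the optimal-string-alignment variant). This is an illustrative reconstruction, not shorttext's own code: the function names are hypothetical, and the package's exact matching and normalization may differ.

```python
from itertools import product


def damerau_levenshtein(s, t):
    """Optimal-string-alignment Damerau-Levenshtein distance: insertions,
    deletions, substitutions and adjacent transpositions each cost 1."""
    d = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        d[i][0] = i
    for j in range(len(t) + 1):
        d[0][j] = j
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(s)][len(t)]


def word_similarity(w1, w2):
    """Hypothetical mapping of edit distance to a similarity in [0, 1]."""
    longest = max(len(w1), len(w2))
    if longest == 0:
        return 1.0
    return 1.0 - damerau_levenshtein(w1, w2) / float(longest)


def soft_jaccard_score(tokens1, tokens2):
    """Soft Jaccard: a matched token pair contributes its similarity instead of 1."""
    pairs = sorted(((word_similarity(a, b), i, j)
                    for (i, a), (j, b) in product(enumerate(tokens1), enumerate(tokens2))),
                   reverse=True)
    used1, used2 = set(), set()
    soft_intersection = 0.0
    for sim, i, j in pairs:
        # Greedily pair each token at most once, best matches first.
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            soft_intersection += sim
    soft_union = len(tokens1) + len(tokens2) - soft_intersection
    if soft_union == 0:
        return 1.0  # both token lists are empty
    return soft_intersection / soft_union


if __name__ == "__main__":
    # A misspelled token still counts as a near match.
    print(soft_jaccard_score(["text", "minning"], ["text", "mining"]))
```

Using edit distance inside the Jaccard formula lets a typo such as "minning" count almost fully toward the overlap instead of being treated as an unrelated word.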
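The cosine distance classification in the list above can be sketched as follows, assuming each short text has already been converted into a numeric vector (for example, a topic-model representation) and each class is represented by the average of its training vectors. The function names and the centroid approach are hypothetical illustrations, not the shorttext API; the package's own classifier may build class representations differently.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def class_centroids(labeled_vectors):
    """Average the vectors of each class label into one centroid vector."""
    sums = {}
    counts = defaultdict(int)
    for label, vec in labeled_vectors:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def score(vector, centroids):
    """Cosine similarity of `vector` to every class centroid."""
    return {label: cosine_similarity(vector, c) for label, c in centroids.items()}


def classify(vector, centroids):
    """Pick the class whose centroid is closest in cosine distance."""
    scores = score(vector, centroids)
    return max(scores, key=scores.get)


# Toy three-dimensional "topic" vectors, made up for illustration only.
training = [("sports", [0.9, 0.1, 0.0]), ("sports", [0.8, 0.2, 0.0]),
            ("finance", [0.1, 0.0, 0.9]), ("finance", [0.0, 0.1, 0.8])]
centroids = class_centroids(training)

if __name__ == "__main__":
    print(classify([0.7, 0.1, 0.1], centroids))
```

Cosine similarity compares only the direction of the vectors, so texts of different lengths that share the same topic mix are scored alike.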
The PyPI page: https://pypi.org/project/shorttext/