Protein Alignment and Search by Graphical Models
Finding similar proteins or protein sequences by alignment has many applications in biology.
Established alignment algorithms use substitution matrices and gap scoring to retrieve globally aligned sequences, with variations in implementation. The same scoring approach can be used for local alignment too.
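To make the substitution and gap scoring idea concrete, here is a minimal sketch of a global alignment score in the style of Needleman-Wunsch. It is an illustration only, not the code behind the demo: the function name and the flat match, mismatch and gap values are assumptions, and real tools typically use a substitution matrix such as BLOSUM62 instead of a single match/mismatch score.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score with a linear gap penalty (illustrative values)."""
    # prev is the dynamic-programming row for the previous prefix of `a`;
    # row 0 aligns the empty prefix of `a` against prefixes of `b` using gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # prefix of `a` aligned against an empty `b`
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # a[i-1] aligned with a gap
                curr[j - 1] + gap,  # b[j-1] aligned with a gap
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACD", "AD"))
```

Resetting negative cell scores to zero and taking the maximum over the whole table, rather than the last cell, would turn the same recurrence into a local alignment score.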
Instead of searching with substitution-matrix methods, it is possible to find aligned sequences with a graphical reasoning algorithm built on a graph database.
Word sequences and protein sequences are comparable: both are sequences with structure. Consider human languages: any two natural languages differ in grammar, yet both work as structured sequences of words.
If a grammar algorithm works for more than one natural language, it can also work on sequences found in other fields, such as protein sequences. Because it searches based on patterns, it can be used for motif extraction too.
Check out the demo: NaturalText Protein Search
Data used : Random FASTA-formatted sequences downloaded from NCBI
Number of Protein Sequences : 25,000
Database : Custom-developed general-purpose database used as a graph database
Graph Algorithm : Custom-developed graph framework
Hardware Details : 2 cores, 2 GB RAM
Execution Details : Pure Python, single-process execution
As this is a proof of concept hosted on a low-configuration machine, it may be slower than existing solutions.
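For readers unfamiliar with the input format, the following is a small pure Python sketch of reading FASTA-formatted records, where each record is a header line starting with ">" followed by one or more sequence lines. The function name and the sample records are made up for illustration and are not taken from the demo's code or data.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # sequence lines may be wrapped, so collect and join them later
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


if __name__ == "__main__":
    # Hypothetical sample data, not real NCBI records.
    sample = ">seq1 example\nMKT\nAYI\n>seq2\nGG\n"
    for name, seq in parse_fasta(sample).items():
        print(name, len(seq))
```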
Originally published at naturaltext.com.