MindsDB vs. EvaDB
We have been asked several times about the difference between MindsDB and EvaDB, as both systems provide a SQL interface to do AI inside a database system. In this article, our goal is to provide a comprehensive yet fair comparison of these two systems.
This article is the first in a series comparing EvaDB against similar database systems. The series aims to help application developers pick the appropriate database system for building an AI application based on their objectives and constraints. We will explore multiple facets of these systems, including performance, system architecture, usability, integration, and more. In this article, we focus on performance, that is, the time taken to run the application’s queries, and briefly touch on compatibility with Hugging Face models.
Performance Showdown: A CNN News Summarization Case Study 🏎️💨
To compare the runtime performance of these systems, we use a representative AI application that summarizes roughly 12,000 CNN news articles using Hugging Face’s text summarization model (specifically “sshleifer/distilbart-cnn-12-6”). The dataset is available on Kaggle; we used only its test split in our analysis. Our primary objective was to gauge the processing speed and efficiency of these two systems.
MindsDB
In MindsDB, the summarization task takes two queries. The first query creates a Hugging Face model and configures its parameters.
CREATE MODEL mindsdb.hf_bart_sum_20
PREDICT PRED
USING
engine = 'huggingface',
task = 'summarization',
model_name = 'sshleifer/distilbart-cnn-12-6',
input_column = 'article',
min_output_length = 5,
max_output_length = 100;
The subsequent query runs the summarization model on all the CNN news articles, generating a new table that holds the summarized content. MindsDB took 4 hours and 45 minutes to complete this task.
CREATE OR REPLACE TABLE sqlite_datasource.cnn_news_summary(
SELECT PRED
FROM mindsdb.hf_bart_sum_20
JOIN sqlite_datasource.cnn_news_articles
);
EvaDB
In EvaDB, we register a TextSummarizer function (user-defined function or UDF) and supply the corresponding Hugging Face parameters.
CREATE UDF IF NOT EXISTS TextSummarizer
TYPE HuggingFace
'task' 'summarization'
'model' 'sshleifer/distilbart-cnn-12-6'
'min_length' 5
'max_length' 100;
Once the TextSummarizer UDF is registered, it can be used to generate summaries for all the CNN articles, which are then stored in a new table. EvaDB completed the task in 1 hour and 9 minutes under standard deployment conditions. Tuning a few parameters in EvaDB’s configuration to improve GPU utilization reduced the run time further, to just 42 minutes.
CREATE TABLE IF NOT EXISTS cnn_news_summary AS
SELECT TextSummarizer(article) FROM cnn_news_articles;
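EvaDB runs the news summarization AI application roughly 4 times faster than MindsDB under standard deployment conditions (1 hour 9 minutes vs. 4 hours 45 minutes), and nearly 7 times faster after tuning (42 minutes).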
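The speedup follows directly from the run times reported above. The short Python sketch below reproduces the arithmetic. It is only a back-of-the-envelope check: the per-article figures assume the "roughly 12,000" articles are exactly 12,000, and the function names are ours, not part of either system.

```python
# Wall-clock times reported in this article, converted to minutes.
MINDSDB_MIN = 4 * 60 + 45        # MindsDB: 4 hours 45 minutes
EVADB_DEFAULT_MIN = 1 * 60 + 9   # EvaDB, standard deployment: 1 hour 9 minutes
EVADB_TUNED_MIN = 42             # EvaDB, tuned for GPU utilization

# Assumption: "roughly 12,000 articles" is treated as exactly 12,000.
NUM_ARTICLES = 12_000


def speedup(baseline_min: float, candidate_min: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_min / candidate_min


def seconds_per_article(total_min: float, num_articles: int = NUM_ARTICLES) -> float:
    """Average wall-clock seconds spent per article."""
    return total_min * 60 / num_articles


if __name__ == "__main__":
    runs = [
        ("MindsDB", MINDSDB_MIN),
        ("EvaDB (default)", EVADB_DEFAULT_MIN),
        ("EvaDB (tuned)", EVADB_TUNED_MIN),
    ]
    for name, minutes in runs:
        print(
            f"{name:16s} {minutes:4d} min  "
            f"{seconds_per_article(minutes):.3f} s/article  "
            f"{speedup(MINDSDB_MIN, minutes):.2f}x vs. MindsDB"
        )
```

The headline "7 times faster" figure corresponds to the tuned EvaDB configuration; with default settings the ratio is closer to 4.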
Model Compatibility and Flexibility
A critical aspect of setting up these experiments is how flexibly each database system works with various Hugging Face models. In MindsDB, the input column must be specified when the model is created (input_column = 'article' in the query above), so running the same Hugging Face model on different columns or tables can require creating multiple models. In contrast, EvaDB’s architecture decouples models/functions from execution: the column is supplied only when the function is called in a query. So, you can use the same TextSummarizer function in EvaDB across multiple queries on different tables (i.e., datasets). This eliminates the need to maintain separate models for each table.
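The idea of decoupling a function from the data it runs on can be illustrated with a small Python analogy. This is not how EvaDB is implemented; the registry, the toy summarizer, and the table and column names are all hypothetical and exist only to show a function being registered once and bound to a column at call time.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

# Hypothetical registry: functions are stored by name, with no column attached.
FUNCTIONS: Dict[str, Callable[[str], str]] = {}


def register(name: str, fn: Callable[[str], str]) -> None:
    """Register a function once, independent of any table or column."""
    FUNCTIONS[name] = fn


def toy_summarizer(text: str) -> str:
    """Stand-in for a real summarization model: keep the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def apply_function(name: str, table: List[Row], column: str) -> List[str]:
    """Bind the function to a column only at query time."""
    fn = FUNCTIONS[name]
    return [fn(row[column]) for row in table]


register("TextSummarizer", toy_summarizer)

# Two hypothetical tables with different column names.
news = [{"article": "EvaDB ran the job. It used a GPU."}]
reviews = [{"review_text": "Great product. Would buy again."}]

# The same registered function serves both tables.
news_summaries = apply_function("TextSummarizer", news, "article")
review_summaries = apply_function("TextSummarizer", reviews, "review_text")
```

Because the column name is an argument to the call rather than part of the registration, no second registration is needed for the second table.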
Furthermore, EvaDB accommodates a wider range of output structures from Hugging Face models. While MindsDB sometimes encountered errors with models, EvaDB returns the model’s output without restructuring it, which made it compatible with more models. Users can then reshape the raw output to suit their specific needs.
Conclusion
In this inaugural comparison of MindsDB and EvaDB, we focused on performance and support for Hugging Face models. Both platforms make it easy to run AI models inside your database. Due to its maturity, MindsDB offers a richer set of integrations than EvaDB. However, EvaDB comes out ahead on performance and GPU utilization, finishing the summarization workload in 42 minutes compared with MindsDB’s 4 hours and 45 minutes.
Have Questions?
Join our Slack community! We are actively engaged and eager to discuss EvaDB and your use cases.
Try it out!
Our documentation offers numerous use cases for your reference. If you don’t find your specific use case listed, please don’t hesitate to contact us. We’re here to assist!