How to Effectively Handle Millions of Rows in Bold BI

Ragavan Angamuthu
Bold BI
Published in
6 min readFeb 15, 2022
How to Effectively Handle Millions of Rows in Bold BI

Introduction

Many companies have adopted business intelligence tools for breaking down their data to gain business insight. Creating interactive dashboards with useful visualizations helps business owners to actively monitor their key metrics and KPIs. Such dashboards need to load and respond to user interactions quickly. A dashboard’s poor loading performance affects its widgets as well as the connected database’s querying performance. In this blog post, I’ll show you how to handle millions of rows of data with Microsoft SQL Server’s database columnar storage and indexing.

Let’s look at:

  • Setting up Microsoft SQL Server database with a sample data set and recordings.
  • Techniques for improved performance.
  • Creating a dashboard using Microsoft SQL Server database.
  • Performance metrics.

Setting up Microsoft SQL Server database

Let’s take a sample data set of over 3.7 million records. For this example, we are using a marketing data set in a Microsoft SQL Server database running in a local machine with the following environments.

Operating System   : Windows 10
CPU : 1.60GHz 1.80 GHz
RAM : 16.0 GB
Server Application : Microsoft SQL Server Management Studio v17.9.1

Executing a simple count statement, it takes six seconds for the first run to complete query execution. The second run takes only three seconds since it utilizes the Microsoft SQL Server default caching operations of previous query results.

Query   : SELECT COUNT(Channel_Id) FROM Marketing_Details_new
1st run : 6 Seconds
2nd run : 3 Seconds

You’ll find two more sample queries with recordings for testing purposes during both runs.

Query   : SELECT TOP 10 [Date], Sum(Cast([Expense] AS DECIMAL))FROM  Marketing_Details_new GROUP BY [Date] ORDER BY [Date]
1st run : 8 Seconds
2nd run : 3 Seconds,
Query : SELECT Sum(Cast([Expense] AS DECIMAL)) AS NEWCOLUMN FROM Marketing_Details_new
1st run : 9 Seconds
2nd run : 4 Seconds

Techniques for improved performance

Techniques for improved performance
Performance improvement techniques in SQL database

Using the following methods, we can speed up the data access and aggregation:

  • Indexing
  • Aggregated tables and views as data marts
  • Table partitioning

Indexing

A SQL index is a set of lookup tables designed to help users find items they require regularly. An index is a data structure that is small, fast, and designed for speedy searches. It’s great for linking relational tables and searching large databases.

By converting the rowstore to a columnstore, you can gain high performance for analytic queries that require scanning large amounts of data. Execute the following command to create the clustered columnstore.

CREATE CLUSTERED COLUMNSTORE INDEX MarketingColumnarIndex ON Marketing_Details_new;

To learn more about columnstores in Microsoft SQL Server, refer to the Microsoft documentation.

Starting with Microsoft SQL Server 2016, you can combine rowstore indices and columnstore indices on the same table. You can create a nonclustered row index on the same table by executing the following command. It will improve table seek performance, so queries that look for matching values or return a range of values will perform better in a row index than a columnar index.

CREATE INDEX MarketingIndex ON Marketing_Details_new([Date],[DestinationURL],[AdCost],[Visitors],[Visitors after landing page],[Sessions],[Impressions],[Clicks],[Revenue],[Expense],[Channel_Id],[Channel_Name],[Product_Used],[Leads],[Wons])

To verify whether the index has been created, try running the following command. This will describe the table details fully. At the bottom of the results, you can see that two indices are present.

sp_help [Marketing_Details_new]
Indexing performance
Indexing performance

Now, rerun the sample queries to test the indexing performance.

Indexing performance
Indexing performance

You can see that there is monumental improvement in SQL query performance once the columnar indexing and row indexing are done. To learn more about combining column and row indexing for data warehousing needs, refer to this documentation.

Aggregated tables and views as data marts

Aggregated tables or views as data marts in SQL are database tables that contain aggregated values by combining multiple rows of data. They are used to perform dynamic calculations and reduce the input/output, CPU, RAM, and swapping requirements. Also, they increase the SQL query performance with large data sets.

Now, let’s run the following queries to create the aggregated table and view as data marts with the marketing data set.

SELECT CONVERT(Date, getdate()) AS [Date],COUNT([DestinationURL]) AS [DestinationURL],SUM([AdCost]) AS [AdCost],SUM([Visitors]) AS [Visitors],SUM([Visitors after landing page]) AS [Visitors after landing page],SUM([Sessions]) AS [Sessions],SUM([Impressions]) AS [Impressions],SUM([Clicks]) AS [Clicks],SUM([Revenue]) AS [Revenue],SUM([Expense]) AS [Expense],COUNT([Channel_Id]) AS [Channel_Id],COUNT([Channel_Name]) AS [Channel_Name],COUNT([Product_Used]) AS [Product_Used],SUM([Leads]) AS [Leads],SUM([Wons]) AS [Wons]INTO marketingAggregatedFROM Marketing_Details_new GROUP BY Date ORDER BY Date ASC;
CREATE VIEW [dbo].[Marketing_View]([Date],[DestinationURL],[AdCost],[Visitors],[Visitors after landing page],[Sessions],[Impressions],[Clicks],[Revenue],[Expense],[Channel_Id],[Channel_Name],[Product_Used],[Leads],[Wons]) ASSELECT CONVERT(Date, getdate()),COUNT([DestinationURL]),SUM([AdCost]),SUM([Visitors]),SUM([Visitors after landing page]),SUM([Sessions]),SUM([Impressions]),SUM([Clicks]),SUM([Revenue]),SUM([Expense]),COUNT([Channel_Id]),COUNT([Channel_Name]),COUNT([Product_Used]),SUM([Leads]),SUM([Wons])FROM Marketing_Details_new GROUP BY Date;

Once the aggregated table or view is created, rerun the sample queries to test the SQL query performance.

Difference of original table, aggregated table, and aggregated view
Difference of original table, aggregated table, and aggregated view

You can see the improvement in SQL query performance once the aggregated table is created. It takes less than a second to fetch the data from the aggregated table. To learn more about SQL aggregate functions, refer to this documentation.

Table partitioning

Table partitioning is a database process. Large tables are divided into smaller parts based on the values in one or more table columns. The table partitioning helps you maintain the large tables, reduce the overall response time, and read and load data for the SQL operations.

Once the partition is done, let’s rerun the sample queries to check the SQL query performance.

Table partitioning
Table partitioning

To learn more about table partitioning in SQL, refer to this documentation.

Creating a dashboard using Microsoft SQL Server database

Let’s build a dashboard connecting to this Microsoft SQL Server database using Bold BI. To learn about how to connect with Microsoft SQL Server from Bold BI, read this blog post. Use the following widgets and data configuration to prepare a dashboard with the marketing data set.

Creating a dashboard using Microsoft SQL Server database
Creating a dashboard using Microsoft SQL Server database

Now, you can see the completed dashboard in the following image.

Marketing Summary Dashboard
Marketing Summary Dashboard

Performance metrics

The average performance of this dashboard is calculated with the previous techniques. To calculate the query execution time in Microsoft SQL Server, use the SQL profiler trace option by connecting with the database server.

Performance metrics
Performance metrics

Conclusion

We have used three techniques to improve the performance of SQL queries using a large data set. Every technique improves the handling of millions of data points. We find that aggregated tables or views as data marts load dashboards faster than the other techniques. But you can choose any of these techniques to increase the query performance for your case.

Bold BI helps you integrate dashboards in your applications written in ASP.NET Core, ASP.NET MVC, ASP.NET, Ruby on Rails, and more, seamlessly. It will help you save time and prevent you from doing redundant work. Click this link to explore its features. To learn more about embedding dashboards into your applications, refer to this blog and our help documentation.

Get started with Bold BI by signing up for a free 15-day trial and create interactive business intelligence dashboards. You can contact us by submitting questions through the Bold BI website or, if you already have an account, you can log in to submit your support questions.

Originally published at https://www.boldbi.com on February 15, 2022.

--

--

Ragavan Angamuthu
Bold BI
Writer for

Technical writer with 30 months of experience at Syncfusion & Cloud Destinations, and published 40+ blogs on BI, Embedded Analytics, Cloud & Automation topics.