Saving space and gaining speed: numbers’ storage in databases

Published in

Globant

7 min readJun 6, 2022

How to discover opportunities for improvement in database tables

This image shows how we can speed up the links — Photo by Sawyer Bengtson on Unsplash

As per standard practice, using a float data type column to just store integer values is no good; it requires more space and affects query performance. We may be able to find out whether we really need float data types by analyzing the values stored in float columns, and then we will be able to take a proper decision. However, if we want to know which float type columns have integer values by manual operation, that would be very time-consuming with high chances of error.

In this article, we will see how we can find the list of float-type columns of database tables that have integer values, to enable possible column type optimizations. This article is written considering you have an intermediate level of expertise in Azure SQL, temporary tables, and dynamic query writing.

Problem statement

In general, we define the column data type as per the data we are expecting into it, and if some other type of value comes into that column, it may impact performance and produce errors. In our case, it is not advised to use float fields to just store integer values. We can detect such columns manually, but it involves lots of time, and error occurrence would be on the higher side. With a large number of tables and columns, it wouldn’t be recommended to go ahead with manual operation. We want to:

1. Find which columns are numeric

2. Find which columns could be optimized

Find which columns are numeric

In the beginning, we need to find out a list of tables columns from across the database that have float as a datatype. We can get these details from INFORMATION_SCHEMA.COLUMN table after applying a filter on the DATA _TYPE column with value as a ‘float’.

SELECT TABLE_NAME
 ,COLUMN_NAME
 ,'SELECT COUNT(*) FROM ' + TABLE_NAME total_cnt
 ,'SELECT COUNT([' + COLUMN_NAME + ']) FROM ' + TABLE_SCHEMA + '.' + TABLE_NAME + ' 
WHERE ' + 'CHARINDEX(''.'',[' + COLUMN_NAME + ']) = 0 ' total_int
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE = 'float';

In the above SQL statement, we are getting table name, column name, and two SQL dynamic queries to get the total record count and total integer count for the table. Refer to the below screenshot for the SQL statement result.

We will get this kind of output when we run the mentioned SQL statement — SQL Statement Output

We generated two dynamic queries to get results about columns. The following query counts how many records are there in each table.

SELECT COUNT(*) FROM <table_ name>

When we run the query statement mentioned above , we will get total record counts as shown — Total Record Count from Table

The second query lets us know how many of those records have integer values.

SELECT COUNT([<column_name>]) FROM <table_ name> WHERE CHARINDEX('.',[<column_name>]) = 0

When we run the query mentioned above Integer count will provide the result like this — Total Integer Records present in the Table

We can determine whether the value is float or integer by checking whether it has a (decimal) period somewhere; if not, it’s an integer value. We check if there’s a period using the CHARINDEX function: if not present, it returns a zero.

Find which columns could be optimized

Creating a cursor on the query which we have defined in the “Find column from the whole Database” section will provide the output as follows. Now we will get the details of the table_name, Column_name of Float data type, a query which will provide us a total count of records present in the table having float datatype column, and a query which will provide the integer records count for float data type column.

When we run the cursor which we defined earlier then we will get the result in this format as shown in image — Cursor Output

We will go through the cursor one row at a time. We have created a variable counter to keep a count of records that are being processed in declare section. We need a temporary table to store the total record count of the table in it, with columns sr_no with identity property, int_count, column_name, float_count, and total_count. The following statement creates the table.

CREATE TABLE [dbo].[integer_check] (
 [sr_no] [int] IDENTITY(1, 1) NOT NULL
 ,[int_count] [int] NULL
 ,[table_name] [nvarchar](100) NULL
 ,[column_names] [nvarchar](100) NULL
 ,[total_count] [int] NULL
 ,[float_count] [int] NULL
 );

For a better understanding see below.

This image will let us know the result table details like how columns are defined and purpose of each column — Result Saving Table’s Details

Let’s go ahead with the working part now. As mentioned earlier, we have created the cursor with filtered records of float datatype. We will process it in a loop till the fetch status of the record is 0. We will be incrementing the counter variable value by one for each record of the loop. Below are the actual steps getting performed inside the loop.

WHILE @@Fetch_Status = 0
BEGIN
 SET @counter = @counter + 1; – Step 1TRUNCATE TABLE #tmp_table; – Step 2INSERT dbo.integer_check (int_count)
 EXECUTE (@v_query);        – Step 3INSERT #tmp_table
 EXECUTE (@total_cnt); – Step 4UPDATE dbo.integer_check
 SET [column_names] = @COLUMN_NAME
  ,[table_name] = @TABLE_NAME
  ,[total_count] = (
   SELECT *
   FROM #tmp_table
   )
 WHERE sr_no = @counter; – Step 5
 UPDATE dbo.integer_check
SET float_count = (CAST(total_count AS INT) - CAST(int_count AS INT)); – Step 6

Step mentioned in above block are explained in sequence as follows

We increase the counter variable value by 1 each time inside the loop which we define in the declaration.
We will truncate the temp table first.
Filtered SQL query for integer count will provide result dynamically from a written dynamic query using execute command. That result would be stored in the int_count column of the Interger_check result table along with that sequential value would be stored in the sr_no column due to identity property.
We will put the result provided by the filtered dynamic query of the total count of records of the table into a temporary table.
We will update the integer_check result table’s columns table_name,column_name from cursor’s table_name and column name as well as total_count column will be updated by taking the value of total count from the temporary table while matching sr_no with a counter like if we are going through 2 nd record then sr_no value and counter value will be 2 which will make sure that correct record is being updated.
We have processed all filtered records using cursor and looping. We have the total count and integer count of values inside the integer_check table. We will subtract integer count from total count to get the float count value as shown below

One point to remember here is that the float count will also consider null values. If we have a column with float datatype having all NULLs, the float count will equal the total count. The final output would look like this.

This image shows the final output we received after the code run which will help us to take a decision on column optimization — Final Output Of code

From the output, we see that rows 7 and 8 have all integer values, and we might conclude that we have wrongly defined the data type. Before jumping to conclusions, think about why we defined the float datatype: from the column name itself, we can say it’s a discount percentage, which could always be an integer, but maybe not! Reasoning about the discount amount suggests that it may have non-integer values; it would depend on the discount calculation. In short, be mindful while thinking about changing the data type only on a result basis but also look at the purpose of using float datatype or future scenarios.

Conclusion

We know that it is not a good practice to keep integer values in the float column since it would affect the performance of the query. With the solution provided, we can get these details within a few minutes without any error. A bonus is that we can use this result to check whether we need that data type for a column; if not we can change it after analysis.

Please refer to the below links, which would be helpful for you to get better in SQL Query writing.

https://www.techonthenet.com/sql_server/index.php

https://www.w3schools.com/sql/

Code

Please have a look at the code block for your reference. You can also convert this block of code into a procedure too as per your requirement.

DECLARE @v_query NVARCHAR(1000)
 ,@TABLE_NAME NVARCHAR(1000)
 ,@COLUMN_NAME NVARCHAR(1000)
 ,@total_cnt NVARCHAR(100)
 ,@count INT
 ,@counter INT = 0;
IF EXISTS (
  SELECT *
  FROM sys.objects
  WHERE object_id = OBJECT_ID(N'[dbo].[integer_check]')
  )
BEGIN
 DROP TABLE [integer_check];
END;
CREATE TABLE [dbo].[integer_check] (
 [sr_no] [int] IDENTITY(1, 1) NOT NULL
 ,[int_count] [int] NULL
 ,[table_name] [nvarchar](100) NULL
 ,[column_names] [nvarchar](100) NULL
 ,[total_count] [int] NULL
 ,[float_count] [int] NULL
 );
DROP TABLE IF EXISTS #tmp_table;SELECT 1 AS val INTO #tmp_table;DECLARE cur_new CURSOR
FOR
SELECT TABLE_NAME
 ,COLUMN_NAME
 ,'SELECT COUNT(*) FROM ' + TABLE_NAME total_cnt
 ,'SELECT COUNT([' + COLUMN_NAME + ']) FROM ' + TABLE_SCHEMA + '.' + TABLE_NAME + ' 
WHERE ' + 'CHARINDEX(''.'',[' + COLUMN_NAME + ']) = 0 ' total_int
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE = 'float';OPEN cur_newFETCH NEXT FROM cur_new INTO @TABLE_NAME,@COLUMN_NAME,@total_cnt,@v_queryWHILE @@Fetch_Status = 0
BEGIN
 SET @counter = @counter + 1;TRUNCATE TABLE #tmp_table;INSERT dbo.integer_check (int_count)
 EXECUTE (@v_query);INSERT #tmp_table
 EXECUTE (@total_cnt);UPDATE dbo.integer_check
 SET [column_names] = @COLUMN_NAME
  ,[table_name] = @TABLE_NAME
  ,[total_count] = (
   SELECT *
   FROM #tmp_table
   )
 WHERE sr_no = @counter;FETCH NEXT FROM cur_new INTO @TABLE_NAME,@COLUMN_NAME,@total_cnt,@v_query;
END;
CLOSE cur_new;
DEALLOCATE cur_new;UPDATE dbo.integer_check
SET float_count = (CAST(total_count AS INT) - CAST(int_count AS INT));