Database Design Considerations — Questions answered

Subha Ganapathi
Apr 19 · 8 min read

Database design can be a tricky and laborious project and is often associated with stringent timelines. Below are some questions that you need to have answered before starting a database design initiative. In this article, we go through each of the below questions in a detailed manner.

•What are some things you should plan while doing your database design?

•What are the steps involved in a database design project?

•Should indexing be done before loading data into the database or after loading data into the database?

•When to use Row-store Indexing/When to use Column-store Indexing?

•What are some nice features of SQL Server with respect to Indexes and Constraints?

•Should I use only one type of indexing (Rowstore/Columnstore) throughout?

•Which should be built first? — Clustered Index or Non-Clustered Index

•How do I tackle issues with fragmentation caused by indexing?

Let’s see the answers to these questions in a detailed manner.

What are some things you should plan while doing your database design?

The following are some aspects that should be planned while planning your database design -

•Identify constraints, indexes and relationships

•Note that indexes are only a best-case estimate and will have to be tweaked after database load

•Determine whether your tables need a Row-store or a column-store.

•You might have to do a mix of Row-store indexing and column-store indexing based on the need for each table.

•As a best practice, non-clustered indexes must be created on all foreign keys in a table.

Moving on to the next -What are the steps involved in a database design project?

A database design project involves the following steps -

Ø Conceptual Database Design

A conceptual database design/conceptual data model is a high-level conceptual model which does not include any details about the database structure. It is created by business stakeholders and data architects and is intended for business audience. It is independent of any infrastructure/capabilities like data storage, databases, etc. It is primarily designed as a high-level abstract document in order to showcase data flow to a real-world business user.

Ø Logical Database Design

A logical data model is usually created using the Conceptual Database design model (or some other specific data model). In this stage, the design document will contain the relationships between tables and will also contain datatypes for the attributes. In most cases, the data is 3NF normalized at this stage.

Ø Physical Database Design

This is the next step from the logical database design and is related to implementation on secondary storage. This is a more detailed document having details on base relationships, indexes, constraints, and so on.

Ø Application Design

Application Design is a parallel activity to the Database Design. As the name implies, this is the design of the user interface and the applications and programs that are used by the database and those that are used to process (e.g. feed the database) the database.

The schematic below shows the different steps involved in a database planning activity. (This could be done in any of the above design phases.)

Schematic — Different steps in a database planning activity

Moving on to the next question- Should indexing be done before loading data into the database or after loading data into the database?

•When we start creating the Database Design document, we plan for Indexes, Constraints based on the available business rules, and based on the relationships identified between the tables. At this point, we do not know how data will be accessed once data has been loaded. This is a best case estimate of what indexes to have. Hence, this is called ‘best effort’ estimation of indexing plans.

•Once data is loaded and we have users accessing the data, we start understanding data access patterns. This is when we add more indexes as required.

•Hence, index planning will happen both before and after loading data into the tables of the database.

Let’s move to the next question- When to use Row-store Indexing/When to use Column-store Indexing?

Before we see the answer to this question, let’s understand Row-store indexes and column-store indexes.

Both Row-store and Column-Store are data storage formats provided by Microsoft SQL Server. Until before 2012, data storage was always in a row-store. In 2012, Microsoft introduced a new format called Column Store promising increased compression rates. Both types of indexes (1. clustered 2. non-clustered) can be done in both types of storage formats.

Fig. 1.0 Row-Store and Column-Store Indexes

Let’s understand row-store and column-store indexes with an analogy -

•Imagine that you have a file of size 5GB. Let’s assume that the file contains the columns City, State and Country. There are 1 million records. You zip the file and send it to person X, your colleague who wants to see the number of occurrences of a city. X unzips it and looks for records containing the city of ‘New York’. The search takes around 10 minutes as each of the 1 million records needs to be scanned. This is analogous to a table scan without indexing.

•Let’s say that now you decide to split the data files into 1000 files with a size of 5 MB each. You send this to Person X. Now, Person X opens each 5 MB file and looks for records with city ‘New York’. The search is faster than the previous one. However, there is the work of having to unzip and open each file 1000 times. This is analogous to a row store storage format.

•Now, you decide to split the data files not by their records but by their columns. Now, we will have 3 files one for each column. Person X now can open the necessary file (in this case the ‘city’ file) just once and search for New York to see the number of occurrences. This is analogous to a column store storage format.

Row-store (using the analogy above)

Fig 1.1 Row-Store

Column-store (using the analogy above)

Fig 1.2 ColumnStore

Column-store indexes have some limitations in the datatypes that they support. The following are some unsupported datatypes in Column-store indexes -

•ntext, text, and image

•nvarchar(max), varchar(max), and varbinary(max) (Applies to SQL Server 2016 (13.x) and prior versions, and nonclustered columnstore indexes)

•rowversion (and timestamp)

•sql_variant

•CLR types (hierarchyid and spatial types)

•xml

•uniqueidentifier (Applies to SQL Server 2012 (11.x))

Having learnt about the Row-store and Column-store formats, let’s see when to use Row-store and when to use Column-store indexes.

Row-store Indexing can be used in cases where the query involves highly selective queries. A highly selective query is that which involves a well-defined predicate i.e., ‘where’ clause or ‘join’ that results in very few query results. E.g. Searching for a student by the Student ID.

Column-store Indexing can be used for low selective queries. A low selective query is that which uses aggregate operations such as Sum, Avg, etc. and that results many matching results. E.g Querying the total marks for all subjects by all students of a certain grade in a particular year.

When bulk-insert operations need to be done, it is better to use Row-store indexes. When update operations are involved, it is better to use Column-store indexes. Column-store indexes are recommended for tables that are large and wide. It is better to avoid indexing for small tables.

What are some nice features of SQL Server with respect to Indexes and Constraints?

When a Primary key is defined, SQL Server automatically creates a clustered index on the primary key. This doesn’t imply that for a clustered index to exist, the primary key should always exist. Clustered indexes can be created even if there is no primary key in the table.

When a unique constraint is defined, SQL Server automatically creates a non-clustered index on that column. An index can be created while creating a table or can be added later.

Should I use only one type of indexing (Rowstore/Columnstore) throughout?

The answer is ‘No’. A mix of Row-store and Column-store indexes can be used wherever necessary in the database.

Which should be built first? — Clustered Index or Non-Clustered Index

There are two ways to work with clustered and nonclustered indexes in database tables.

1. Creating clustered indexes and then creating multiple nonclustered indexes on tables.

2. Creating multiple nonclustered indexes, creating clustered indexes on tables, and then rebuilding the nonclustered indexes.

Out of these two ways, the second way is a recommended practice that can be adopted in a real-time environment. Why?

SORT_IN_TEMPDB — determines whether the sorted results are to be stored in the TEMPDB.

STATISTICS_NORECOMPUTE — When it is ON, SQL Server does not automatically recompute outdated data statistics for that particular index.

  • How do I tackle issues with fragmentation caused by indexing?

Before delving into this, let’s see how to identify data objects that are fragmented.

In this case, I am trying to find fragmented data objects on the ‘AdventureWorks’ database.

The above query shows the percentage of fragmentation in SQL Server.

In order to tackle issues with fragmentation caused by indexing, one needs to re-organize or rebuild indexes. This is done using the REORGANIZE/REBUILD commands.

LOG_COMPACTION = ON ensures that the pages of Large Object data (LOB) datatypes such as VARCHAR(MAX),VARBINARY(MAX),NVARCHAR(MAX),TEXT,NTEXT, IMAGE are also compacted.

When re-organize is used for defragmentation, SQL Server compresses the objects using minimal resources. Re-organize physically re-orders the leaf data pages and compacts the indexes accordingly.

Rebuild syntax is given by

Concluding Remarks

This article aims to answer the most basic questions related to database design. There may be a lot more complex and convoluted scenarios pertaining to one’s business that needs to be addressed before database design (may be while doing the Conceptual Database Design phase.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data…

Sign up for Analytics Vidhya News Bytes

By Analytics Vidhya

Latest news from Analytics Vidhya on our Hackathons and some of our best articles! Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Subha Ganapathi

Written by

An avid learner and a passionate worker.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store