Blogs
Archive
  • September 2017 (4)
  • August 2017 (3)
  • July 2017 (3)
  • June 2017 (3)
  • May 2017 (1)
  • February 2015 (2)
  • January 2015 (1)
  • December 2014 (3)
  • February 2014 (2)
Tags View All Blogs
Scrum.NetMicrosoftVisual StudioVC++C#JavaTestingQuality AssuranceUX DesignSPAAngularKnockoutBackboneUnderscoreDatabaseMySQLBlockchainHyperledgerChaincodeEnterprise MobilityUIIonicCordovaHybrid AppCode ReviewTypescriptAndroidIOTReactICOVenture CapitalArtificial IntelligenceKotlinIoTFuchsia OSVirtual RealityVOIP

Database optimization techniques

Every person, working with database, developer or DBA has to come across with issues related to database performance. One morning I reached office and my boss said that they are experiencing slowness in the production system and wanted my team to fix it before it crashes. I and my co-developer sat together, looked at few things, applied some techniques and finally optimized the database performance. Though, it would not be possible to list out all approaches but I would like to walk you through what I did to optimize database performance.

Missing indexes

Indexes do not matter when your database is small. In that case, SQL server does full table scan and reads data. But when your database becomes large, these scans kill the database performance. We analyzed the queries used in our code, looked at the columns in the JOIN, WHERE and ORDER BY CLAUSE. We then checked and found indexes on those columns were missing. We analyzed and created Non-Clustered indexes on them and it helped us achieve performance gains. Following is a snapshot of a query that had Clustered Index specified but still it is not giving us much performance gains because of Scan operation. After creating Non-Clustered indexes on appropriate columns, 50% gain is achieved. Though I cannot post the actual query, still I have tried to create a similar query to explain this scenario.

Database before Non-Clustered Index

Figure 1: Before Non-Clustered Index

Database after Non-Clustered Index

Figure 2: After Non-Clustered Index

Use Covering Indexes

Once, it happened that there was a simple query that worked fine in development but when put in production, it was taking longer time. Usually, development environment does not have as much data as in production. Look at the query in below snapshot. It's a simple query but does lookup even though index is created on the column. The reason is, it does Bookmark Lookups (in the absence of Clustered index on the table) or RID Lookup (In the presence of Clustered Index on the table). The reason is that to retrieve other column data (columns in the select list), SQL server does scan operation along with index seek. This is the reason behind Bookmark or RID Lookup which is an exhaustive process. Creating covering indexes eliminates these Lookups. Again, I created a sample query to demonstrate this scenario.

Without Covering Index

Figure 3: Without Covering Index (performs Lookup)

Covering Index

Figure 4: Covering Index

With Covering Index

Figure 5: With Covering Index

Avoid the use of * in SELECT statement

It is always advisable not to use SELECT *. Instead use SELECT Col1, Col2, Col3. I happened to be victim of SELECT * which totally changed the way I used to write queries. Let's see what are the consequences of SELECT * over SELECT Column_List.

  • Query parsing is slow with 'SELECT *'. When a query is written as SELECT*FROMTABLE1, database engine first hits system tables to get the list of columns associated with TABLE1. This process takes some time and for heavy database and busy environment, this time is a pain. Instead, write queries like SELECT COL1, COL2, COL3 FROM TABLE1 . This technique saves us the overhead of querying system tables to get column list for the associated table.
  • With 'Select *', when table schema grows i.e. more columns are added, we unintentionally force SQL server to return those columns as well even if they are not required. This is not a good practice. This approach unnecessarily causes network jam. By explicitly specifying column list, we return only what is required and hence less network traffic.

SET NOCOUNT ON

I was unaware of the issues that NOCOUNT OFF can cause until I myself ran into it. I was once working on a large transactional database. I wrote one stored procedure that did a complex task. It was doing many SELECTS, few INSERTS and UPDATES. After completing the stored procedure, I gave it to my DBA. Our DBA reverted back to me for adding SET NOCOUNT ON at the start of stored procedures body. I was quite surprised and in my curiosity I asked WHY? Then he explained to me that 'When NOCOUNT is OFF, an additional information, number of rows affected, is returned back to the client application. This information is sent over a network and when you are using a large number of SQL statements, it is a considerable performance hit. Mostly, this information is irrelevant. So, it is considered a good practice to set it to ON to restrict this information from being sent to the calling application thereby freeing up the network bandwidth'.

Database when NOCOUNT is OFF

Figure 6: When NOCOUNT is OFF

Database when NOCOUNT is ON

Figure 7: When NOCOUNT is ON

Keep Transactions short

I remember I was once approached by my friend over the stored procedure that once run causes slowness in the system. Until that code finishes, few other stored procedures were behaving slow. Upon further investigation, we found that it was actually explicit transaction which was causing system slowness. We investigated that transaction was started well before they were required to be. There were many complex validations that were carried out under the transaction and the interesting thing was that there was no need to transaction for those validations. Though, explicit transactions are a good way to make sure that database is in consistent state. Means, it helps us to make sure that either data is fully written to the database or rolled back completely. For batch operations, explicit transactions play a vital role. But like every coin has two faces, transactions also have a drawback when used without proper attention. Drawback is that once started, they lock the system objects till they either COMMIT or ROLLBACK. Until a transaction is complete, other transactions are forced to wait (which need access to same objects). The result is slowness in the system. Longer the transactions takes to complete, more is the slowness encountered in the system. We moved validations out of transaction. Other stored procedures were executed before this stored procedure reaches the transaction and it provided performance benefit. So, try to keep transactions as short as possible so that database objects are available to other transactions as soon as possible.