Skip navigation

This is first post in 2011. It’s hard to believe 2011 is here. This year is very important year for me from my private life and let’s cross fingers and hope everything will run as I plan.  :>

This post, however, is not about Vmware or virtualization technology. This post is about how to tuning Dynamic AX 2009 performance with in SQL 2005 environment. Please also be aware that the knowledge is suitable for all database regardless it’s SQL or Oracle or any other ERP system.

sqlplan_012

Environment:

We have a dedicate SQL 2005 clusters (all physical boxes) running for the database. We have all AOS(physical), Terminal servers(VMs) running as application and file and front servers.

For the ERP, we have batch jobs run every night to calculate all important figures and forecast of business data.

Symptoms:

Well, it’s simple. The batch jobs run too SLOW!. For our normal batch jobs to run, we have only 6 hours window dedicate for it. And with only half of real data online, we have already hit on 5 hours! If we can’t finish job in 6 hours, we won’t be able to backup or do anything else.

However, the batch job doesn’t always run with 5 hours. Sometimes, it finishes at 2 hours. And sometimes, it finished even longer. We have run those tests in our Test environment. With same data, we get different result of running time and it seems it’s random. Why?

Solutions:

Background knowledge

Actually, the answer is simple. This issue is caused by SQL EXE Plan selection.

Before SQL runs the job, it needs to do 2 different steps to interpret the queries.

1. Processes that occur in the relational engine

2. Processes that occur in the storage engine.

After the first step, the SQL server will evaluate query and select “best” Execution plan  for it. The basic plan can be considered as single threads or multiple threads. I’m not going to get too deep on that. The issue of this selection is that SQL server doesn’t always select best plan for the query.

For example, if  you have kept running small jobs for whole day, when you run a big long batch job, the current execution plans(resulted by small jobs) have stayed in the buffer and SQL will select the plan in the buffer instead of recalculating.

If  you got some maintenance jobs stuck in the active job list, it’s going to flush your plans constantly which will disturb the SQL selection as well.

So what do we do?

Solution A:

Clean the Execution plan buffer.

Before you run a long big batch job, you can execute a command to clean the plan buffer. It will force SQL server to interpret query and select a plan. Yes, the plan may not be best plan for that query, but most of time, it’s still quite good.

By doing that, you need to connect to SQL server with Management studio and run this command.

DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS

sqlplan_001

Solution B:

Well, with solution A, you are more like to rely on SQL server itself. If it’s still select wrong plan, you have live with that. With issue like other jobs in the SQL stuck in the queue, it will be more likely your SQL still pick the wrong plan.

With considering that, we need to go for Solution B. Dedicate plan for dedicate query.

With ERP we run in our environment, we know certain job will take long time. All what you need to do is to capture SQL running stats in the buffer, and analysis what query took most time of CPU and I/O. Once you identify this query, you need to clean your SQL Exec plan buffer and to have a lucky shot.

If you are lucky enough (most of time you will be if you run it in a clean test environment), you will have a normal 5 hours batch job run as 2 hour. Then, you need to dump the buffer again and use that as base line.

You need to find exactly which exec plan was using in this lucky shot and apply it to your SQL server. So in the future, no matter what happens, as long as SQL server runs this query, it will use optimized (lucky shot) exec plan and to make sure it always run in optimized mode.

All right, let see how to do it in each step.

Step 1.

capture what happened in the SQL buffer.

To do that, you need to have Performance Analyzer 1.0 for Microsoft Dynamics.

You can simply download it from here.

Once you download it, you need to extract it to a local folder which is the same server as your SQL Management studio is.

Follow the installation pdf,  you should be able to deploy it into your SQL server easily.

sqlplan_002

Once you have finished deployment, you should see a job in the SQL Server Agent as DYNPERF_Capture_Stats.

sqlplan_003

Now, you need to run your batch job as normal. Let’s say, it takes 5 hours. Once you finish your batch job, you need to run DYNPERF capture job ( make sure you change the DB name to match your own database).

The capture job will run 2 to 8 mins depends on the size of your buffer.

Step 2:

Analysis the data.

Well, this package you have downloaded contains the scripts for Dynamic Ax. You can modify the script to fit your own ERP. But right now, I’m going to use those script to demonstrate.

You need to open  your “DynamicsPerf.ssmssqlproj” with Project/solution from SQL Management Studio.

sqlplan_004

Then, you need to open this query. “Query – Investigate SQL Statements.sql”.

Now, you can execute some of queries to get TOTAL_ELAPSED_TIME, or TOTAL_LOGICAL_READS.

Let’s see some scary figures.

sqlplan_005

As you can see, the red arrow points to more than 1,126,904,574 which made system to run more than 5 hours. With exact same data, after I run cleaning buffer command, you can get 263,470,593 which is one fifth of I/O load with finishing jobs in 2 hours!

Step 3:

locate the query and find the current exe lan.

Still with same table, lets move scroll bar to left side and looking for SQL_TXT field. sqlplan_006

Copy that query A to a notepad.

Then, you click QUERY_PLAN (the blue http link on above picture). Now, it depends on which version of SQL Management studio you are using. If you are using 2008 version, you will directly open plan like this.

sqlplan_007

This is actually a simply EXEcution plan which caused run for 5 hours.

If you have 2005 version,  you won’t be able to open plan like that. You will be only open as XML format.

sqlplan_008

This is actually plan with different XML format. If you are using 2005 version, what you need to do in terms of seeing plan like you can see in version 2008 is to save as your xml to a place. And rename the .xml file to .sqlplan. Then, you can open it with SQL Management studio and  you will see exactly same thing as what you see in SQL 2008.

So, this is bad plan for this query. It took 5 hours to finish.

You can do exactly same thing for the green arrow plans as well. In our case, that plan is a good plan. Let me show you what a good plan look like.

sqlplan_009

Step 4:

Replace old plan with good plan.

Now, it’s time to use  your SQL command to do the job.

First of all, we need to go back to find Query A you copied on the notepad.

sqlplan_010

then, we use SQL command to put them together. The red part is prefix. The green part is actual query.

sqlplan_011

So you see the name of planguide. Query, and prefix. And the plan (you need to put your xml version of good plan on). And also there is a remove script in this picture to let you to remove planguide if you need to debug.

The is planguide will follow SQL server and it will failover to another cluster node. Any queries runs on the SQL server box will be effected by this planguide. As long as the prefix, query are the exact same thing as in this planguide, the SQL server will use this planguide to execute the plan.

Conclusion:

Well, this is first step of tunning SQL performance. I believe you can do lots of things by yourself after you dig more.

It’s not Vmware, but it is a good start for this year. I reckon. 🙂

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: