Tuesday, June 30, 2015

Azure SQL V12 Performance

I was tasked to look into our Azure SQL database's growth rate and its performance under "M user" load. I found a good number of online resources posted in past years, and discovered that some posts are out of date because of Azure updates; as of now, the latest is Azure SQL V12, which we are using.

1. How to measure current size and past growth

There are two ways to get the database size:

SELECT SUM(reserved_page_count)*8.0/1024 FROM sys.dm_db_partition_stats

which gives you a snapshot of the current size. However, Azure SQL also provides size over time for roughly the past 30 days:

SELECT * FROM sys.resource_stats WHERE database_name = '[your db name]' ORDER BY start_time DESC
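Since sys.resource_stats keeps one row per time slice, you can also roll it up per day to estimate the growth rate (a sketch; it assumes the V12 view exposes a storage_in_megabytes column, which is what I see in our database):

```sql
-- Daily high-water mark of database size over the retained window,
-- which makes the day-over-day growth easy to eyeball.
SELECT
    CAST(start_time AS date)  AS [day],
    MAX(storage_in_megabytes) AS max_storage_mb
FROM sys.resource_stats
WHERE database_name = '[your db name]'
GROUP BY CAST(start_time AS date)
ORDER BY [day] DESC;
```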




Data size is the sum of reserved, data, index_size and unused, which you can get using

sp_spaceused '[your db name]' 



With this info for some key tables, you can then build a table like this:
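To collect those numbers per table rather than for the whole database, the same procedure takes a table name (a sketch; the table names below are hypothetical, substitute your own key tables):

```sql
-- Per-table breakdown: rows, reserved, data, index_size, unused.
EXEC sp_spaceused 'dbo.Orders';      -- hypothetical key table
EXEC sp_spaceused 'dbo.OrderItems';  -- hypothetical key table
```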



2. Index Size and Fragmentation
Overall index fragmentation can be found in the table above under "index/data". You can also run the script below, by Dilkush Patel:

SELECT
 DB_NAME() AS DBName
 ,OBJECT_NAME(ps.object_id) AS TableName
 ,i.name AS IndexName
 ,ips.index_type_desc
 ,ips.avg_fragmentation_in_percent
 FROM sys.dm_db_partition_stats ps
 INNER JOIN sys.indexes i
 ON ps.object_id = i.object_id
 AND ps.index_id = i.index_id
 CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), ps.object_id, ps.index_id, null, 'LIMITED') ips
 ORDER BY ips.avg_fragmentation_in_percent desc, ps.object_id, ps.index_id
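Once you know which indexes are fragmented, the usual rule-of-thumb fix is REORGANIZE for moderate fragmentation and REBUILD for heavy fragmentation (roughly 5-30% vs. above 30%). A sketch, with a hypothetical table and index name:

```sql
-- Moderate fragmentation (~5-30%): reorganize in place.
ALTER INDEX [IX_Orders_CustomerId] ON dbo.Orders REORGANIZE;

-- Heavy fragmentation (>30%): rebuild; ONLINE keeps the table available.
ALTER INDEX [IX_Orders_CustomerId] ON dbo.Orders REBUILD WITH (ONLINE = ON);
```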

You will see something like:


3. Top Queries
SELECT TOP 20 query_stats.query_hash AS "Query Hash", 
    SUM(query_stats.total_worker_time) / SUM(query_stats.execution_count) AS "Avg CPU Time",
    MIN(query_stats.statement_text) AS "Statement Text"
FROM 
    (SELECT QS.*, 
    SUBSTRING(ST.text, (QS.statement_start_offset/2) + 1,
    ((CASE statement_end_offset 
        WHEN -1 THEN DATALENGTH(st.text)
        ELSE QS.statement_end_offset END 
            - QS.statement_start_offset)/2) + 1) AS statement_text
     FROM sys.dm_exec_query_stats AS QS
     CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) as ST) as query_stats
GROUP BY query_stats.query_hash
ORDER BY 2 DESC;

4. Top I/O Queries
SELECT TOP 20 q.[text],
    (total_logical_reads/execution_count) AS avg_logical_reads,
    (total_logical_writes/execution_count) AS avg_logical_writes,
    (total_physical_reads/execution_count) AS avg_phys_reads,
    execution_count
FROM sys.dm_exec_query_stats
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS q
ORDER BY (total_logical_reads + total_logical_writes) DESC
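To dig into any query these two lists surface, you can pull its cached execution plan from the same DMV (a sketch; sys.dm_exec_query_plan takes the plan_handle already available in sys.dm_exec_query_stats):

```sql
-- Top 5 statements by total logical I/O, with their cached XML plans.
SELECT TOP 5
    qt.text            AS statement_text,
    qs.execution_count,
    qp.query_plan      -- XML showplan; open in SSMS to inspect operators
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY (qs.total_logical_reads + qs.total_logical_writes) DESC;
```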







Friday, April 18, 2014

HD Insight Client Lib Version


updated: 04.18.2014
Windows Azure HDInsight 1.1.1.3, 04/16/2014
Microsoft .NET API for Hadoop, 1.1.1.3, 04/16/2014
Microsoft .NET Map Reduce API for Hadoop, 0.12.5126.42915, 01/14/2014
Microsoft .NET API for Hadoop WebClient, 0.12.5126.42915, 01/14/2014
Microsoft Linq to Hive, 0.12.5126.42915, 01/14/2014
Microsoft .NET API Library for Avro, 1.1.1.3, 04/16/2014

Saturday, February 15, 2014

Azure Scheduler, simple but not a workflow manager like Oozie

Azure Scheduler was released about a month ago as a public preview. I worked with the Azure Scheduler team to implement a scheduler manager (worker role) to manage our HDInsight jobs a few weeks before its public preview release, at the end of 2013. A scheduler often appears simple, but it is not. Many years ago I was involved in a big proposal on scheduling for a local (Seattle) biotech company, after they had just spent very handsome money on a scheduling system designed for the aviation industry (airlines, etc.).

The Azure Scheduler has now been running for us for over two months, and so far it is pretty good for simple jobs. However, it is nowhere near Hadoop's Oozie; then again, Oozie lives inside Hadoop, which makes it hard to manage. The Scheduler is not a workflow manager and never will be, so we still develop our own workflow engine.




HDInsight 3.0 preview is NOT Hadoop 2.2

Here is the version list for Hadoop 2.2, the HDI 3.0 preview, and HDI 2.1 GA:





More detail on HDI 3.0 can be found at http://www.windowsazure.com/en-us/documentation/articles/hdinsight-component-versioning/

Sunday, December 8, 2013

Wednesday, November 20, 2013

The “transient” cluster pattern?



The issue came up when our data source is on another Azure storage account that is not attached to the HDI cluster, i.e. not in the core-site.xml file. The HDI client lib does not support attaching new storage once the cluster is provisioned.

The suggestion I received as a TAP program member is that an HDI cluster fits the “transient” cluster pattern. My interpretation is to provision HDI on demand. Note that HDI provisioning takes about 10 minutes; here is one breakdown from my observation:

Accepted: 2 mins
Windows Azure VM Configuration: 4 mins
HDInsight Configuration: 3 mins
Operational: 1 min
Running: ready. 





The good part is that its default WASB can be reused, so there is no need to move data around. If you need to attach new storage, just delete the existing cluster and attach the storage during HDI provisioning.


I will probably go for this approach, especially since I was told my subscription cost last month was the highest on our team. Why? Because I had many HDI clusters "running". I should delete them once I am done..... that is "transient".

HDInsight 2.1 release

HDInsight Cluster has its 2.1, a.k.a. GA, release, as well as its client lib. You can get it from NuGet in VS.




I am having lots of problems porting my HDI job submission code (1.5) to an Azure worker role for scheduling; then I discovered the latest release is a mix of .NET 4.0 and 4.5.

Windows Azure HDInsight targets .NET 4.0 and is used for job submission. Hadoop WebClient targets 4.5 to manage folders and files, because it depends on WebAPI (5.x), which depends on 4.5. Hadoop will fail if the output folder already exists; that is why I need WebClient to manage folders and files while using Azure HDInsight to submit jobs. Interestingly, I could do both in the 1.5 release, but not now (2.1).

Windows Azure HDInsight is 4.0; your app can be 4.0 or 4.5, I confirmed.

One issue is a missing assembly: Microsoft.WindowsAzure.Management.Framework.Threading. Solution: you just have to get it from the previous release :)

The second issue is that you must build for x64; Any CPU will fail you.

With all the trouble people have reported, and that I experienced, we probably all failed to pay attention to what the HDI team said in their roadmap:


Please note that the majority of their releases are “experimental”.