posted 4/14/2012 by MarkGStacey - Views: [4526]
SQL Scripts: http://www.bidn.com/Assets/Uploaded-CMS-Files/PartitoningInTabular-ed77bbde-367f-4fdc-b1aa-404423f0d05e.zip
This blog as a document: http://www.bidn.com/Assets/Uploaded-CMS-Files/Partitioning in Tabular-e2b2799d-a10a-41a2-ae3a-22895fe33b2e.docx
Partitioning in Analysis Services has, for most devs, generally been a post hoc effort - build your cube first, then during productionisation, add the partitioning schema (or even just let the DBAs do it….).
And this mostly worked, because you would build out the full model, deploy and process, and head out to lunch while the full cube process ran.
In the tabular world, this model simply doesn’t work - every time you add a table to the model, the data is queried, and the model is processed. Great…. Now we have to wait for the processing every time.
The standard answer is to abstract your tables into views (a best practice I always recommend), but of course, if you have 100 tables in your model, that’s 100 views you need to change when you go to production - with a little bit of thought, adding a table to maintain partition boundaries, and a subsidiary table to maintain dev/prod status to switch these partitions on and off lets you manage this easily.
I’m going to show you how to build out these tables, and then give you a script that, based on these tables, will automatically create the partitions if they don’t exist.
What you’ll need to follow my samples: I’m using AdventureWorks2012 Data (*here*http://msftdbprodsamples.codeplex.com/releases/view/55330#DownloadId=258486 on Codeplex) as well as Adam Machanik’s “Make big data” script (*here* http://sqlblog.com/blogs/adam_machanic/archive/2011/10/17/thinking-big-adventure.aspx)
Note that you’re going to want to change the growth size on your ADW DB before running this.
I’m going to be building the Tabular model off the AdventureWorks standard DB, so no DimDate table ~ note that using a DimDateID for the partition key would be *much* more performant than the date field that I’m using here. You’ll note that I’m using a DimDateID field as my driver in the partition table, and simply doing a calculated field for the date data type J
Start by creating an “Environment Table” - this will keep a list of your environments.
Four fields - EnvironmentID is an identity field, CreatedDate is defaulted to getdate(), a name for your environment, and an Active flag. I’m using a new schema to organise my tables.
The reason it’s “Development and Production” is that I am going to use a subset of production partitions in development. Your alternative is to use a duplicate of *every* production partition in development, for instance only selecting a single day for each month in each partition.
CREATE SCHEMA PartitionSchemeGOCREATE TABLE [PartitionScheme].[Environment]( [EnvironmentID] [int] IDENTITY(0,1) NOT NULL, [CreatedDate] [datetime] NOT NULL, [EnvironmentName] [varchar](255) NOT NULL, [Active] [int] NOT NULL, CONSTRAINT [IX_Environment] UNIQUE CLUSTERED( [EnvironmentID] DESC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY] GOALTER TABLE [PartitionScheme].[Environment] ADD CONSTRAINT [DF_Environment_CreatedDate] DEFAULT (getdate()) FOR [CreatedDate]GO
ALTER TABLE [PartitionScheme].[Environment] ADD CONSTRAINT [DF_Environment_Active] DEFAULT ((1)) FOR [Active]GO
I am going to insert two entries in this table, “Development and Production”, and “Production”
insert into [PartitionScheme].[Environment](EnvironmentName, Active)SELECT 'Development and Production', 1UNIONSELECT 'Production', 0
Next up, the actual partitioning table:
An identity ParittionID, CreatedDate, and Active flag.
Partition Name Suffix - because we are going to be generating the Partitions later, what should we add to the name to make it unique? A start date and an end date for the partition, as well as Computed Columns for the integer equivalents (for when you have a date table)
CREATE TABLE [PartitionScheme].[PartitionList]( [PartitionID] [int] IDENTITY(1,1) NOT NULL, [CreatedDate] [datetime] NOT NULL, [Active] [int] NOT NULL, [PartitionNameSuffix] [varchar](255) NOT NULL, [PartitionStartDate] [date] NOT NULL, [PartitionEndDate] [date] NOT NULL, [PartitionStartDateID] AS (CONVERT([int],CONVERT([char](8),[PartitionStartDate],(112)))) PERSISTED, [PartitionEndDateID] AS (CONVERT([int],CONVERT([char](8),[PartitionEndDate],(112)))), [EnvironmentID] [int] NOT NULL, CONSTRAINT [PK_PartitionList] PRIMARY KEY CLUSTERED( [PartitionID] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY]
GO
ALTER TABLE [PartitionScheme].[PartitionList] ADD CONSTRAINT [DF_PartitionList_CreatedDate] DEFAULT (getdate()) FOR [CreatedDate]GO
ALTER TABLE [PartitionScheme].[PartitionList] ADD CONSTRAINT [DF_PartitionList_Active] DEFAULT ((1)) FOR [Active]GO
ALTER TABLE [PartitionScheme].[PartitionList] WITH CHECK ADD CONSTRAINT [FK_PartitionList_Environment] FOREIGN KEY([EnvironmentID])REFERENCES [PartitionScheme].[Environment] ([EnvironmentID])GOALTER TABLE [PartitionScheme].[PartitionList] CHECK CONSTRAINT [FK_PartitionList_Environment]GO
OK, now I don’t know about y’all, but I personally don’t particularly want to be data capturing each of the partitions, so I’m going to be generating my INSERT statements. I am going to be using “RANGE RIGHT” i.e. an exclusive right for the End Date - when working with integer dates as in warehousing, this isn’t a concern, but when working with dates, using the last day of the month could drop data if you don’t remember to add the 23:59:59.999 as the time. DECLARE @dynamicSQL varchar(max), @EnvironmentID varchar(8)
SET @dynamicSQL = ''/*Get the Environment ID from the table*/SELECT @EnvironmentID = cast (EnvironmentID as varchar(10)) FROM PartitionScheme.Environment where EnvironmentName = 'Production'SELECT @dynamicSQL = @dynamicSQL +'INSERT INTO [PartitionScheme].[PartitionList] ([PartitionNameSuffix] ,[PartitionStartDate] ,[PartitionEndDate] ,[EnvironmentID]) VALUES (''' + cast(Year(bth.TransactionDate) as char(4)) + ' - ' + RIGHT('0' + cast(Month(bth.TransactionDate) as varchar(2)),2) + ''' , ''' + convert(varchar(50),bth.TransactionDate,111) + ''' , ''' + convert(varchar(50),dateadd(month, 1, bth.transactiondate),111) +''' ,' + @EnvironmentID + ')
' from dbo.bigTransactionHistory bthwhere day(bth.TransactionDate) = 1GROUP BY bth.TransactionDateORDER BY bth.TransactionDate
/*select @dynamicSQL*/exec (@dynamicSQL)
So now we have a list of partitions matching the data currently in our main table. Let’s set the latest partition for use in Dev:
UPDATE PartitionScheme.PartitionListSET EnvironmentID = (SELECT EnvironmentID FROM PartitionScheme.Environment Where EnvironmentName like 'Development%')WHERE PartitionEndDate = (SELECT Max(PartitionEndDate) FROM PartitionScheme.PartitionList )
A quick look at the resultant data set:
We’re going to finish off by creating a view to supply the data, first an abstract view just to include the active filter in a single place:
CREATE SCHEMA tabularGOCREATE VIEW tabular.PartitionListASSELECT PL.PartitionID ,PL.[PartitionNameSuffix] ,PL.[PartitionStartDate] ,PL.[PartitionEndDate] ,PL.[PartitionStartDateID] ,PL.[PartitionEndDateID] ,PL.[EnvironmentID] FROM PartitionScheme.PartitionList PL INNER JOIN PartitionScheme.Environment En ON PL.EnvironmentID = En.EnvironmentID AND En.Active = 1 WHERE PL.Active = 1GO
Then a view for our Transaction Table
CREATE VIEW tabular.bigTransactionHistoryasSELECT [TransactionID] ,[ProductID] ,[TransactionDate] ,[Quantity] ,[ActualCost] ,PL.PartitionNameSuffix,PL.PartitionID FROM [dbo].[bigTransactionHistory] bth
INNER JOIN tabular.PartitionList PL ON bth.TransactionDate BETWEEN PL.PartitionStartDate AND PartitionEndDate
And let’s finish off with a view for the product table: a little different, as we aren’t partitioning it, it’s a fairly small table. *You may want to limit the table to only those products for which there are products*. CREATE VIEW tabular.bigProduct asSELECT [ProductID] ,[Name] ,[ProductNumber] ,[MakeFlag] ,[FinishedGoodsFlag] ,[Color] ,[SafetyStockLevel] ,[ReorderPoint] ,[StandardCost] ,[ListPrice] ,[Size] ,[SizeUnitMeasureCode] ,[WeightUnitMeasureCode] ,[Weight] ,[DaysToManufacture] ,[ProductLine] ,[Class] ,[Style] ,[ProductSubcategoryID] ,[ProductModelID] ,[SellStartDate] ,[SellEndDate] ,[DiscontinuedDate] FROM [dbo].[bigProduct] bP
/* will increase query cost on the server, but will make the interface much friendlier during testing*/
where exists (select bth.TransactionID from tabular.bigTransactionHistory bth where bP.ProductID = bth.ProductID)
We now have the data set up for the tabular model, lets dive into that. Open up Visual Studio, create a new tabular model, and connect to your Adventure Works database. Add the two new views to your model:
You’ll notice that this takes quite some time!
Comes up with quite a horrible query plan as well
On my machine, this query runs for 6:45, so we’re going to fix that right up :
ALTER TABLE dbo.bigTransactionHistory ADD TransactionDateID AS (CONVERT([int],CONVERT([char](8),[TransactionDate],(112))))GOCREATE NONCLUSTERED INDEX [IX_bigTransactionHistory] ON [dbo].[bigTransactionHistory]( [TransactionDateID] DESC) INCLUDE ([TransactionID],[ProductID],[TransactionDate],[Quantity],[ActualCost])GO
We’re adding a computed column to convert the date to an INT, then indexing it, and next up we’ll change the view to use that column:
ALTER VIEW tabular.bigTransactionHistoryasSELECT [TransactionID] ,[ProductID] ,[TransactionDate] ,[Quantity] ,[ActualCost] ,PL.PartitionNameSuffix FROM [dbo].[bigTransactionHistory] bth INNER JOIN tabular.PartitionList PL ON bth.TransactionDateID BETWEEN PL.PartitionStartDateID AND PartitionEndDateID
Much better! 4 seconds instead of 6:45.
Back to our model: Add a relationship from bigTransactionHistory to bigProduct
Right, so now, simply by flipping the “Active” flag in the environment table, we can include all the data.
And it would process for a long time..
But there we haven’t done any partitions - again, I’m against doing anything as manual and tedious as creating these manually, so we’re going to be generating them: Start by deploying your project to the SSAS server.
Now, we are going to use a stored proc I created for this purpose:
Firstly, kudos to Matt Horn (Twitter: @Maxui) who helped me with the original SSAS version. This one has been updated for Tabular, so it’s bit different
The concept is simple: SQL jobs can create partions on Analysis Services, so we will have a stored procedure that creates and executes a job to get this done. We’ll then have a second proc that iterates through the PartitionList table, and creates the partitions that don’t exist. You’ll need to make sure that the account your SQL Agent service is running under has rights to the Cube database, or setup a proxy account, as documented here : http://www.bidn.com/blogs/DonnyJohns/ssas/1705/sql-server-agent-proxy-accounts
Here is the code to Create a Partition in Tabular:
ALTER PROC PartitionScheme.CreatePartitions@ServerName varchar(255),@DatabaseID varchar(255), @CubeID varchar(255), @MeasureGroupID varchar(255), @PartitionID varchar(255), @PartitionName varchar(255), @DataSourceID varchar(255), @PartitionQuery varchar(max),@UserName varchar(255)ASBEGIN TRANSACTIONDECLARE @ReturnCode INTDECLARE @JobName varchar(255)SET @JobName = 'CreatePartition' + @PartitionNameSELECT @ReturnCode = 0IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[PartitionScheme]' AND category_class=1)BEGINEXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[PartitionScheme]'IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollbackENDIF NOT EXISTS (SELECT name FROM msdb.dbo.sysjobs WHERE name= @JobName)BEGIN DECLARE @jobId BINARY(16) EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=@JobName, @enabled=1, @notify_level_eventlog=0, @notify_level_email=0, @notify_level_netsend=0, @notify_level_page=0, @delete_level=0, @description=N'Create Tabular Partition', @category_name=N'[PartitionScheme]', @owner_login_name=@UserName, @job_id = @jobId OUTPUT IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback Declare @PartitionCmd varchar(max) SET @PartitionCmd = '<Alter AllowCreate="true" ObjectExpansion="ExpandFull" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"> <Object> <DatabaseID>'+ @DatabaseID +'</DatabaseID> <CubeID>'+ @CubeID + '</CubeID> <MeasureGroupID>' + @MeasureGroupID + '</MeasureGroupID> <PartitionID>' + @PartitionID + '</PartitionID> </Object> <ObjectDefinition> <Partition xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300" xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300"> <ID>' + @PartitionID + '</ID> <Name>' + @PartitionName + '</Name> <Source xsi:type="QueryBinding"> <DataSourceID>' + @DataSourceID +'</DataSourceID> <QueryDefinition>' + @PartitionQuery + '</QueryDefinition> </Source> <StorageMode valuens="ddl200_200">InMemory</StorageMode> <ProcessingMode>Regular</ProcessingMode> <ddl300_300:DirectQueryUsage>InMemoryOnly</ddl300_300:DirectQueryUsage> </Partition> </ObjectDefinition> </Alter> ' select @PartitionCmd EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Create', @step_id=1, @cmdexec_success_code=0, @on_success_action=1, @on_success_step_id=0, @on_fail_action=2, @on_fail_step_id=0, @retry_attempts=0, @retry_interval=0, @os_run_priority=0, @subsystem=N'ANALYSISCOMMAND', @command = @PartitionCmd, @server= @ServerName, @database_name=N'master', @flags=0 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollbackEXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollbackEXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollbackEND exec msdb.dbo.sp_start_job @job_name = @JobNameCOMMIT TRANSACTIONGOTO EndSaveQuitWithRollback: IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTIONEndSave:
Now then, a second proc that iterates through the table and creates the partitions: First off, we know that we don’t want to create partitions for every table, so we’ll create ourselves a place to store the list of tables we want to do partitions for. Now, a little trick is that the table name we have in our model isn’t the internal name, so I’ve coded up checks to go fetch that ID for us. We also want to store a Base Query for each one
CREATE TABLE [PartitionScheme].[TableList]( [TableID] [int] IDENTITY(1,1) NOT NULL, [CreatedDate] [datetime] NOT NULL, [TableName] [varchar](255) NOT NULL, [InternalTableName] [varchar](255) NULL, [BaseQuery] [varchar](max) fsNULL, [Partition] [varchar](255) NOT NULL, [Active] [int] NOT NULL, CONSTRAINT [IX_Table] UNIQUE CLUSTERED( [TableID] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY]GOALTER TABLE [PartitionScheme].[TableList] ADD CONSTRAINT [DF_Table_CreatedDate] DEFAULT (getdate()) FOR [CreatedDate]GO
ALTER TABLE [PartitionScheme].[TableList] ADD CONSTRAINT [DF_Table_Active] DEFAULT ((1)) FOR [Active]GOALTER TABLE [PartitionScheme].[TableList] ADD CONSTRAINT [DF_Table_Partition] DEFAULT ((1)) FOR [Partition]GO
There is a piece of manual work that you’ll need to do now - setting the partition attribute to 0 for bigProduct, and adding a “Base Query” which will be used for the new partitions. You’d get this query from your
insert into [PartitionScheme].[TableList] ([TableName], Partition)VALUES ('bigProduct', 0)
insert into [PartitionScheme].[TableList] ([TableName], Partition, BaseQuery)VALUES ('bigProduct', 0, 'SELECT * FROM tabular.bigTransactionHistory')
And now we can finish off with the procedure that does all the creation:
CREATE PROC [PartitionScheme].[CreatePartitions]@ServerName varchar(255) ,@DatabaseID varchar(255) , @CubeID varchar(255), @MeasureGroupID varchar(255), @PartitionID varchar(255), @PartitionName varchar(255), @DataSourceID varchar(255), @PartitionQuery varchar(max),@UserName varchar(255)AS
/****** Object: Job [CreatePartition] Script Date: 08/02/2011 09:04:45 ******/BEGIN TRANSACTIONDECLARE @ReturnCode INTDECLARE @JobName varchar(255)
SET @JobName = 'CreatePartition' + @PartitionName
SELECT @ReturnCode = 0/****** Object: JobCategory [[Uncategorized (Local)]]] Script Date: 08/02/2011 09:04:45 ******/IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[PartitionScheme]' AND category_class=1)BEGINEXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[PartitionScheme]'IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollbackEND
IF NOT EXISTS (SELECT name FROM msdb.dbo.sysjobs WHERE name= @JobName)BEGIN DECLARE @jobId BINARY(16) EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=@JobName, @enabled=1, @notify_level_eventlog=0, @notify_level_email=0, @notify_level_netsend=0, @notify_level_page=0, @delete_level=0, @description=N'Create Tabular Partition', @category_name=N'[PartitionScheme]', @owner_login_name=@UserName, @job_id = @jobId OUTPUT IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback /****** Object: Step [CreateFeb] Script Date: 08/02/2011 09:04:45 ******/
Declare @PartitionCmd varchar(max) SET @PartitionCmd = '<Alter AllowCreate="true" ObjectExpansion="ExpandFull" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"> <Object> <DatabaseID>'+ @DatabaseID +'</DatabaseID> <CubeID>'+ @CubeID + '</CubeID> <MeasureGroupID>' + @MeasureGroupID + '</MeasureGroupID> <PartitionID>' + @PartitionID + '</PartitionID> </Object> <ObjectDefinition> <Partition xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300" xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300"> <ID>' + @PartitionID + '</ID> <Name>' + @PartitionName + '</Name> <Source xsi:type="QueryBinding"> <DataSourceID>' + @DataSourceID +'</DataSourceID> <QueryDefinition>' + @PartitionQuery + '</QueryDefinition> </Source> <StorageMode valuens="ddl200_200">InMemory</StorageMode> <ProcessingMode>Regular</ProcessingMode> <ddl300_300:DirectQueryUsage>InMemoryOnly</ddl300_300:DirectQueryUsage> </Partition> </ObjectDefinition> </Alter> ' select @PartitionCmd
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Create', @step_id=1, @cmdexec_success_code=0, @on_success_action=1, @on_success_step_id=0, @on_fail_action=2, @on_fail_step_id=0, @retry_attempts=0, @retry_interval=0, @os_run_priority=0, @subsystem=N'ANALYSISCOMMAND', @command = @PartitionCmd, @server= @ServerName, @database_name=N'master', @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollbackEXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollbackEXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollbackEND exec msdb.dbo.sp_start_job @job_name = @JobName
COMMIT TRANSACTIONGOTO EndSaveQuitWithRollback: IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTIONEndSave:
And there you go: fully developed list of partitions, which you can import into Visual Studio or just use in production. Easy to switch individual partitions on and off - currently there are a bunch of empty partitions, which will populated once you flip the productionisation switch and reprocess.