Being a DBA is like being a train conductor. One of the biggest responsibilities is making sure all jobs are running as expected, or making sure "all the trains are running on time," so to speak. As my partner-in-crime Devin Knight posted earlier, we have come up with a solution to identify and alert when SQL Agent jobs are running longer than expected.

The need for this solution came from the fact that, despite my having alerts for failed Agent jobs, we had a process pull a Palin and go rogue on us. The job was supposed to process a cube, but since it never failed, we (the admins) weren't notified. The only way we found out was when a user finally asked, "The cube hasn't been updated in a couple of days, what's up?" Sad trombone.

As Devin mentioned in his post, the code/solution below is very much a version 1 product, so if you have any modifications or suggestions, have at it. We've documented in-line so you can figure out what the code is doing. Some caveats here:
CODE (WARNING: This code is currently beta and subject to change as we improve it)
Last script update: 7/12/2012
Change log:
7/12/2012 - Updated code to deal with "phantom" jobs that weren't really running. Improved logic to handle this. Beware: uses the undocumented stored procedure xp_sqlagent_enum_jobs.
--Create Long Running Jobs table
USE [DBAdmin]
GO
IF OBJECT_ID('dbo.LongRunningJobs') IS NOT NULL
    DROP TABLE dbo.LongRunningJobs
CREATE TABLE [dbo].[LongRunningJobs](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [JobName] [sysname] NOT NULL,
    [JobID] [uniqueidentifier] NOT NULL,
    [StartExecutionDate] [datetime] NULL,
    [AvgDurationMin] [int] NULL,
    [DurationLimit] [int] NULL,
    [CurrentDuration] [int] NULL,
    [RowInsertDate] [datetime] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[LongRunningJobs] ADD CONSTRAINT [DF_LongRunningJobs_Date] DEFAULT (getdate()) FOR [RowInsertDate]
GO

--Create Stored Procedure usp_LongRunningJobs
USE [DBAdmin]
GO
/****** Object: StoredProcedure [dbo].[usp_LongRunningJobs] Script Date: 07/12/2012 08:16:01 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[usp_LongRunningJobs]') AND type in (N'P', N'PC'))
    DROP PROCEDURE [dbo].[usp_LongRunningJobs]
GO
USE [DBAdmin]
GO
/****** Object: StoredProcedure [dbo].[usp_LongRunningJobs] Script Date: 07/12/2012 08:16:01 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author: Devin Knight and Jorge Segarra
-- Create date: 7/6/2012
-- Description: Monitors currently running SQL Agent jobs and
--              alerts admins if runtime passes set threshold
-- Updates: 7/11/2012 Changed Method for capturing currently running jobs to use master.dbo.xp_sqlagent_enum_jobs 1, ''
--
-- =============================================
CREATE PROCEDURE [dbo].[usp_LongRunningJobs]
AS
--Set Mail Profile
DECLARE @MailProfile VARCHAR(50)
SET @MailProfile = ( SELECT @@SERVERNAME ) --Replace with your mail profile name

--Set Email Recipients
DECLARE @MailRecipients VARCHAR(50)
SET @MailRecipients = 'DBAGroup@adventureworks.com'

--Set limit in minutes (applies to all jobs)
--NOTE: Percentage limit is applied to all jobs where average runtime greater than 5 minutes
--else the time limit is simply average + 10 minutes
DECLARE @JobLimitPercentage FLOAT
SET @JobLimitPercentage = 150 --Use whole percentages greater than 100

-- Create intermediate work tables for currently running jobs
DECLARE @currently_running_jobs TABLE (
    job_id UNIQUEIDENTIFIER NOT NULL
    ,last_run_date INT NOT NULL
    ,last_run_time INT NOT NULL
    ,next_run_date INT NOT NULL
    ,next_run_time INT NOT NULL
    ,next_run_schedule_id INT NOT NULL
    ,requested_to_run INT NOT NULL -- BOOL
    ,request_source INT NOT NULL
    ,request_source_id SYSNAME COLLATE database_default NULL
    ,running INT NOT NULL -- BOOL
    ,current_step INT NOT NULL
    ,current_retry_attempt INT NOT NULL
    ,job_state INT NOT NULL -- 0 = Not idle or suspended, 1 = Executing, 2 = Waiting For Thread, 3 = Between Retries, 4 = Idle, 5 = Suspended, [6 = WaitingForStepToFinish], 7 = PerformingCompletionActions
    )

--Capture Jobs currently working
INSERT INTO @currently_running_jobs
EXECUTE master.dbo.xp_sqlagent_enum_jobs 1,''

--Temp table exists check
IF OBJECT_ID('tempdb..##RunningJobs') IS NOT NULL
    DROP TABLE ##RunningJobs

CREATE TABLE ##RunningJobs (
    [JobID] [UNIQUEIDENTIFIER] NOT NULL
    ,[JobName] [sysname] NOT NULL
    ,[StartExecutionDate] [DATETIME] NOT NULL
    ,[AvgDurationMin] [INT] NULL
    ,[DurationLimit] [INT] NULL
    ,[CurrentDuration] [INT] NULL
    )

--msdb..sysjobhistory stores run_duration as an HHMMSS integer, so hours and minutes
--are converted to whole minutes before averaging (seconds are dropped)
INSERT INTO ##RunningJobs (
    JobID
    ,JobName
    ,StartExecutionDate
    ,AvgDurationMin
    ,DurationLimit
    ,CurrentDuration
    )
SELECT jobs.Job_ID AS JobID
    ,jobs.NAME AS JobName
    ,act.start_execution_date AS StartExecutionDate
    ,AVG((hist.run_duration / 10000) * 60 + (hist.run_duration / 100) % 100) AS AvgDurationMin
    ,CASE
        --If job average is 5 minutes or less then limit is avg+10 minutes
        WHEN AVG((hist.run_duration / 10000) * 60 + (hist.run_duration / 100) % 100) <= 5
            THEN AVG((hist.run_duration / 10000) * 60 + (hist.run_duration / 100) % 100) + 10
        --If job average greater than 5 minutes then limit is avg*limit percentage
        ELSE AVG((hist.run_duration / 10000) * 60 + (hist.run_duration / 100) % 100) * (@JobLimitPercentage / 100)
        END AS DurationLimit
    ,DATEDIFF(MI, act.start_execution_date, GETDATE()) AS [CurrentDuration]
FROM @currently_running_jobs crj
INNER JOIN msdb..sysjobs AS jobs
    ON crj.job_id = jobs.job_id
INNER JOIN msdb..sysjobactivity AS act
    ON act.job_id = crj.job_id
    AND act.stop_execution_date IS NULL
    AND act.start_execution_date IS NOT NULL
INNER JOIN msdb..sysjobhistory AS hist
    ON hist.job_id = crj.job_id
    AND hist.step_id = 0
WHERE crj.job_state = 1
GROUP BY jobs.job_ID
    ,jobs.NAME
    ,act.start_execution_date
    ,DATEDIFF(MI, act.start_execution_date, GETDATE())
HAVING CASE
        WHEN AVG((hist.run_duration / 10000) * 60 + (hist.run_duration / 100) % 100) <= 5
            THEN AVG((hist.run_duration / 10000) * 60 + (hist.run_duration / 100) % 100) + 10
        ELSE AVG((hist.run_duration / 10000) * 60 + (hist.run_duration / 100) % 100) * (@JobLimitPercentage / 100)
        END < DATEDIFF(MI, act.start_execution_date, GETDATE())

--Checks to see if a long running job has already been identified so you are not alerted multiple times
IF EXISTS (
        SELECT RJ.*
        FROM ##RunningJobs RJ
        WHERE CHECKSUM(RJ.JobID, RJ.StartExecutionDate) NOT IN (
                SELECT CHECKSUM(JobID, StartExecutionDate)
                FROM dbo.LongRunningJobs
                )
        )
    --Send email with results of long-running jobs
    EXEC msdb.dbo.sp_send_dbmail @profile_name = @MailProfile
        ,@recipients = @MailRecipients
        ,@query = 'USE DBAdmin;
Select RJ.*
From ##RunningJobs RJ
WHERE CHECKSUM(RJ.JobID,RJ.StartExecutionDate) NOT IN (Select CHECKSUM(JobID,StartExecutionDate) From dbo.LongRunningJobs) '
        ,@body = 'View attachment to view long running jobs'
        ,@subject = 'Long Running SQL Agent Job Alert'
        ,@attach_query_result_as_file = 1;

--Populate LongRunningJobs table with jobs exceeding established limits
INSERT INTO [DBAdmin].[dbo].[LongRunningJobs] (
    [JobID]
    ,[JobName]
    ,[StartExecutionDate]
    ,[AvgDurationMin]
    ,[DurationLimit]
    ,[CurrentDuration]
    ) (
    SELECT RJ.*
    FROM ##RunningJobs RJ
    WHERE CHECKSUM(RJ.JobID, RJ.StartExecutionDate) NOT IN (
            SELECT CHECKSUM(JobID, StartExecutionDate)
            FROM dbo.LongRunningJobs
            )
    )
GO
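If you want to sanity-check the limit rule before trusting the alert, here is a small sketch you can run in any query window. It assumes SQL Server 2008 or later (for the VALUES row constructor), and the three sample durations are made up for illustration. msdb..sysjobhistory stores run_duration as an HHMMSS integer, so the sketch converts it to whole minutes first and then applies the same rule as the procedure: average + 10 minutes for short jobs, @JobLimitPercentage for everything else.

```sql
-- Hypothetical sample durations in msdb's HHMMSS integer format (made up for illustration)
DECLARE @JobLimitPercentage FLOAT
SET @JobLimitPercentage = 150

SELECT s.run_duration
    ,m.DurationMin
    ,CASE
        --5 minutes or less: limit is duration + 10 minutes
        WHEN m.DurationMin <= 5 THEN m.DurationMin + 10
        --Otherwise: limit is duration * limit percentage
        ELSE m.DurationMin * (@JobLimitPercentage / 100)
        END AS DurationLimit
FROM (VALUES (300), (1500), (13000)) AS s(run_duration)
CROSS APPLY (
    --HHMMSS -> whole minutes: hours * 60 + minutes (seconds are dropped)
    SELECT (s.run_duration / 10000) * 60 + (s.run_duration / 100) % 100 AS DurationMin
    ) AS m
-- 300   (00:03:00) -> 3 minutes  -> limit 13
-- 1500  (00:15:00) -> 15 minutes -> limit 22.5
-- 13000 (01:30:00) -> 90 minutes -> limit 135
```

Note that the DurationLimit column in the tables above is an INT, so a fractional limit such as 22.5 is truncated when stored.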
Got any feedback/comments/criticisms? Let me hear them in the comments!