HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Linking job steps from a single run together in sysjobhistory

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
linkingtogethersysjobhistorysinglefromjobstepsrun

Problem

We have several SQL Servers running SQL Agent jobs, recording history in their own sysjobhistory tables. I am attempting to set up one centralized server with a job that collects the history from all the other servers, formats it a bit, and puts it into a table called AllJobHistory. As part of this process, I would like one column to indicate that multiple steps of a job were part of the same job run. They are already marked as being part of the same job through the job_id column, but I want to know that specific rows came from the 3:00 run of the job versus the 4:00 run. Being able to filter based on this column would make troubleshooting jobs that much easier for us, but I don’t see anything that links those steps together in any existing system table or DMV, is there?

My first attempt at rolling my own was to use the run_date, run_time, and run_duration columns. For each step, if I subtract the total run_duration up to this point from the run_time, it should bring me back to a time that is unique compared to all other runs of this job. This looked like it was working until I found a case where it wasn’t (likely because SQL Server is rounding run_time and run_duration at the precision of seconds). Here is my attempt at the query (with extra columns removed).

```
WITH JobDetails AS
(
SELECT
QUOTENAME(UPPER('ServerName')) AS [Server],
j.job_id AS [JobID],
j.name AS [JobName],
s.step_id AS [Step],
msdb.dbo.agent_datetime(run_date, run_time) AS [RunDate],
(run_duration/100003600 + (run_duration/100)%10060 + run_duration%100) AS [RunDurationSeconds]
FROM msdb.dbo.sysjobhistory h
INNER JOIN msdb.dbo.sysjobs j ON j.job_id = h.job_id
LEFT OUTER JOIN msdb.dbo.sysjobsteps s ON s.job_id = h.job_id AND s.step_id = h.step_id
WHERE h.step_id != 0
), GroupedDetails AS (
SELECT
jd.[Server],
jd.[JobID],
jd.JobName,
jd.Step,
jd.RunDate,
jd.RunDurationSeco

Solution

Note that sysjobhistory has an ID column (instance_id). at least one entry should hit the table for each step of a job that was completed, followed by an entry with step_id = 0, recording the outcome of the job. Each step also records the time (run_date and run_time) the step was started, which will be equal to or greater than the time the job was started. So, the step_id = 0 row for a given run has a higher instance_id than the related steps, but a lower (or equal) run time.

So, try doing an initial pull of the data from the rows where step_id = 0 into a temp table (or equivalent). Then, all rows from sysjobhistory with the same job_id, a lower instance_id, and a higher or equal start time (from run_date and run_time) should belong to the job run you're looking for.

I used something much like this for a failed job report I used to run, at a former employer.

Here's a stripped down, modified version of that code. It ran for me in a quick test on a SQL Server 2016 box, just now. However, I don't have any jobs that run frequently enough that multiple runs would have the same run times.

USE msdb;

DECLARE @start_date varchar(8) = '20171001';

IF(OBJECT_ID('tempdb..#failhist') IS NOT NULL) DROP TABLE #failhist;

CREATE TABLE #failhist
(
    [job_name] [sysname] NOT NULL,
    [run_datetime] [datetime] NULL,
    [run_ended] [datetime] NULL,
    [instance_id] [int] NOT NULL,
    [job_id] [uniqueidentifier] NOT NULL,
    [run_date] [int] NOT NULL,
    [run_time] [int] NOT NULL
);

INSERT INTO #failhist
select *
  FROM
(SELECT j.name as job_name
       ,CONVERT(DATETIME, RTRIM(run_date)) + (run_time * 9 + run_time % 10000 * 6 + run_time % 100 * 10) / 216e4
        as run_datetime
       ,CONVERT(DATETIME, RTRIM(run_date)) + (run_time * 9 + run_time % 10000 * 6 + run_time % 100 * 10) / 216e4 + (run_duration * 9 + run_duration % 10000 * 6 + run_duration % 100 * 10) / 216e4
        as run_ended
       ,h.instance_id
       ,h.job_id
       ,h.run_date
       ,h.run_time
   from msdb..sysjobhistory h INNER JOIN msdb..sysjobs j ON h.job_id = j.job_id
  WHERE h.step_id = 0
    and h.run_date >= @start_date
) x
;

SELECT
       t.job_name AS [Job/Step]
      ,' ' + CONVERT(varchar(19), t.run_datetime, 121) + ' to ' + CONVERT(varchar(19), t.run_ended, 121) AS Run_Msg
      ,0 as HdrDetail
      ,t.job_name
      ,t.run_datetime
      ,t.instance_id
      ,-1 as step_id
      ,jh.instance_id as actual_instance_id
  FROM #failhist t
         INNER JOIN msdb..sysjobhistory jh ON (    t.job_id = jh.job_id
                                               AND t.instance_id >= jh.instance_id
                                               AND (   t.run_date = jh.instance_id
                                               AND (   t.run_date < jh.run_date
                                                    OR (    t.run_date = jh.run_date
                                                        AND t.run_time <= jh.run_time
                                                       )
                                                   )
                                              )
 ORDER BY run_datetime, instance_id, step_id


If you actually have a job that runs more that once a second, you might have to use a windowing function, to ensure that you don't pick up job steps from an earlier run of the job with the same run_time value.

Caveat: If the limit on the number of rows each job can have in sysjobhistory is hit, you may get odd/incomplete results. Also, I occasionally saw jobs that failed without generating a job outcome (usually a temporary failure to authenticate the Windows user running the job).

Code Snippets

USE msdb;

DECLARE @start_date varchar(8) = '20171001';

IF(OBJECT_ID('tempdb..#failhist') IS NOT NULL) DROP TABLE #failhist;

CREATE TABLE #failhist
(
    [job_name] [sysname] NOT NULL,
    [run_datetime] [datetime] NULL,
    [run_ended] [datetime] NULL,
    [instance_id] [int] NOT NULL,
    [job_id] [uniqueidentifier] NOT NULL,
    [run_date] [int] NOT NULL,
    [run_time] [int] NOT NULL
);


INSERT INTO #failhist
select *
  FROM
(SELECT j.name as job_name
       ,CONVERT(DATETIME, RTRIM(run_date)) + (run_time * 9 + run_time % 10000 * 6 + run_time % 100 * 10) / 216e4
        as run_datetime
       ,CONVERT(DATETIME, RTRIM(run_date)) + (run_time * 9 + run_time % 10000 * 6 + run_time % 100 * 10) / 216e4 + (run_duration * 9 + run_duration % 10000 * 6 + run_duration % 100 * 10) / 216e4
        as run_ended
       ,h.instance_id
       ,h.job_id
       ,h.run_date
       ,h.run_time
   from msdb..sysjobhistory h INNER JOIN msdb..sysjobs j ON h.job_id = j.job_id
  WHERE h.step_id = 0
    and h.run_date >= @start_date
) x
;

SELECT
       t.job_name AS [Job/Step]
      ,' ' + CONVERT(varchar(19), t.run_datetime, 121) + ' to ' + CONVERT(varchar(19), t.run_ended, 121) AS Run_Msg
      ,0 as HdrDetail
      ,t.job_name
      ,t.run_datetime
      ,t.instance_id
      ,-1 as step_id
      ,jh.instance_id as actual_instance_id
  FROM #failhist t
         INNER JOIN msdb..sysjobhistory jh ON (    t.job_id = jh.job_id
                                               AND t.instance_id >= jh.instance_id
                                               AND (   t.run_date < jh.run_date
                                                    OR (    t.run_date = jh.run_date
                                                        AND t.run_time <= jh.run_time
                                                       )
                                                   )
                                              )
 WHERE jh.step_id = 0
UNION ALL
SELECT
       '        '
      +CASE
         WHEN jh.step_id = 0
           THEN 'Job Summary'
         WHEN jh.run_status BETWEEN 0 AND 1  -- summary
           THEN RIGHT(CAST(100 + jh.step_id as varchar),2) + ' - ' + jh.step_name
         ELSE   '  ' + RIGHT(CAST(100 + jh.step_id as varchar),2) + ' (add''l info)'
       END
      ,'        ' + CAST(jh.message as varchar(max))
      ,1
      ,t.job_name
      ,t.run_datetime
      ,t.instance_id
      ,jh.step_id
      ,jh.instance_id
  FROM #failhist t
         INNER JOIN msdb..sysjobhistory jh ON (    t.job_id = jh.job_id
                                               AND t.instance_id >= jh.instance_id
                                               AND (   t.run_date < jh.run_date
                                                    OR (    t.run_date = jh.run_date
                                                        AND t.run_time <= jh.run_time
                                                       )
                                                   )
                  

Context

StackExchange Database Administrators Q#187672, answer score: 5

Revisions (0)

No revisions yet.