patternsqlMinor
Linking job steps from a single run together in sysjobhistory
Viewed 0 times
linkingtogethersysjobhistorysinglefromjobstepsrun
Problem
We have several SQL Servers running SQL Agent jobs, recording history in their own sysjobhistory tables. I am attempting to set up one centralized server with a job that collects the history from all the other servers, formats it a bit, and puts it into a table called AllJobHistory. As part of this process, I would like one column to indicate that multiple steps of a job were part of the same job run. They are already marked as being part of the same job through the job_id column, but I want to know that specific rows came from the 3:00 run of the job versus the 4:00 run. Being able to filter based on this column would make troubleshooting jobs that much easier for us, but I don’t see anything that links those steps together in any existing system table or DMV, is there?
My first attempt at rolling my own was to use the run_date, run_time, and run_duration columns. For each step, if I subtract the total run_duration up to this point from the run_time, it should bring me back to a time that is unique compared to all other runs of this job. This looked like it was working until I found a case where it wasn’t (likely because SQL Server is rounding run_time and run_duration at the precision of seconds). Here is my attempt at the query (with extra columns removed).
```
WITH JobDetails AS
(
SELECT
QUOTENAME(UPPER('ServerName')) AS [Server],
j.job_id AS [JobID],
j.name AS [JobName],
s.step_id AS [Step],
msdb.dbo.agent_datetime(run_date, run_time) AS [RunDate],
(run_duration/100003600 + (run_duration/100)%10060 + run_duration%100) AS [RunDurationSeconds]
FROM msdb.dbo.sysjobhistory h
INNER JOIN msdb.dbo.sysjobs j ON j.job_id = h.job_id
LEFT OUTER JOIN msdb.dbo.sysjobsteps s ON s.job_id = h.job_id AND s.step_id = h.step_id
WHERE h.step_id != 0
), GroupedDetails AS (
SELECT
jd.[Server],
jd.[JobID],
jd.JobName,
jd.Step,
jd.RunDate,
jd.RunDurationSeco
My first attempt at rolling my own was to use the run_date, run_time, and run_duration columns. For each step, if I subtract the total run_duration up to this point from the run_time, it should bring me back to a time that is unique compared to all other runs of this job. This looked like it was working until I found a case where it wasn’t (likely because SQL Server is rounding run_time and run_duration at the precision of seconds). Here is my attempt at the query (with extra columns removed).
```
WITH JobDetails AS
(
SELECT
QUOTENAME(UPPER('ServerName')) AS [Server],
j.job_id AS [JobID],
j.name AS [JobName],
s.step_id AS [Step],
msdb.dbo.agent_datetime(run_date, run_time) AS [RunDate],
(run_duration/100003600 + (run_duration/100)%10060 + run_duration%100) AS [RunDurationSeconds]
FROM msdb.dbo.sysjobhistory h
INNER JOIN msdb.dbo.sysjobs j ON j.job_id = h.job_id
LEFT OUTER JOIN msdb.dbo.sysjobsteps s ON s.job_id = h.job_id AND s.step_id = h.step_id
WHERE h.step_id != 0
), GroupedDetails AS (
SELECT
jd.[Server],
jd.[JobID],
jd.JobName,
jd.Step,
jd.RunDate,
jd.RunDurationSeco
Solution
Note that
So, try doing an initial pull of the data from the rows where
I used something much like this for a failed job report I used to run, at a former employer.
Here's a stripped down, modified version of that code. It ran for me in a quick test on a SQL Server 2016 box, just now. However, I don't have any jobs that run frequently enough that multiple runs would have the same run times.
If you actually have a job that runs more that once a second, you might have to use a windowing function, to ensure that you don't pick up job steps from an earlier run of the job with the same run_time value.
Caveat: If the limit on the number of rows each job can have in
sysjobhistory has an ID column (instance_id). at least one entry should hit the table for each step of a job that was completed, followed by an entry with step_id = 0, recording the outcome of the job. Each step also records the time (run_date and run_time) the step was started, which will be equal to or greater than the time the job was started. So, the step_id = 0 row for a given run has a higher instance_id than the related steps, but a lower (or equal) run time.So, try doing an initial pull of the data from the rows where
step_id = 0 into a temp table (or equivalent). Then, all rows from sysjobhistory with the same job_id, a lower instance_id, and a higher or equal start time (from run_date and run_time) should belong to the job run you're looking for.I used something much like this for a failed job report I used to run, at a former employer.
Here's a stripped down, modified version of that code. It ran for me in a quick test on a SQL Server 2016 box, just now. However, I don't have any jobs that run frequently enough that multiple runs would have the same run times.
USE msdb;
DECLARE @start_date varchar(8) = '20171001';
IF(OBJECT_ID('tempdb..#failhist') IS NOT NULL) DROP TABLE #failhist;
CREATE TABLE #failhist
(
[job_name] [sysname] NOT NULL,
[run_datetime] [datetime] NULL,
[run_ended] [datetime] NULL,
[instance_id] [int] NOT NULL,
[job_id] [uniqueidentifier] NOT NULL,
[run_date] [int] NOT NULL,
[run_time] [int] NOT NULL
);
INSERT INTO #failhist
select *
FROM
(SELECT j.name as job_name
,CONVERT(DATETIME, RTRIM(run_date)) + (run_time * 9 + run_time % 10000 * 6 + run_time % 100 * 10) / 216e4
as run_datetime
,CONVERT(DATETIME, RTRIM(run_date)) + (run_time * 9 + run_time % 10000 * 6 + run_time % 100 * 10) / 216e4 + (run_duration * 9 + run_duration % 10000 * 6 + run_duration % 100 * 10) / 216e4
as run_ended
,h.instance_id
,h.job_id
,h.run_date
,h.run_time
from msdb..sysjobhistory h INNER JOIN msdb..sysjobs j ON h.job_id = j.job_id
WHERE h.step_id = 0
and h.run_date >= @start_date
) x
;
SELECT
t.job_name AS [Job/Step]
,' ' + CONVERT(varchar(19), t.run_datetime, 121) + ' to ' + CONVERT(varchar(19), t.run_ended, 121) AS Run_Msg
,0 as HdrDetail
,t.job_name
,t.run_datetime
,t.instance_id
,-1 as step_id
,jh.instance_id as actual_instance_id
FROM #failhist t
INNER JOIN msdb..sysjobhistory jh ON ( t.job_id = jh.job_id
AND t.instance_id >= jh.instance_id
AND ( t.run_date = jh.instance_id
AND ( t.run_date < jh.run_date
OR ( t.run_date = jh.run_date
AND t.run_time <= jh.run_time
)
)
)
ORDER BY run_datetime, instance_id, step_idIf you actually have a job that runs more that once a second, you might have to use a windowing function, to ensure that you don't pick up job steps from an earlier run of the job with the same run_time value.
Caveat: If the limit on the number of rows each job can have in
sysjobhistory is hit, you may get odd/incomplete results. Also, I occasionally saw jobs that failed without generating a job outcome (usually a temporary failure to authenticate the Windows user running the job).Code Snippets
USE msdb;
DECLARE @start_date varchar(8) = '20171001';
IF(OBJECT_ID('tempdb..#failhist') IS NOT NULL) DROP TABLE #failhist;
CREATE TABLE #failhist
(
[job_name] [sysname] NOT NULL,
[run_datetime] [datetime] NULL,
[run_ended] [datetime] NULL,
[instance_id] [int] NOT NULL,
[job_id] [uniqueidentifier] NOT NULL,
[run_date] [int] NOT NULL,
[run_time] [int] NOT NULL
);
INSERT INTO #failhist
select *
FROM
(SELECT j.name as job_name
,CONVERT(DATETIME, RTRIM(run_date)) + (run_time * 9 + run_time % 10000 * 6 + run_time % 100 * 10) / 216e4
as run_datetime
,CONVERT(DATETIME, RTRIM(run_date)) + (run_time * 9 + run_time % 10000 * 6 + run_time % 100 * 10) / 216e4 + (run_duration * 9 + run_duration % 10000 * 6 + run_duration % 100 * 10) / 216e4
as run_ended
,h.instance_id
,h.job_id
,h.run_date
,h.run_time
from msdb..sysjobhistory h INNER JOIN msdb..sysjobs j ON h.job_id = j.job_id
WHERE h.step_id = 0
and h.run_date >= @start_date
) x
;
SELECT
t.job_name AS [Job/Step]
,' ' + CONVERT(varchar(19), t.run_datetime, 121) + ' to ' + CONVERT(varchar(19), t.run_ended, 121) AS Run_Msg
,0 as HdrDetail
,t.job_name
,t.run_datetime
,t.instance_id
,-1 as step_id
,jh.instance_id as actual_instance_id
FROM #failhist t
INNER JOIN msdb..sysjobhistory jh ON ( t.job_id = jh.job_id
AND t.instance_id >= jh.instance_id
AND ( t.run_date < jh.run_date
OR ( t.run_date = jh.run_date
AND t.run_time <= jh.run_time
)
)
)
WHERE jh.step_id = 0
UNION ALL
SELECT
' '
+CASE
WHEN jh.step_id = 0
THEN 'Job Summary'
WHEN jh.run_status BETWEEN 0 AND 1 -- summary
THEN RIGHT(CAST(100 + jh.step_id as varchar),2) + ' - ' + jh.step_name
ELSE ' ' + RIGHT(CAST(100 + jh.step_id as varchar),2) + ' (add''l info)'
END
,' ' + CAST(jh.message as varchar(max))
,1
,t.job_name
,t.run_datetime
,t.instance_id
,jh.step_id
,jh.instance_id
FROM #failhist t
INNER JOIN msdb..sysjobhistory jh ON ( t.job_id = jh.job_id
AND t.instance_id >= jh.instance_id
AND ( t.run_date < jh.run_date
OR ( t.run_date = jh.run_date
AND t.run_time <= jh.run_time
)
)
Context
StackExchange Database Administrators Q#187672, answer score: 5
Revisions (0)
No revisions yet.