Tuesday, June 29, 2010

Copy and Restore Job Errors with Log Shipping

Hopefully this can help some other people out, I couldn't find much on this error when I got it, and after a bit of digging I was able to sort out the problem.

I set up log shipping for a database in SQL 2008, very simple set up, just following the SSMS GUI. The backup job would run fine, but the copy and restore jobs on the secondary server kept failing.

This was the output of the copy job:

Microsoft (R) SQL Server Log Shipping Agent
[Assembly Version = 10.0.0.0, File Version = 10.0.1600.22 ((SQL_PreRelease).080709-1414 )]
Microsoft Corporation. All rights reserved.

2010-06-29 09:41:52.53 ----- START OF TRANSACTION LOG COPY -----
2010-06-29 09:41:52.80 *** Error: Could not retrieve copy settings for secondary ID '[removed]'.(Microsoft.SqlServer.Management.LogShipping) ***
2010-06-29 09:41:52.81 *** Error: The specified agent_id [removed] or agent_type 1 do not form a valid pair for log shipping monitoring processing.(.Net SqlClient Data Provider) ***
2010-06-29 09:41:52.82 *** Error: Could not log history/error message.(Microsoft.SqlServer.Management.LogShipping) ***
2010-06-29 09:41:52.82 *** Error: The specified agent_id [removed] or agent_type 1 do not form a valid pair for log shipping monitoring processing.(.Net SqlClient Data Provider) ***
2010-06-29 09:41:52.83 *** Error: Could not cleanup history.(Microsoft.SqlServer.Management.LogShipping) ***
2010-06-29 09:41:52.84 *** Error: The specified agent_id [removed] or agent_type 1 do not form a valid pair for log shipping monitoring processing.(.Net SqlClient Data Provider) ***
2010-06-29 09:41:52.84 ----- END OF TRANSACTION LOG COPY -----

Exit Status: 1 (Error)

As you can see, the same error message was the Error 32016: The specified agent_id %s or agent_type %d do not form a valid pair for log shipping monitoring processing.

I ended up doing a trace on the primary server whilst running the job, and was able to see that the error was being thrown by this code:

if (sys.fn_MSvalidatelogshipagentid(@agent_id, @agent_type) = 0)
begin
select @agent_idstring = cast(@agent_id as sysname)
raiserror(32016, 16, 1, @agent_idstring, @agent_type)
return 1
end

The function sys.fn_MSvalidatelogshipagentid returns either 1 or 0 using the following code:

return
case
when
((@agent_type = 0) and
exists (select * from msdb.dbo.log_shipping_monitor_primary
where primary_id = @agent_id))
then 1
when
((@agent_type in (1,2)) and
exists (select * from msdb.dbo.log_shipping_monitor_secondary
where secondary_id = @agent_id))
then 1
else 0
end

Now as I knew the agent type was either 1 (copy) or 2 (restore), so I looked at the table msdb.dbo.log_shipping_monitor_secondary on the primary server, which was empty, hence the function returning 0.

After a bit of banging my head on the desk as to why this function was running on the primary server, as that table is only meant to be populated on the secondary server, I had a look at the job to see how it was calling the sqllogship.exe program:

"D:\Program Files\Microsoft SQL Server\100\Tools\Binn\sqllogship.exe" -Copy [ID_removed] -server SRV01

where SRV01 is the name of the primary server.

So I changed the -server parameter over to the secondary server name (SRV02), re-ran the job, and it worked!

Reading the BOL about sqllogship.exe, it states for the -server parameter:

For -copy or -restore, instance_name must be the name of a secondary server in a log shipping configuration.

Which makes sense why the jobs now ran OK as they were now using the correct parameter value.

To summarize: If you are getting error 32016 The specified agent_id %s or agent_type %d do not form a valid pair for log shipping monitoring processing check the command of the Copy and Restore jobs, that the -server parameter is set to the secondary server name.

The weird thing is that log shipping was setup via SSMS, I just filled in the details and the jobs were created automatically - so SQL Server itself had put the parameter of SRV01 there on its own accord. I can't see how something I did made that parameter be the wrong value. Maybe I did do something, but I did recreate the log shipping several times and double-checked what information I had put in.

I'd be interested to hear if anyone else has had this issue where the primary server name ends up in the copy and restore jobs instead of the secondary server name, and how you set up your log shipping.