Issue5268

Title: clock skew on cmsdcache03?
Priority: normal
Status: resolved
Superseder:
Nosy List: ajit, dan, dasu, rader, radtke, wcmaier
Assigned To:
Topic: Linux
Group: IT

Created 2008.05.17 09:41 by dan.
Last changed 2008.05.19 06:29 by wcmaier.

Messages
msg14215 From: wcmaier To: ajit, dan, dasu, rader, radtke, wcmaier Date: 2008.05.19 06:29
On Mon, May 19, 2008 at 06:24:55AM -0500, Will Maier via UW-HEP Help System wrote:
> Those messages often happen when the network interface changes and
> ntpd is not restarted. I've restarted ntpd and haven't seen any of
> those since the restart; I'll keep my eye on it today.

I see successful synchronization events -- this appears to be fixed.

-- 

o--------------------------{ Will Maier }--------------------------o
| jabber:...wcmaier@xmpp.lfod.us | email:..will.maier@hep.wisc.edu |
| office:...........608.263.9692 | cell:..............608.438.6162 |
*--------------------[ UW High Energy Physics ]--------------------*
msg14214 From: wcmaier To: ajit, dan, dasu, rader, radtke, wcmaier Date: 2008.05.19 06:24
On Sat, May 17, 2008 at 09:45:42AM -0500, Dan Bradley via UW-HEP Help System wrote:
> I see in the system logs on cmsdcache03 a lot of messages like this:
> 
> May 17 09:01:56 cmsdcache03 ntpd[2668]: sendto(128.104.28.199): Invalid argument
> May 17 09:02:34 cmsdcache03 ntpd[2668]: sendto(144.92.9.22): Invalid argument
> May 17 09:15:49 cmsdcache03 ntpd[2668]: sendto(128.105.39.11): Invalid argument
> 
> That probably explains why ntpd wasn't synchronizing the clock.  What's 
> causing this?

Those messages often happen when the network interface changes and
ntpd is not restarted. I've restarted ntpd and haven't seen any of
those since the restart; I'll keep my eye on it today.
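
For anyone who wants to watch for these without reading the logs by hand, here is a minimal sketch that counts ntpd sendto() failures per peer. It assumes the syslog line format quoted above; the function name count_sendto_errors and the regular expression are illustrative, not part of any existing tool here.

```python
import re
from collections import Counter

# Matches syslog lines of the form quoted in this ticket, e.g.
#   May 17 09:01:56 cmsdcache03 ntpd[2668]: sendto(128.104.28.199): Invalid argument
# The pattern is an assumption based on those three lines only.
SENDTO_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\sntpd\[\d+\]:\s"
    r"sendto\((?P<peer>[^)]+)\):\s(?P<error>.+)$"
)


def count_sendto_errors(lines):
    """Return a Counter mapping NTP peer address to number of sendto() failures."""
    counts = Counter()
    for line in lines:
        match = SENDTO_ERROR.match(line.strip())
        if match:
            counts[match.group("peer")] += 1
    return counts


if __name__ == "__main__":
    # The three log lines reported in this ticket.
    sample = [
        "May 17 09:01:56 cmsdcache03 ntpd[2668]: sendto(128.104.28.199): Invalid argument",
        "May 17 09:02:34 cmsdcache03 ntpd[2668]: sendto(144.92.9.22): Invalid argument",
        "May 17 09:15:49 cmsdcache03 ntpd[2668]: sendto(128.105.39.11): Invalid argument",
    ]
    for peer, failures in count_sendto_errors(sample).items():
        print(peer, failures)
```

A non-empty result after an ntpd restart would suggest the problem has come back; an empty one is consistent with the restart having fixed it.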

msg14213 From: dan To: ajit, dan, dasu, rader, radtke, wcmaier Date: 2008.05.17 09:45
I see in the system logs on cmsdcache03 a lot of messages like this:

May 17 09:01:56 cmsdcache03 ntpd[2668]: sendto(128.104.28.199): Invalid argument
May 17 09:02:34 cmsdcache03 ntpd[2668]: sendto(144.92.9.22): Invalid argument
May 17 09:15:49 cmsdcache03 ntpd[2668]: sendto(128.105.39.11): Invalid argument

That probably explains why ntpd wasn't synchronizing the clock.  What's 
causing this?

--Dan
msg14212 From: dan To: ajit, dan, dasu, rader, radtke, wcmaier Date: 2008.05.17 09:41
The dCache health check failed on all SRM tests with the following:

SRMClientV1 : org.globus.common.ChainedIOException: Authentication failed [Caused by: Defective credential detected [Caused by: [JGLOBUS-96] Certificate "DC=org,DC=doegrids,OU=Services,CN=cmsdcache03.hep.wisc.edu" expired]]



The only thing I could find to explain this was that the clock on 
cmsdcache03 was about 4 minutes fast.  I reset it with 'date' but that 
didn't immediately fix the problem.  I then restarted dcache-srm and 
things started working again.

I don't know for sure that the clock skew was responsible, but clock 
skew can cause this sort of authentication problem.  I don't understand 
why ntpd allowed the clock to drift so much.  Maybe I just wasn't 
patient enough to wait for it to synchronize.

--Dan
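
To illustrate why a fast clock can produce an "expired" error, here is a minimal sketch of a validity-window check evaluated with an offset clock. It is not how JGLOBUS performs the check, and it does not show that skew was the cause here. The function names and the validity dates in the usage example are hypothetical; only the 4-minute offset comes from the report above.

```python
from datetime import datetime, timedelta, timezone


def certificate_status(now, not_before, not_after):
    """Classify a certificate against its validity window at time `now`."""
    if now < not_before:
        return "not yet valid"
    if now > not_after:
        return "expired"
    return "valid"


def status_seen_by_host(true_time, clock_offset, not_before, not_after):
    """Evaluate the same certificate using a host clock that is off by `clock_offset`.

    A positive offset means the host clock runs fast.
    """
    return certificate_status(true_time + clock_offset, not_before, not_after)


if __name__ == "__main__":
    # Hypothetical validity window, chosen only for illustration.
    not_before = datetime(2008, 5, 16, 9, 0, tzinfo=timezone.utc)
    not_after = datetime(2008, 5, 17, 9, 0, tzinfo=timezone.utc)

    # True time is two minutes before the credential expires.
    true_time = not_after - timedelta(minutes=2)

    print(status_seen_by_host(true_time, timedelta(0), not_before, not_after))
    print(status_seen_by_host(true_time, timedelta(minutes=4), not_before, not_after))
```

With an accurate clock the credential is still inside its window; with the clock 4 minutes fast, the same credential is judged expired. Short-lived credentials make this kind of failure more likely, since even a few minutes of skew is a noticeable fraction of the margin.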
History
Date User Action Args
2008-05-19 06:29:04 wcmaier set status: chatting -> resolved, topic: + Linux, assignedto: wcmaier, messages: + msg14215, priority: triage -> normal
2008-05-19 06:24:55 wcmaier set messages: + msg14214
2008-05-17 09:45:42 dan set status: unread -> chatting, messages: + msg14213
2008-05-17 09:41:01 dan create