Created 2007.11.06 04:11 by bdahmes. Last changed 2007.11.08 08:04 by wcmaier.
| msg13115 |
From: rader |
To: ajit, dan, dasu, rader, radtke, wcmaier |
Date: 2007.11.07 09:31 |
|
> The sequence of events seems simple enough, but I don't think it
> suggests a cause for the failure. What could cause a volume to
> (seemingly) spontaneously require a salvage?
I don't know. FWIW, vos backupsys hanging and leaving
volumes needing salvaging has happened a few times in
the more distant past.
steve
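
For spotting this earlier next time, here is a minimal Python sketch that pulls the flagged volume headers out of volserver log text. It assumes the message wording matches the `VAttachVolume` line quoted elsewhere in this issue; the function name `volumes_needing_salvage` and the `SAMPLE` variable are made up for illustration and are not part of OpenAFS.

```python
import re
from collections import Counter

# Matches the complaint wording seen in this issue's log excerpt.
# The wording on other OpenAFS versions may differ (assumption).
SALVAGE_RE = re.compile(r"volume salvage flag is ON for (\S+);")

# The two log lines quoted in this issue, used as sample input.
SAMPLE = """\
Tue Nov 6 03:53:38 2007 VAttachVolume: volume salvage flag is ON for /vicepa/V0536874400.vol; volume needs salvage
Tue Nov 6 03:53:38 2007 1 Volser: ListVolumes: Could not attach volume 536874400 (V0536874400.vol) error=101
"""


def volumes_needing_salvage(log_text):
    """Count salvage-flag complaints per volume header path."""
    return Counter(SALVAGE_RE.findall(log_text))


if __name__ == "__main__":
    for header, count in volumes_needing_salvage(SAMPLE).items():
        print(header, count)
```

Counting per header path rather than just listing matches keeps the output short when the same complaint repeats for hours, as it did here.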
--
|
| msg13112 |
From: wcmaier |
To: ajit, dan, dasu, rader, radtke, wcmaier |
Date: 2007.11.06 12:05 |
|
On Tue, Nov 06, 2007 at 06:32:02AM -0600, Will Maier via UW-HEP Help System wrote:
> I'm not familiar enough with afs2nsr to quickly understand what
> might have gone wrong.
I don't see anything in the nsr-afs log (/var/tmp/afsasm.log),
probably because I killed the backup process before it could log.
The clone of osg.app.cmssoft started just fine:
Mon Nov 5 23:00:13 2007 1 Volser: Clone: Recloning volume 536874400 to volume 536874402
~80 minutes later, the volserver noticed that the transfer wasn't
going anywhere:
Tue Nov 6 00:19:51 2007 trans 2407674 on volume 536874402 is older than 300 seconds
This continued until ~00:30. At ~04:00, the volserver started to
complain that osg.app.cmssoft needed salvaging:
Tue Nov 6 03:53:38 2007 VAttachVolume: volume salvage flag is ON for /vicepa/V0536874400.vol; volume needs salvage
Tue Nov 6 03:53:38 2007 1 Volser: ListVolumes: Could not attach volume 536874400 (V0536874400.vol) error=101
This continued until I started the salvage at ~0630.
The sequence of events seems simple enough, but I don't think it
suggests a cause for the failure. What could cause a volume to
(seemingly) spontaneously require a salvage?
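
For anyone retracing the timeline, a minimal Python sketch of the arithmetic follows. It assumes the timestamp layout of the log lines quoted above and an English/C locale for the day and month names. The names `parse_line`, `classify`, `first_events` and the event labels are hypothetical helpers for this illustration, not anything provided by OpenAFS.

```python
import re
from datetime import datetime

# The four log lines quoted in this message.
LOG = """\
Mon Nov 5 23:00:13 2007 1 Volser: Clone: Recloning volume 536874400 to volume 536874402
Tue Nov 6 00:19:51 2007 trans 2407674 on volume 536874402 is older than 300 seconds
Tue Nov 6 03:53:38 2007 VAttachVolume: volume salvage flag is ON for /vicepa/V0536874400.vol; volume needs salvage
Tue Nov 6 03:53:38 2007 1 Volser: ListVolumes: Could not attach volume 536874400 (V0536874400.vol) error=101
"""

# Leading timestamp such as "Mon Nov 5 23:00:13 2007", then the message text.
STAMP_RE = re.compile(
    r"^(\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\s+(.*)$"
)


def parse_line(line):
    """Return (datetime, message) for a log line, or None if it has no timestamp."""
    match = STAMP_RE.match(line)
    if match is None:
        return None
    when = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
    return when, match.group(2)


def classify(message):
    """Map a message to a coarse event label (labels are illustrative)."""
    if "Recloning volume" in message:
        return "clone"
    if "is older than" in message:
        return "stalled"
    if "needs salvage" in message:
        return "salvage"
    return "other"


def first_events(text):
    """Record the first time each event label appears in the log text."""
    first = {}
    for line in text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, message = parsed
        first.setdefault(classify(message), when)
    return first


if __name__ == "__main__":
    events = first_events(LOG)
    print("clone -> stalled:", events["stalled"] - events["clone"])
    print("clone -> salvage:", events["salvage"] - events["clone"])
```

Only the first occurrence of each event is kept, since the interesting question here is how long after the clone started each symptom first showed up.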
--
o--------------------------{ Will Maier }--------------------------o
| jabber:...wcmaier@xmpp.lfod.us | email:..will.maier@hep.wisc.edu |
| office:...........608.263.9692 | cell:..............608.438.6162 |
*--------------------[ UW High Energy Physics ]--------------------*
|
| msg13097 |
From: wcmaier |
To: ajit, bdahmes, dan, dasu, rader, radtke, wcmaier |
Date: 2007.11.06 06:32 |
|
On Tue, Nov 06, 2007 at 11:40:35AM +0100, Sridhara Dasu wrote:
> Indeed AFS cmssoft area is out of commission. Unfortunately, we
> have to wait till one of our sysadmins wakes up and fixes this.
For some reason, one of the backup processes had grabbed onto the
volume and didn't let go:
/usr/bin/nsrfile -s - afsasm - /usr/bin/afs2nsr -s -p \
'/v/o/osg.app.cmssoft.backup/' % -p \
/v/o/osg.app.cmssoft.backup/ full
I killed this process and (on Steve's suggestion) started a salvage
of the cmssoft volume. The salvage completed a minute ago, and
access to the volume's data appears to be normal.
I'm not familiar enough with afs2nsr to quickly understand what
might have gone wrong. I imagine Steve will take a look at it when
he gets in.
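
This message does not record how the salvage was started. As a rough sketch of one common way to do it, the Python below assembles a `bos salvage` command line and by default only returns it without running anything. The server name is a placeholder (assumption), the partition `/vicepa` is taken from the volserver log quoted elsewhere in this issue, and `salvage_command` and `run_salvage` are hypothetical helper names. Whatever authentication `bos` needs on the server is not shown.

```python
import shlex
import subprocess


def salvage_command(server, partition, volume):
    """Build the argv for salvaging a single volume with bos."""
    return [
        "bos", "salvage",
        "-server", server,
        "-partition", partition,
        "-volume", volume,
    ]


def run_salvage(server, partition, volume, dry_run=True):
    """Return the command as a string; execute it only when dry_run is False."""
    argv = salvage_command(server, partition, volume)
    rendered = " ".join(shlex.quote(arg) for arg in argv)
    if not dry_run:
        # Raises CalledProcessError if bos exits with a non-zero status.
        subprocess.run(argv, check=True)
    return rendered


if __name__ == "__main__":
    # "afs-server.example.org" is a placeholder, not the real file server.
    print(run_salvage("afs-server.example.org", "/vicepa", "osg.app.cmssoft"))
```

Defaulting to a dry run is deliberate: a salvage takes the volume offline while it runs, so it is worth reading the command before executing it.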
--
o--------------------------{ Will Maier }--------------------------o
| jabber:...wcmaier@xmpp.lfod.us | email:..will.maier@hep.wisc.edu |
| office:...........608.263.9692 | cell:..............608.438.6162 |
*--------------------[ UW High Energy Physics ]--------------------*
|
| msg13096 |
From: dasu |
To: ajit, bdahmes, dan, dasu, rader, radtke, wcmaier |
Date: 2007.11.06 04:40 |
|
Indeed AFS cmssoft area is out of commission. Unfortunately, we have
to wait till one of our sysadmins wakes up and fixes this.
Sridhara
---------------------------------------------------------------------
Prof. Sridhara Rao Dasu Department of Physics
dasu@hep.wisc.edu University of Wisconsin
http://www.hep.wisc.edu/~dasu 4289 Chamberlin Hall
608-262-3678 ( Office ) 1150 University Avenue
408-829-6625 (Wireless) Madison, WI 53706, USA
On Nov 6, 2007, at 11:11 AM, Bryan Michael DAHMES via UW-HEP Help
System wrote:
>
> Hello,
>
> When logging into wisconsin, I see the following:
>
> Welcome to login05.hep.wisc.edu!
>
> Scientific Linux 4.4 UW-HEP 31Oct07.01 on a 2.4 GHz Pentium4
>
> ##################################################################
>
> NOTICE: Do not use this system as a compute server.
> Any CPU-intensive processes running on this machine
> will be killed without warning or notification!
> Please use the Condor batch system. For more details
> see http://www.hep.wisc.edu/computing/condor.html or
> contact <condor-help@hep.wisc.edu>
>
> ##################################################################
> -bash: /afs/hep.wisc.edu/osg/app/cmssoft/cms/cmsset_default.sh:
> Connection timed out
>
> From there I can't seem to do any CMSSW related work.
>
> Is there a problem?
>
> Thanks,
> bryan
>
> ----------
> group: IT
> messages: 13095
> nosy: ajit, bdahmes, dan, dasu, rader, radtke, wcmaier
> priority: triage
> status: unread
> title: Connection timeout problem
>
> ______________________________________
> UW-HEP Help System <help@hep.wisc.edu>
> <https://help.hep.wisc.edu/issue4936>
> ______________________________________
|
| msg13095 |
From: bdahmes |
To: ajit, bdahmes, dan, dasu, rader, radtke, wcmaier |
Date: 2007.11.06 04:11 |
|
Hello,
When logging into wisconsin, I see the following:
Welcome to login05.hep.wisc.edu!
Scientific Linux 4.4 UW-HEP 31Oct07.01 on a 2.4 GHz Pentium4
##################################################################
NOTICE: Do not use this system as a compute server.
Any CPU-intensive processes running on this machine
will be killed without warning or notification!
Please use the Condor batch system. For more details
see http://www.hep.wisc.edu/computing/condor.html or
contact <condor-help@hep.wisc.edu>
##################################################################
-bash: /afs/hep.wisc.edu/osg/app/cmssoft/cms/cmsset_default.sh:
Connection timed out
From there I can't seem to do any CMSSW related work.
Is there a problem?
Thanks,
bryan
|
|
| Date | User | Action | Args |
| 2007-11-08 08:04:30 | wcmaier | set | status: chatting -> resolved; superseder: + AFS server for user home directories |
| 2007-11-07 09:31:01 | rader | set | messages: + msg13115 |
| 2007-11-06 12:05:13 | wcmaier | set | messages: + msg13112 |
| 2007-11-06 11:06:33 | wcmaier | set | nosy: - bdahmes |
| 2007-11-06 10:21:44 | wcmaier | set | priority: triage -> urgent; topic: + AFS; assignedto: wcmaier -> rader |
| 2007-11-06 06:32:02 | wcmaier | set | assignedto: wcmaier; messages: + msg13097 |
| 2007-11-06 04:40:55 | dasu | set | status: unread -> chatting; messages: + msg13096 |
| 2007-11-06 04:11:02 | bdahmes | create | |
|