Created 2008.01.05 15:51 by wcmaier. Last changed 2008.01.07 10:01 by rader.
| msg13413 (view) |
From: rader |
To: ajit, dan, dasu, rader, radtke, wcmaier |
Date: 2008.01.07 10:01 |
|
Okay, garlic is out of production now.
The plan is to make wasabi into garlic and use the new hardware
for wasabi. (Matt: talk to me about that after you're done
with the Stringer desktops.)
steve
--
> ---- Original Message ----
> From: Steve Rader via UW-HEP Help System <help@hep.wisc.edu>
>
> I just finished the "c-section" of garlic's old disk into
> a different case and motherboard.
>
> AFAICT, /afs/hep/osg/data recovered nicely on all systems.
>
> I'll start moving root.osg here in a sec.
>
> Will will order new hardware.
>
> steve
> --
>
> ______________________________________
> UW-HEP Help System <help@hep.wisc.edu>
> <https://help.hep.wisc.edu/issue5030>
> ______________________________________
|
| msg13412 (view) |
From: rader |
To: ajit, dan, dasu, rader, radtke, wcmaier |
Date: 2008.01.07 08:16 |
|
I just finished the "c-section" of garlic's old disk into
a different case and motherboard.
AFAICT, /afs/hep/osg/data recovered nicely on all systems.
I'll start moving root.osg here in a sec.
Will will order new hardware.
steve
--
|
| msg13409 (view) |
From: rader |
To: ajit, dan, dasu, rader, radtke, wcmaier |
Date: 2008.01.06 14:41 |
|
Yes, I kicked garlic at 1135ish and it crashed AGAIN at
1205ish. So I kicked it again just now.
Both times (four total now) I've seen console msgs about
clock skew and ntpd being disabled. So I also put in
a new battery.
Started yet another move of root.osg.
Fingers crossed.
steve
--
> ---- Original Message ----
> From: Will Maier via UW-HEP Help System <help@hep.wisc.edu>
>
> On Sun, Jan 06, 2008 at 04:29:51PM +0000, Will Maier via UW-HEP Help System wrote:
> > On Sat, Jan 05, 2008 at 06:00:22PM -0600, rader@ginseng.hep.wisc.edu wrote:
> > > It's up now.
> >
> > And back down again. When I arrived, there were messages about hda
> > on the console, so now we appear to have disk problems, too.
>
> And back down yet again. This is insane.
>
> Steve was able to move all volumes but osg.data off of garlic before
> it crashed, though there appears to be some lingering oddness with
> the VLDB. He'll go in and kick the box soon.
>
> --
>
> o--------------------------{ Will Maier }--------------------------o
> | jabber:...wcmaier@xmpp.lfod.us | email:..will.maier@hep.wisc.edu |
> | office:...........608.263.9692 | cell:..............608.438.6162 |
> *--------------------[ UW High Energy Physics ]--------------------*
|
| msg13408 (view) |
From: wcmaier |
To: ajit, dan, dasu, rader, radtke, wcmaier |
Date: 2008.01.06 10:31 |
|
On Sun, Jan 06, 2008 at 04:29:51PM +0000, Will Maier via UW-HEP Help System wrote:
> On Sat, Jan 05, 2008 at 06:00:22PM -0600, rader@ginseng.hep.wisc.edu wrote:
> > It's up now.
>
> And back down again. When I arrived, there were messages about hda
> on the console, so now we appear to have disk problems, too.
And back down yet again. This is insane.
Steve was able to move all volumes but osg.data off of garlic before
it crashed, though there appears to be some lingering oddness with
the VLDB. He'll go in and kick the box soon.
--
o--------------------------{ Will Maier }--------------------------o
| jabber:...wcmaier@xmpp.lfod.us | email:..will.maier@hep.wisc.edu |
| office:...........608.263.9692 | cell:..............608.438.6162 |
*--------------------[ UW High Energy Physics ]--------------------*
|
| msg13407 (view) |
From: wcmaier |
To: ajit, dan, dasu, rader, radtke, wcmaier |
Date: 2008.01.06 10:29 |
|
(Accidentally sent only to steve earlier; resending...)
On Sat, Jan 05, 2008 at 06:00:22PM -0600, rader@ginseng.hep.wisc.edu wrote:
> It's up now.
And back down again. When I arrived, there were messages about hda
on the console, so now we appear to have disk problems, too.
[...]
> Will, Matt: if either one of you comes in to kick it again, open
> er up and email around the battery type/spec.
Battery info:
Toshiba Lithium Battery
CR2032
3V
Japan
Rather than swap batteries, though, I think we need to move all data
off this machine ASAP and replace it with a new box (with LSI RAID)
as soon as John can ship us one.
--
o--------------------------{ Will Maier }--------------------------o
| jabber:...wcmaier@xmpp.lfod.us | email:..will.maier@hep.wisc.edu |
| office:...........608.263.9692 | cell:..............608.438.6162 |
*--------------------[ UW High Energy Physics ]--------------------*
|
| msg13406 (view) |
From: rader |
To: ajit, dan, dasu, help, rader, radtke, wcmaier |
Date: 2008.01.06 09:17 |
|
> And back down again. When I arrived, there were messages about hda
> on the console, so now we appear to have disk problems, too.
Garlic came up, but its AFS was still hosed. And I noticed dfafs's
"vos examine" was hanging rather randomly on other hosts/volumes too.
I tracked that down to VLDB weirdness: "udebug anise 7003" said "I am
currently managing write trans 0.18" and "There are write locks held"
so I did "bos restart anise -all".
I'll start moving volumes to rosemary.
steve
--
|
| msg13405 (view) |
From: rader |
To: ajit, dan, dasu, rader, radtke, wcmaier |
Date: 2008.01.05 18:11 |
|
It's up now. The oops trace was something about gettime
so I wonder if it's got a bad system clock battery. I recall
tracking down very mysterious problems with rosemary that
we resolved with a new battery.
Will, Matt: if either one of you comes in to kick it again,
open er up and email around the battery type/spec.
steve
--
> Steve went in and gave it a kick. Via IM, we agreed to copy the
> volumes off of garlic and onto rosemary (which has the most space at
> the moment). garlic's not giving us any extra reliability or
> load balancing now, so moving the volumes onto something more stable
> is a Good Thing. Steve will do this in a little bit.
|
| msg13404 (view) |
From: wcmaier |
To: ajit, dan, dasu, rader, radtke, wcmaier |
Date: 2008.01.05 17:37 |
|
On Sat, Jan 05, 2008 at 11:25:12PM +0000, Will Maier via UW-HEP Help System wrote:
> And it's down again.
Steve went in and gave it a kick. Via IM, we agreed to copy the
volumes off of garlic and onto rosemary (which has the most space at
the moment). garlic's not giving us any extra reliability or
load balancing now, so moving the volumes onto something more stable
is a Good Thing. Steve will do this in a little bit.
--
o--------------------------{ Will Maier }--------------------------o
| jabber:...wcmaier@xmpp.lfod.us | email:..will.maier@hep.wisc.edu |
| office:...........608.263.9692 | cell:..............608.438.6162 |
*--------------------[ UW High Energy Physics ]--------------------*
|
| msg13403 (view) |
From: wcmaier |
To: ajit, dan, dasu, rader, radtke, wcmaier |
Date: 2008.01.05 17:25 |
|
On Sat, Jan 05, 2008 at 10:10:53PM +0000, Will Maier via UW-HEP Help System wrote:
> garlic's been kicked.
And it's down again.
Matt, Steve: can either of you make it in? It seems like it might be
a good idea to move the volumes off of garlic, too.
--
o--------------------------{ Will Maier }--------------------------o
| jabber:...wcmaier@xmpp.lfod.us | email:..will.maier@hep.wisc.edu |
| office:...........608.263.9692 | cell:..............608.438.6162 |
*--------------------[ UW High Energy Physics ]--------------------*
|
| msg13402 (view) |
From: wcmaier |
To: ajit, dan, dasu, rader, radtke, wcmaier |
Date: 2008.01.05 16:10 |
|
On Sat, Jan 05, 2008 at 09:51:49PM +0000, Will Maier via UW-HEP Help System wrote:
> garlic crashed a few minutes ago. I'm going to go in and restart
> it.
garlic's been kicked.
> Steve, Matt: can you try to figure out what's bringing it down?
When I got here, there was a trace on the console. The system
appears to have crashed while swapping; the stack included cpu_idle,
too. Either way, garlic almost certainly has hardware issues, and
I'd guess memory/CPU.
--
o--------------------------{ Will Maier }--------------------------o
| jabber:...wcmaier@xmpp.lfod.us | email:..will.maier@hep.wisc.edu |
| office:...........608.263.9692 | cell:..............608.438.6162 |
*--------------------[ UW High Energy Physics ]--------------------*
|
| msg13401 (view) |
From: wcmaier |
To: ajit, dan, dasu, rader, radtke, wcmaier |
Date: 2008.01.05 15:51 |
|
garlic crashed a few minutes ago. I'm going to go in and restart it.
Steve, Matt: can you try to figure out what's bringing it down?
--
o--------------------------{ Will Maier }--------------------------o
| jabber:...wcmaier@xmpp.lfod.us | email:..will.maier@hep.wisc.edu |
| office:...........608.263.9692 | cell:..............608.438.6162 |
*--------------------[ UW High Energy Physics ]--------------------*
|
|
| Date | User | Action | Args |
| 2008-01-07 10:01:01 | rader | set | messages: + msg13413 |
| 2008-01-07 08:16:28 | rader | set | messages: + msg13412 |
| 2008-01-07 06:34:42 | wcmaier | link | issue5027 superseder |
| 2008-01-07 06:34:09 | wcmaier | set | assignedto: rader |
| 2008-01-06 14:41:02 | rader | set | messages: + msg13409 |
| 2008-01-06 10:31:33 | wcmaier | set | messages: + msg13408 |
| 2008-01-06 10:29:47 | wcmaier | set | messages: + msg13407 |
| 2008-01-06 09:17:44 | rader | set | messages: + msg13406 |
| 2008-01-05 18:11:01 | rader | set | messages: + msg13405 |
| 2008-01-05 17:37:41 | wcmaier | set | messages: + msg13404 |
| 2008-01-05 17:25:12 | wcmaier | set | messages: + msg13403 |
| 2008-01-05 16:10:53 | wcmaier | set | status: unread -> chatting; messages: + msg13402 |
| 2008-01-05 15:51:49 | wcmaier | create | |
|