Issue5373

Title: g14n*_0 partitions overfull
Priority: urgent
Status: resolved
Superseder:
Nosy List: ajit, dan, dasu, rader, radtke, wcmaier
Assigned To:
Topic: dCache
Group: IT

Created 2008.07.23 14:39 by dan.
Last changed 2008.07.28 13:31 by wcmaier.

Messages
msg14593 From: wcmaier To: ajit, dan, dasu, rader, radtke, wcmaier Date: 2008.07.28 13:30
On Mon, Jul 28, 2008 at 11:24:51AM -0500, Will Maier via UW-HEP Help System wrote:
> I'm shutting dCache down on the bulk of the g14 machines now so I
> can run the rebalancer script. Things should be back up in an
> hourish.

The pools have been running for an hour or so. I also reverted the
s/write-pools/read-pools/ change, so they should get some new data
as well.

The script I used is at /cms/sw/dcache/bin/dcache_rebalance; I ran
it like so:

    /cms/sw/dcache/bin/dcache_rebalance /data0 /data1 /data2 /data3

That causes the script to move data from /data0 to any of the other
data partitions until it reaches a reasonable threshold.
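
As a rough illustration of that kind of rebalancing (this is not the
dcache_rebalance script itself, whose source is not shown in this
ticket), the planning step might look like the Python sketch below.
The function name plan_moves, the 0.9 threshold, and the
largest-file-first policy are all assumptions made for the example.
The sketch only decides which files to move where; it does not touch
the disk or any bookkeeping dCache keeps for its pools.

```python
def plan_moves(files, source, dests, threshold=0.9):
    """Plan moves off an overfull source partition (hypothetical helper).

    files:     list of (name, size_in_bytes) stored on the source
    source:    (used_bytes, total_bytes) for the source partition
    dests:     dict mapping destination name -> (used_bytes, total_bytes)
    threshold: assumed target fill fraction; 0.9 is an arbitrary choice
    Returns a list of (name, destination) pairs.
    """
    src_used, src_total = source
    # Track how full each destination gets as moves are planned.
    dest_used = {d: used for d, (used, _) in dests.items()}
    moves = []

    # Consider the largest files first so fewer moves are needed.
    for name, size in sorted(files, key=lambda f: f[1], reverse=True):
        # Stop once the source is at or below the threshold.
        if src_used / src_total <= threshold:
            break

        # Only use destinations that stay at or below the threshold.
        fits = [d for d, (_, total) in dests.items()
                if (dest_used[d] + size) / total <= threshold]
        if not fits:
            continue  # this file fits nowhere; try a smaller one

        # Prefer the destination that is currently least full.
        target = min(fits, key=lambda d: dest_used[d] / dests[d][1])
        dest_used[target] += size
        src_used -= size
        moves.append((name, target))

    return moves
```

Called with the sizes of the files on /data0 and the usage of /data1,
/data2, and /data3, it returns the list of moves to carry out while
the pools are shut down.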

-- 

o--------------------------{ Will Maier }--------------------------o
| jabber:...wcmaier@xmpp.lfod.us | email:..will.maier@hep.wisc.edu |
| office:...........608.263.9692 | cell:..............608.438.6162 |
*--------------------[ UW High Energy Physics ]--------------------*
msg14592 From: wcmaier To: ajit, dan, dasu, rader, radtke, wcmaier Date: 2008.07.28 11:24
On Mon, Jul 28, 2008 at 08:52:32AM -0500, Will Maier via UW-HEP Help System wrote:
> I just finished a script to rebalance the data from a full pool to
> less full pools on the same host. I'm nearly done testing it on
> g14n01; when that's ready, I'll move onto the other hosts. I
> expect to be done in an hour or few.

I'm shutting dCache down on the bulk of the g14 machines now so I
can run the rebalancer script. Things should be back up in an
hourish.

msg14581 From: wcmaier To: ajit, dan, dasu, rader, radtke, wcmaier Date: 2008.07.28 08:52
On Thu, Jul 24, 2008 at 12:04:12PM -0500, Will Maier via UW-HEP Help System wrote:
> Oh, yikes. I must've overlooked that when we set the g14s up. I
> don't have much network access up here, but I'll certainly help with
> the cleanup this weekend or Monday (if there's any left to do).

I just finished a script to rebalance the data from a full pool to
less full pools on the same host. I'm nearly done testing it on
g14n01; when that's ready, I'll move onto the other hosts. I expect
to be done in an hour or few.

msg14568 From: wcmaier To: ajit, dan, dasu, rader, radtke, wcmaier Date: 2008.07.24 12:04
On Wed, Jul 23, 2008 at 02:39:30PM -0500, Dan Bradley via UW-HEP Help System wrote:
> The g14 _0 partitions are 100% full and this is causing write errors.  
> The problem is that dcache has been configured to expect the same amount 
> of space on the _0 partitions as the others.
> 
> I'll start fixing this now.

Oh, yikes. I must've overlooked that when we set the g14s up. I
don't have much network access up here, but I'll certainly help with
the cleanup this weekend or Monday (if there's any left to do).

Sigh...

msg14565 From: dan To: ajit, dan, dasu, rader, radtke, wcmaier Date: 2008.07.23 14:39
The g14 _0 partitions are 100% full and this is causing write errors.  
The problem is that dcache has been configured to expect the same amount 
of space on the _0 partitions as the others.

I'll start fixing this now.

--Dan
History
Date                 User     Action  Args
2008-07-28 13:31:02  wcmaier  set     status: chatting -> resolved
2008-07-28 13:30:42  wcmaier  set     messages: + msg14593
2008-07-28 11:24:51  wcmaier  set     messages: + msg14592
2008-07-28 08:52:32  wcmaier  set     messages: + msg14581
2008-07-28 06:57:41  wcmaier  set     priority: triage -> urgent
                                      topic: + dCache
                                      assignedto: wcmaier
2008-07-24 12:04:12  wcmaier  set     status: unread -> chatting
                                      messages: + msg14568
2008-07-23 14:39:30  dan      create