Message13651

Author wcmaier
Recipients ajit, dan, dasu, frankjp, rader, radtke, wcmaier
Date 2008.02.13 11:37
Content
Hi, Frank-

On Wed, Feb 13, 2008 at 01:56:10PM +0000, Frank Petriello via UW-HEP Help System wrote:
> I am submitting a question about Condor in hopes that perhaps
> Sridhara or Dan Bradley can help, please let me know if there is
> someone else in Physics I can send this to.

Dan or Sridhara may yet chime in, but my first suggestion would be
to direct your jobs' output somewhere besides AFS. Writing job
output to AFS puts a significant strain on our resources and can
cause difficult-to-debug failures.
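
If it helps, here is a rough sketch of a quick check for submit files
that still point job output at AFS. It assumes AFS paths start with
/afs/ and only looks at the standard submit commands output, error,
log and initialdir; the function name afs_paths is made up for this
example, and it is not an official Condor tool.

```python
import re

# Submit commands whose values are paths the job (or Condor) writes to.
# Assumption: these four cover the usual cases; extend as needed.
WRITE_KEYS = {"output", "error", "log", "initialdir"}

# Matches simple "key = value" submit-file lines.
LINE_RE = re.compile(r"^\s*(\w+)\s*=\s*(.*?)\s*$")


def afs_paths(submit_text, afs_prefix="/afs/"):
    """Return (command, path) pairs whose path lies under afs_prefix."""
    hits = []
    for line in submit_text.splitlines():
        # Skip comment lines in the submit file.
        if line.lstrip().startswith("#"):
            continue
        match = LINE_RE.match(line)
        if not match:
            continue
        key, value = match.group(1).lower(), match.group(2)
        if key in WRITE_KEYS and value.startswith(afs_prefix):
            hits.append((key, value))
    return hits
```

Note that relative paths are not flagged: they resolve against
initialdir or the directory you submit from, so submitting from an
AFS directory still writes to AFS unless initialdir points elsewhere.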

Beyond that, I don't have access to the machines you mention, but I
don't see anything odd in their ClassAds. Dan spot-checked a few of
them and wasn't able to find any memory or similar problems,
either.

[...]
> ----------
> 001 (046.010.000) 02/13 02:03:20 Job executing on host: 
> <128.105.167.40:47354>
> 
> 005 (046.010.000) 02/13 02:03:23 Job terminated.
> 	(0) Abnormal termination (signal 11)
> 	(1) Corefile in: 
> /afs/hep.wisc.edu/user/frankjp/Work/ttZ/NLOV_qqb_minimalLarin/condor_submits/core.46.10

This core file doesn't seem to exist any more. Did you clean it up?
If possible, please preserve a core file; it might help us debug
your problem.

Thanks!

-- 

o--------------------------{ Will Maier }--------------------------o
| jabber:...wcmaier@xmpp.lfod.us | email:..will.maier@hep.wisc.edu |
| office:...........608.263.9692 | cell:..............608.438.6162 |
*--------------------[ UW High Energy Physics ]--------------------*
History
Date User Action Args
2008-02-13 11:37:33 wcmaier set recipients: + wcmaier, rader, dan, dasu, ajit, radtke, frankjp
2008-02-13 11:37:33 wcmaier link issue5109 messages
2008-02-13 11:37:33 wcmaier create