Dear Devdatta,
> The job submission looks ok. I mean the jobs are submitted alright,
> although they are taking a bit too long to start running.
> The job ends and I get the .err, .log, .out file in my job directory,
> which is under /scratch. But I do not get any root output file in my
> /pnfs area.
> I am pasting below the different output files for a particular job:
>
> ==============================
> gen209BaurWp_lhcVeryLowCuts-0000.err
> ==============================
> %MSG-s CMSException: BaurWgamInterface:source{*ctor*} 13-Jul-2008
> 15:19:09 CDT pre-events
> cms::Exception caught in cmsRun
> ---- Configuration BEGIN
> Error occured while creating source BaurWgamInterface
> ---- Configuration BEGIN
> OpenBaurWgamFileError Cannot open BaurWgam input file, check file name
> and path.
> ---- Configuration END
> ---- Configuration END
The error log entry above says that cmsRun could not open the BaurWgam input
file, so I would guess that the job is simply dying there and not going any
further. Does that give you any hint of what is happening, or is this printout
normal?
> %MSG
>
> =====================
> report.log
> =====================
> SENDING with Task:devdatta-login01.hep.wisc.edu-36763 Job:TaskMeta
> params : {'exe': 'cmsRun', 'taskId':
> 'devdatta-login01.hep.wisc.edu-36763', 'tool': 'farmout', 'jobId':
> 'TaskMeta', 'application': 'CMSSW_2_0_9', 'user': 'devdatta',
> 'scheduler': 'local-condor', 'taskType': 'simulation', 'GridName':
> '/DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=devdatta/CN=670738/CN=Devdatta Majumder/CN=proxy',
> 'vo': 'cms'}
> SENDING with Task:devdatta-login01.hep.wisc.edu-36763 Job:0
> params : {'exe': 'cmsRun', 'taskId':
> 'devdatta-login01.hep.wisc.edu-36763', 'tool': 'farmout', 'sid':
> 'https://login01.hep.wisc.edu#1215974091#36763.0', 'jobId': '0',
> 'application': 'CMSSW_2_0_9', 'user': 'devdatta', 'scheduler':
> 'local-condor', 'taskType': 'simulation', 'GridName':
> '/DC=ch/DC=cern/OU=Organic
> Units/OU=Users/CN=devdatta/CN=670738/CN=Devdatta Majumder/CN=proxy',
> 'vo': 'cms'}
> SENDING with Task:devdatta-login01.hep.wisc.edu-36763 Job:0
> params : {'SyncCE': 'cmsgrid02.hep.wisc.edu', 'SyncGridJobId':
> 'https://login01.hep.wisc.edu#1215974091#36763.0'}
> SENDING with Task:devdatta-login01.hep.wisc.edu-36763 Job:0
> params : {'ExeTime': '12', 'ExeExitCode': '65'}
> SENDING with Task:devdatta-login01.hep.wisc.edu-36763 Job:0
> params : {'JobExitCode': '65'}
>
> ===============================
> gen209BaurWp_lhcVeryLowCuts-0000.out
> ===============================
> Parameters sent to Dashboard.
> Parameters sent to Dashboard.
> Parameters sent to Dashboard.
> Error: Cannot open BaurWgam input file
The last line above also indicates that the job cannot find/open the BaurWgam
input file. Are you sure that the file is being transferred to the Condor job
directory on the worker node (with the correct path etc.) and is available to
the job?
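One quick way to check this is to test the path from inside the job directory
before cmsRun starts. The following is only a minimal sketch in Python; the
function name check_input_file is made up for illustration and is not part of
the FarmOut scripts or CMSSW.

```python
import os


def check_input_file(path):
    """Return (ok, reason) saying whether `path` looks openable for reading.

    Hypothetical helper for illustration only; it is not part of FarmOut,
    Condor, or CMSSW.
    """
    if not os.path.exists(path):
        return False, "missing"
    if not os.path.isfile(path):
        return False, "not a regular file"
    if not os.access(path, os.R_OK):
        return False, "not readable"
    return True, "ok"
```

Calling check_input_file() with the input file path exactly as it is written in
the .cfg file, from the job's working directory on the worker node, should show
whether the path resolves there at all.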
> ls -ltr
> total 28
> -rw------- 1 devdatta devdatta 7419 Jul 13 15:15 x509up_u1129
> -rw-rw-r-- 1 devdatta devdatta 1809 Jul 13 15:15
> gen209BaurWp_lhcVeryLowCuts-0000.cfg
> -rwxr-xr-x 1 devdatta devdatta 3724 Jul 13 15:15 condor_exec.exe
> -rw-r--r-- 1 devdatta devdatta 1036 Jul 13 15:18 report.log
> -rw-r--r-- 1 devdatta devdatta 137 Jul 13 15:19
> gen209BaurWp_lhcVeryLowCuts-0000.out
> -rw-r--r-- 1 devdatta devdatta 363 Jul 13 15:19
> gen209BaurWp_lhcVeryLowCuts-0000.err
> End of ls output
> cmsRun exited with status 65
The line above indicates that the job (i.e. cmsRun) is exiting with code 65 as a
result of the BaurWgam input file open error.
> I would like to know something further. Is there a quota in the scratch
> areas? I had to delete the run areas of some earlier jobs to accommodate
> my ascii files, which are rather huge.
/scratch is a 500 GB disk with no quota. It is a *temporary space* for users to
store input/output files while running jobs, but it is the users' responsibility
to clean up regularly; otherwise jobs will start to fail once the disk fills up
completely.
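To see how full the disk is before submitting, a small check along these lines
may help. It is a sketch using Python's standard shutil.disk_usage; the helper
name usage_summary is made up for illustration.

```python
import shutil


def usage_summary(path):
    """Summarize disk usage of the filesystem that contains `path`.

    Hypothetical helper for illustration; sizes are reported in GiB.
    """
    total, used, free = shutil.disk_usage(path)
    gib = 1024 ** 3
    # Guard against a zero-sized filesystem to avoid dividing by zero.
    percent_used = 100.0 * used / total if total else 0.0
    return {
        "total_gib": total / gib,
        "used_gib": used / gib,
        "free_gib": free / gib,
        "percent_used": percent_used,
    }
```

For example, usage_summary("/scratch") on the login node would report the
overall usage of that disk (it covers all users, not only your own files).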
> Besides, can I move my datafiles from /scratch to /pnfs? This is
> important as I do not have any backup copy of those and I believe the
> /scratch area is automatically cleaned periodically. Please let me know
> how I can directly copy any kind of files to my /pnfs area.
Are you talking about the output root files that your job produces, or the
input ascii files?
If it's the output root files, then assuming you are using the FarmOut scripts
to submit your jobs to Condor, the root files should end up directly in /pnfs,
not in /scratch. Isn't that the case? If not, we need to figure out what's
going wrong.
In the case of the input ascii files, you can put them in your AFS home
directory here. How big are the files, and what is the total size?
Thanks,
- Ajit
> 2008/7/14 Ajit Mohapatra <ajit@hep.wisc.edu>:
>
> Dear Devdatta,
>
> It looks like you are submitting jobs from login01, right ? Can you
> please provide some details about the nature of problems you are
> experiencing i.e. the issue is with job submission/running/crashing
> etc. and the log files that contains relevant info ? That would help
> us debug/pinpoint the issue and help resolve.
>
> Thanks,
> - Ajit
>
> Dear Ajit,
>
> I am facing some problems in submitting jobs to the Wisconsin
> grid. I am
> trying to run gen+sim+digi+raw+hlt chain and then the rawToReco
> chain on
> some ascii datafile.
>
> Can I have some help in getting the jobs done? I have run the
> jobs in
> interactive modes, and the outputs are fine, but whenever I try to
> submit the jobs to the Grid using condor scripts, I run into
> trouble.
> This has been happening only in the past few days. Before that
> it was fine.
>
>
> My datafile is in :
> machine: login01:
> /scratch/devdatta/BaurAsciiFiles/lhc_Wp_VeryLowCuts/baurWp_lhcVeryLowCuts.asci
>
> gen+sim+digi+raw+hlt cfg file :
> /home/devdatta/CMSSW_2_0_9/src/GeneratorInterface/BaurWgamInterface/test/genSimDigiRawHLTBaurWmunuGamma.cfg
>
> rawToreco file :
> /home/devdatta/CMSSW_2_0_9/src/GeneratorInterface/BaurWgamInterface/test/raw2recoBaurWmunuGam.cfg
>
> So the gen+sim+digi+raw+hlt cfg file runs on
> baurWp_lhcVeryLowCuts.ascii
> and the output is the input to raw2recoBaurWmunuGam.cfg whose
> output is
> the final processed data.
>
> Sridhara asked me to contact you quite some time back if I needed
> help, but I
> was managing fine with small datasets. If you could please help, it
> would be very nice.
>
> Best wishes,
>
> Devdatta.
>
> ----------
> group: IT
> messages: 14499
> nosy: ajit, dan, dasu, rader, radtke, wcmaier
> priority: triage
> status: unread
> title: Problem with condor job submission
>
> ______________________________________
> UW-HEP Help System <help@hep.wisc.edu>
> <https://help.hep.wisc.edu/issue5355>
> ______________________________________
>
>