Distributed Computing User Tutorial
Preface
This is a tutorial for the CEPC-DIRAC grid computing system. Preliminary knowledge of Linux and CEPC v1 is needed. If you need help, please contact ZHAO Xianghu.
Step 1. Request a certificate and join CEPC VO
Go to https://cagrid.ihep.ac.cn to request a certificate from the IHEP CA. A detailed guide with screenshots is attached: How to Request Certificate.pdf.
Approval takes 2 or 3 days.
When your application is approved, you will receive an email from ihepca@ihep.ac.cn with the serial number and DN. You can then import the certificate into your browser and save it to local disk as a .p12 file (e.g. yourCertificate.p12). A detailed guide with screenshots is attached: How to Get Certificate after Approval.pdf.
Use the following commands to generate userkey.pem and usercert.pem from the .p12 file, and place them in $HOME/.globus:
$ openssl pkcs12 -in yourCertificate.p12 -out userkey.pem -nocerts
$ openssl pkcs12 -in yourCertificate.p12 -out usercert.pem -nokeys -clcerts
Then change their permissions to 400 and 600, respectively.
When you save the .p12 file and generate the .pem files, you will be prompted to set a password. Please write down your password somewhere for future use.
The PEM password you set when generating userkey.pem and usercert.pem will be used later, when you set up the grid environment for submitting jobs.
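The conversion and permission steps above can be collected into a small helper. This is only a sketch: the function names convert_p12 and secure_pem_files are hypothetical and not part of the CEPC-DIRAC tools, and it assumes that mode 400 is meant for userkey.pem and mode 600 for usercert.pem.

```shell
# Hypothetical helpers; the names are illustrative only.

# Convert a .p12 file into userkey.pem and usercert.pem inside a target
# directory (default: $HOME/.globus). openssl prompts for the passwords.
convert_p12() {
    p12_file="$1"
    globus_dir="${2:-$HOME/.globus}"
    mkdir -p "$globus_dir" || return 1
    openssl pkcs12 -in "$p12_file" -out "$globus_dir/userkey.pem" -nocerts || return 1
    openssl pkcs12 -in "$p12_file" -out "$globus_dir/usercert.pem" -nokeys -clcerts || return 1
}

# Restrict permissions on the generated files.
# Assumption: 400 applies to the private key and 600 to the certificate.
secure_pem_files() {
    globus_dir="${1:-$HOME/.globus}"
    chmod 400 "$globus_dir/userkey.pem" &&
    chmod 600 "$globus_dir/usercert.pem"
}
```

A typical call would be: convert_p12 yourCertificate.p12 && secure_pem_files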
With the certificate in your web browser, go to https://voms.ihep.ac.cn:8443/voms/cepc/ and follow the guidelines to join the CEPC VO.
Step 2. Set up the environment
Run the following command to set up the environment:
source /cvmfs/dcomputing.ihep.ac.cn/frontend/dsub/setup/env_dsub.sh
If you use tcsh, the command should be
source /cvmfs/dcomputing.ihep.ac.cn/frontend/dsub/setup/env_dsub.csh
You will be prompted to enter the PEM password.
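If you switch between shells, the choice of setup script can be expressed as a small lookup. The sketch below is illustrative: dsub_env_script is a hypothetical name, and it only prints the path of the script to source, treating tcsh as the one shell that needs the .csh variant.

```shell
# Hypothetical helper: print the setup script that matches a given shell.
# Only tcsh is mentioned in this tutorial as needing the .csh variant.
dsub_env_script() {
    setup_dir=/cvmfs/dcomputing.ihep.ac.cn/frontend/dsub/setup
    case "$(basename "${1:-$SHELL}")" in
        tcsh) echo "$setup_dir/env_dsub.csh" ;;
        *)    echo "$setup_dir/env_dsub.sh" ;;
    esac
}
```

Called without arguments it uses $SHELL, so in bash you could check the printed path with [ -r "$(dsub_env_script)" ] before sourcing it, which also confirms that /cvmfs is mounted.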
Step 3. Prepare a job configuration file
A job configuration (cfg) file is a plain text file containing the parameter definitions for your job. You can take a look at the example job cfg file with:
vim $DSUBDOC/dsub-example/job.cfg
The content of this file, without comments, is:
job_type = cepc_sr
repo_dir = ./repo
work_dir = ./work
input_filelist = ./stdhep.list
output_dir = test_001
evtmax = 10
job_group = 150116_CEPC_test_001
The comments in this file will explain the meanings of these parameters and how to modify their values to fit your situation.
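To check a single parameter without opening the file, a short helper can read one "key = value" line. This is a sketch under stated assumptions: cfg_get is a hypothetical name, keys are plain identifiers, and each parameter sits on its own line as in the listing above.

```shell
# Hypothetical helper: print the value of one "key = value" parameter.
# Assumes keys are plain identifiers and each parameter is on its own line.
# Trailing whitespace after the value, if any, is not stripped.
cfg_get() {
    cfg_file="$1"
    key="$2"
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$cfg_file"
}
```

For the example file, cfg_get job.cfg evtmax prints 10.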
Step 4. Submit jobs
Once you have prepared a job cfg file, run
dsub job.cfg
to submit jobs.
As a simple example, you can run a first test with the following steps:
cp -r $DSUBDOC/dsub-example .
cd dsub-example
dsub job.cfg
This submits 5 jobs with 10 events per job. The input stdhep file is listed in the file stdhep.list.
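Before submitting, it can help to confirm that the file list named in the cfg file actually exists. The following is a sketch, not a feature of dsub: presubmit_check is a hypothetical name, and it assumes that the input_filelist path is resolved relative to the directory you run it from.

```shell
# Hypothetical pre-submission check; not part of dsub itself.
# Assumption: input_filelist is resolved relative to the current directory.
presubmit_check() {
    cfg_file="$1"
    [ -r "$cfg_file" ] || { echo "cannot read $cfg_file" >&2; return 1; }
    list_file=$(sed -n 's/^[[:space:]]*input_filelist[[:space:]]*=[[:space:]]*//p' "$cfg_file")
    [ -n "$list_file" ] || { echo "input_filelist not set in $cfg_file" >&2; return 1; }
    [ -s "$list_file" ] || { echo "missing or empty file list: $list_file" >&2; return 1; }
    # Report how many non-empty lines the list contains.
    echo "$cfg_file: file list $list_file has $(grep -c . "$list_file") non-empty line(s)"
}
```

It can be chained with the submission: presubmit_check job.cfg && dsub job.cfg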
Step 5. Monitor job status
Using a web browser with your certificate installed, go to https://dirac.ihep.ac.cn/ and choose “Applications” -> “Task Manager” from the menu bar to show your tasks. To see individual jobs, choose “Job Monitor”. You can filter by job group to select the jobs you are interested in.
Step 6. Get output data
Currently, the output data will be written to
/cefs/tmp_storage/yant/gridfs/cepc/user/<initial>/<username>/<output_dir>/sim
/cefs/tmp_storage/yant/gridfs/cepc/user/<initial>/<username>/<output_dir>/rec
These directories are readable by all AFS users in the physics or higgs group, so you can read the files directly in your analysis jobs. However, old files in these directories are removed regularly to save disk space, so please copy the data to your backup directory in time.
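The output path can be built from the username and the output_dir value in the cfg file. The helper below is a sketch: grid_output_dir is a hypothetical name, and it assumes that <initial> is the first character of <username>.

```shell
# Hypothetical helper: build the output path for one stage ("sim" or "rec").
# Assumption: <initial> is the first character of <username>.
grid_output_dir() {
    username="$1"
    output_dir="$2"
    stage="$3"
    # Strip everything after the first character to get the initial.
    initial="${username%"${username#?}"}"
    echo "/cefs/tmp_storage/yant/gridfs/cepc/user/$initial/$username/$output_dir/$stage"
}
```

With a hypothetical username zhangsan, a backup copy could then be made with cp -r "$(grid_output_dir zhangsan test_001 rec)" followed by a backup directory of your choice.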
Step 7. Debug and get job logs
When a job fails, you can select it and click the “Reschedule” button in the upper-right corner. Occasional errors are usually resolved by rescheduling.
If you see “job.py Exited with status” followed by one of the codes below, the code indicates the cause:
- 10 preparation error
- 11 cvmfs not found
- 20 simulation error
- 21 DB connection failed
- 30 reconstruction error
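The status codes listed above can be looked up with a small function when scanning many failed jobs. This is a sketch; explain_job_status is a hypothetical name and the mapping simply mirrors the list.

```shell
# Hypothetical helper: translate a job.py exit status into its meaning.
explain_job_status() {
    case "$1" in
        10) echo "preparation error" ;;
        11) echo "cvmfs not found" ;;
        20) echo "simulation error" ;;
        21) echo "DB connection failed" ;;
        30) echo "reconstruction error" ;;
        *)  echo "unknown status $1"; return 1 ;;
    esac
}
```

For example, explain_job_status 21 prints "DB connection failed".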
You can use the following commands to get the job logs:
getlog <jobID>
getlog <jobID1> <jobID2> <jobID3> ...
getlog -g <job_group>
The logs are useful for debugging. For example, if a simulation error occurs, you can check simu.log.
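Once logs have been fetched for several jobs, the simu.log files that need attention can be listed in one pass. The sketch below makes two assumptions: the logs are saved somewhere under a directory you pass in (the layout produced by getlog is not described here), and failing runs mention the word "error" in simu.log. The name find_sim_errors is hypothetical.

```shell
# Hypothetical helper: list simu.log files under a directory that mention "error".
# Assumption: fetched logs live somewhere under log_dir, and failures
# contain the word "error" (matched case-insensitively).
find_sim_errors() {
    log_dir="${1:-.}"
    find "$log_dir" -type f -name simu.log -exec grep -il 'error' {} +
}
```

Run it from the directory holding the fetched logs, or pass that directory as the first argument.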