Running ABAQUS Explicit Analysis using Clusters

 

Dear All,

I am running a long explicit analysis on a cluster. The simulation may need about 400 hrs to complete, but the maximum walltime I can request on the cluster is 150 hrs. After 150 hrs the analysis stops because the allotted cluster time has run out. I would like to continue the explicit analysis from the point where it actually stopped at 150 hrs. How can I continue the simulation?

I tried running the same explicit analysis from the command prompt using the suspend and resume commands, and that works fine. I also tried the same thing across nodes, i.e. suspending the job on one node and resuming it from another node, but that does not work. I found in the Abaqus documentation that the recover option is generally used for restarting the same job, but somehow it does not work for me. Here are the script files I use to run the Abaqus job:

start.qsub script file

#!/bin/sh -login
#PBS -l nodes=1:ppn=1,walltime=00:10:00
#PBS -j oe
#PBS -W x=gres:explicit:5%abaqus:5

cd $PBS_O_WORKDIR

inputfile="Job-Dynamic-Model"

# Automatically calculate the number of processors
np=$(cat $PBS_NODEFILE | wc -l)

module unload mvapich
module load abaqus_parallel

#Make a temporary scratch space (this should be on /mnt/scratch)
scratch=/mnt/scratch/${USER}/${PBS_JOBID}
export TMPDIR=$scratch
mkdir -p $scratch

# Change to the working directory
cd ${PBS_O_WORKDIR}

# Run abaqus
abaqus job=$inputfile recover cpus=$np interactive &
PID=$!
sleep 600

# Remove scratch space
rm -rf $scratch

 

restart.qsub script file

#!/bin/sh -login
#PBS -l nodes=1:ppn=1,walltime=00:10:00
#PBS -j oe
#PBS -W x=gres:explicit:5%abaqus:5

cd $PBS_O_WORKDIR

inputfile="Job-Dynamic-Model"

# Automatically calculate the number of processors
np=$(cat $PBS_NODEFILE | wc -l)

module unload mvapich
module load abaqus_parallel

#Make a temporary scratch space (this should be on /mnt/scratch)
scratch=/mnt/scratch/${USER}/${PBS_JOBID}
export TMPDIR=$scratch
mkdir -p $scratch

# Change to the working directory
cd ${PBS_O_WORKDIR}

# Run abaqus
# abaqus restartjoin  originalodb=odb-file-name
#                     restartodb=odb-file-name
#                     [copyoriginal] [history] [compressresult]

abaqus job=${inputfile} recover
echo "sleeping"
sleep 600
echo "done sleeping"
#abaqus terminate job=$inputfile

#qsub restart.qsub

# Remove scratch space
rm -rf $scratch
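
To make clearer what I am trying to achieve, here is a rough sketch of the logic I would like the job script to follow: start a fresh run when no files from a previous run exist, and use recover otherwise. The helper name build_abaqus_cmd is hypothetical (it is not part of Abaqus), and the list of file extensions it checks (.abq, .pac, .res, .stt) is my assumption about which files recover needs; I have not confirmed that this list is complete. The function only prints the command line, it does not start Abaqus.

```shell
#!/bin/sh
# Hypothetical helper (not an Abaqus command): print the abaqus command
# line for a job, adding "recover" only when files from an earlier run
# are present in the current working directory.
build_abaqus_cmd() {
    job=$1   # job name, e.g. Job-Dynamic-Model
    np=$2    # number of processors

    # Assumption: these are the files an Abaqus/Explicit recover needs.
    for ext in abq pac res stt; do
        if [ ! -f "${job}.${ext}" ]; then
            # At least one file is missing: treat this as a fresh run.
            echo "abaqus job=${job} cpus=${np} interactive"
            return 0
        fi
    done

    # All expected files exist: continue the previous run.
    echo "abaqus job=${job} recover cpus=${np} interactive"
}
```

In the qsub script, the printed line could then be used in place of the hard-coded abaqus call, for example cmd=$(build_abaqus_cmd "$inputfile" "$np") followed by $cmd, so that the same script could be resubmitted after each walltime limit.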

Any suggestions would be greatly appreciated. Thank you in advance.

Thanks

Dr. Parimal Maity

Michigan State University

ME