
RA User's Guide

This guide has information about running on the CSM HPC resource RA.MINES.EDU.

RA User Agreement

RA users agree to the following:

  1. I will not give my password to anyone.
  2. I will not store it on any machine in plain text.
  3. I will not use a password that I use on another machine.
  4. I will not reuse the password or ssh keys that I had before September 10, 2009.
  5. I will not use a blank passphrase for any ssh key that I create to go to/from RA.
  6. I will not allow anyone else to use my account.
  7. I will not login to RA from someone else's account.
  8. I understand that I am responsible for my own data backups.

Copies of this agreement can be found at

and

Usage Policy

Ra is a collection of nodes, with each node containing 8 computing cores. The 8 compute cores on one node share the same memory. Memory is not shared across nodes.

Ra is designed primarily to run distributed memory applications. In distributed memory applications there is a collection of processes or tasks running on individual computing cores or processors. That is, each task in an application runs a separate copy of the same program and has its own memory. The various tasks of the application communicate via message passing. The normal method for tasks to pass messages is to use calls from the Message Passing Interface (MPI) library. The linked page has a nice list with documentation for the calls of the MPI library.

It is also possible to write programs on Ra that exploit the fact that the 8 compute cores on a node share memory. One method of writing such applications is to use threads. The OpenMP package is available on Ra to facilitate writing threaded applications. "OpenMP is a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications for platforms ranging from the desktop to the supercomputer."

While it is possible to force restrictions on users in terms of memory and process count, such restrictions can cause undesired effects.

We have been relying on users to observe the following rules:

  • Do not run parallel applications on the RA frontend, either OpenMP or MPI.
  • If you need to run parallel interactive jobs, reserve a node for doing so as discussed in the Running Parallel Interactive Jobs section of this guide.
  • Do not run memory intensive or long applications on the front end.
  • The front end is designed primarily for edits and compiles.
  • As a general guideline, if we notice your application running on the front end it might be a problem; if we notice it running repeatedly, it is a problem. Any application that is taking too many resources will be killed.
  • For the benefit of all users, repeat offenders will lose access.
  • Do not create large data sets in your home directory. Large data sets should only be created in /lustre/scratch.

File System Overview and Usage

There are two parallel file systems on RA, /lustre/home and /lustre/scratch. As the name implies, /lustre/home contains users' home directories. Every user also has a directory in /lustre/scratch. Both file systems are available across all nodes.

/lustre/home
Small application builds, scripts, and small data sets
Community codes in /lustre/home/apps
/lustre/scratch
The primary location for running applications and storing larger data sets

/lustre/scratch is about 10 times larger than /lustre/home. So /lustre/scratch should be used for running applications. /lustre/scratch is also potentially faster.
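For example, a minimal sketch of setting up a run directory in scratch; the exact layout of /lustre/scratch is an assumption here (a directory named after your username), so verify the actual path on RA:

mkdir -p /lustre/scratch/$USER/myrun    # assumed layout; check the real path of your scratch directory
cd /lustre/scratch/$USER/myrun          # build and run your application from here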

Backups

/lustre/scratch is not backed up. It is too large for backups to be practical. Users are responsible for backing up their own data.

Access to RA

Access to Ra is via the command:

ssh ra.mines.edu

The only way to access Ra is by using ssh. Unix and Unix-like operating systems (OSX, Linux, Unicos, ...) have ssh built in. If you are using a Windows based machine to access RA then you must use a terminal package that supports ssh, such as PuTTY, available from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html.

We have a description of how to connect to RA using ssh key-based access from both Unix and Windows based machines at: http://geco.mines.edu/ssh. Ssh key-based access will enable you to log into RA all day while only typing a passphrase one time.
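The linked page has the full procedure; as a rough sketch from a Unix-like machine (the username here is a placeholder), key-based access is typically set up along these lines:

ssh-keygen -t rsa                     # choose a non-blank passphrase, per the user agreement
ssh-copy-id username@ra.mines.edu     # copy your public key to RA; replace username with your own
ssh username@ra.mines.edu             # an ssh agent can then cache the passphrase for the day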

Quick Start
Testing your environment

Testing your environment on RA is just a matter of building and running a parallel program. The method is the same for RA and Mio. For a Copy & Paste guide to building and running parallel MPI applications, see the linked quick start guide. Unlike RA, Mio does not require account numbers as part of your script.

If at any time you feel your environment is not working properly, the first thing to try is to run through the quick start guide given above.

Building MPI programs

The commands for compiling MPI programs are:

C programs:
mpicc
C++ programs:
mpiCC
Fortran 90 programs:
mpif90
Fortran 77 programs:
mpif77

The MPI compilers are actually scripts that call "normal" C, C++, or Fortran compilers adding in the MPI include files and libraries. Thus, any compile options that you would normally pass to the regular compilers can be added to the MPI compile lines. For example, -O3, provides a good level of optimization.
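For example, a minimal sketch of compile lines (hello_mpi.c and hello_mpi.f90 are placeholder source files) that combine the MPI wrapper with an optimization flag:

mpicc -O3 hello_mpi.c -o hello_mpi           # C source built with optimization
mpif90 -O3 hello_mpi.f90 -o hello_mpi_f90    # Fortran 90 equivalent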

The linked guide has a makefile that shows how the compilers are called from within "make."

Running MPI Programs

The machine "RA" is actually a collection of roughly 268 individual nodes. Each node is a complete computer in its own right. In turn, each node contains 8 compute cores. The compute cores perform the computation. All of the cores in a node see the same memory. Cores in different nodes do not share memory. Instead, they communicate with each other by passing messages using the Message Passing Interface (MPI) library.

When you login to RA you are logging into the "head" node. The head node should not be used to run parallel applications. Instead, you run a script that requests one or more of the other "compute" nodes of RA. The script then runs your program on the requested nodes.

All parallel applications, including MPI programs, must be run on compute nodes using the batch queuing system. Do not run an MPI application on the RA head node. This could hang the node, requiring a reboot, and potentially kill other people's jobs. This will annoy others. People running parallel jobs on the head node may have their accounts suspended.

The linked guide has a fairly complete example of compiling and running a parallel MPI program. We will summarize it here.

MPI programs are built using the MPI compilers as discussed above. After they are built, the programs are run by using a script. A script specifies the number of nodes required, the program to run, and the maximum run time for the program. The script is then submitted to the batch queuing system. The batch queuing system schedules a job to run on the number of nodes requested. The parallel job will not run until there is a sufficient number of nodes available to run the program. Often, small node count jobs with a short maximum run time will run sooner than large, long jobs. The scheduler may "fit in" a number of small, short jobs on nodes it is setting aside for larger, longer jobs.

Batch scripts are just normal shell scripts and can have almost any normal shell script command. The lines in the batch scripts that begin with #PBS are special comments that are interpreted by the job scheduler.

Here is a simple example script. It requests a single node that contains 8 cores, nodes=1:ppn=8, for 10 minutes, walltime=00:10:00. The MPI program hello_mpi contained in the same directory as the script will be run on 8 processors, mpiexec -np 8.

#!/bin/bash
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:10:00
#PBS -W x=NACCESSPOLICY:SINGLEJOB
#PBS -N testIO
#PBS -o stdout
#PBS -e stderr
#PBS -r n
#PBS -V
#-----------------------------------------------------
# Go to the directory that contains this script
cd $PBS_O_WORKDIR
mpiexec -np 8 ./hello_mpi

The option, #PBS -V is important. It causes the environment that you have defined on RA to be exported to the compute nodes. Without this set most programs will not work properly.

The option #PBS -N testIO gives the job a name which can be seen in the commands that check job status.

Under some circumstances the job scheduler will try to put multiple jobs on a single node. The option #PBS -W x=NACCESSPOLICY:SINGLEJOB prevents this.

After the program runs the output will be put in the file stdout and any error information will be put in the file stderr.

Assuming the name of the script is myscript you submit it using the command

qsub myscript

This will return a job number. You can see the status of your job by running the command

qstat "job number"

Where "job number" is the numerical part of the output of the qsub command. Qstat will show:

Q    Waiting to run
R    The job is running
C    The job is finished
E    Not seen very often; indicates that a job is in between one of the other states. It does not indicate an error.

Building OpenMP (Threaded) Programs

OpenMP is the de-facto standard for parallel programming on shared memory systems using threading. The cores on individual nodes of RA and Mio share memory, so OpenMP can be used for node-level parallelism, that is, across the 8 (or up to 12 on Mio) cores on a node. For more information, see the OpenMP documentation.

Compiling an OpenMP program requires a command line option that is specific to the compiler vendor as shown below.

Intel:

All Intel compilers use the -openmp option to enable OpenMP. For example:

C
icc -openmp
C++
icpc -openmp
Fortran
ifort -openmp

Portland Group:

All Portland Group compilers use the -mp option to enable OpenMP. For example:

C
pgcc -mp
C++
pgCC -mp
Fortran 90
pgf90 -mp
Fortran 95
pgf95 -mp
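For example, a hypothetical compile of the hello_omp.f90 program shown later in this guide with the Portland Group Fortran compiler would look like:

pgf90 -mp hello_omp.f90 -o hello_omp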

Running OpenMP (Threaded)

All parallel applications, including OpenMP programs, must be run on compute nodes using the batch queuing system. Do not run a threaded application on the RA head node. This could hang the node, requiring a reboot, and potentially kill other people's jobs. This will annoy others. People running parallel jobs on the head node may have their accounts suspended.

The environment variable OMP_NUM_THREADS controls the number of threads used by an OpenMP program. This variable should be set in your script. For example, the following script can be used to run the OpenMP program hello_omp using 4 threads.

#!/bin/bash
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:10:00
#PBS -W x=NACCESSPOLICY:SINGLEJOB
#PBS -N testIO
#PBS -o stdout
#PBS -e stderr
#PBS -r n
#PBS -V
#-----------------------------------------------------
# Go to the directory that contains this script
cd $PBS_O_WORKDIR
# Set the number of threads to use to 4
export OMP_NUM_THREADS=4
# Run my program using 4 threads
./hello_omp

Note that we do not use the mpiexec command to run OpenMP programs. mpiexec is normally only used for MPI programs.

Here is a "Hello World" program written in Fortran using OpenMP. The program writes the "Thread Number" for each thread.

program hello
    implicit none
    integer OMP_GET_MAX_THREADS,OMP_GET_THREAD_NUM
!$OMP PARALLEL
!$OMP CRITICAL
    write(*,fmt="(a,i2,a,i2)")" thread= ",OMP_GET_THREAD_NUM(), &
                              " of ",     OMP_GET_MAX_THREADS()
!$OMP END CRITICAL
!$OMP END PARALLEL
end program

The compile line for this program would be:

ifort -openmp hello_omp.f90 -o hello_omp

Next we have a slightly more complicated version of the script shown above. This script will run the program 4 times using 1, 2, 4, and 8 threads, setting OMP_NUM_THREADS in a loop and then running the program.

#!/bin/bash
#PBS -l nodes=1:ppn=8:compute
#PBS -l walltime=00:10:00
#PBS -W x=NACCESSPOLICY:SINGLEJOB
#PBS -N testIO
#PBS -o stdout
#PBS -e stderr
#PBS -r n
#PBS -V
#-----------------------------------------------------
# Go to the directory that contains this script
cd $PBS_O_WORKDIR
# Save a nicely sorted list of nodes
sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
# Run my program 4 times using 1, 2, 4, and 8 threads
for NT in 1 2 4 8 ; do
  export OMP_NUM_THREADS=$NT
  echo OMP_NUM_THREADS=$OMP_NUM_THREADS
  # Run my program using threads
  ./hello
  echo
done

We would use the qsub command to run this script. For example

qsub myscript

After the script runs we would get the output:

OMP_NUM_THREADS=1
 thread=  0 of  1
OMP_NUM_THREADS=2
 thread=  0 of  2
 thread=  1 of  2
OMP_NUM_THREADS=4
 thread=  0 of  4
 thread=  1 of  4
 thread=  2 of  4
 thread=  3 of  4
OMP_NUM_THREADS=8
 thread=  0 of  8
 thread=  1 of  8
 thread=  2 of  8
 thread=  3 of  8
 thread=  4 of  8
 thread=  7 of  8
 thread=  5 of  8
 thread=  6 of  8

Mapping of parallel tasks to nodes

The scripts discussed under the Running MPI Programs and Running OpenMP (Threaded) sections assume that you want a single MPI task running on each core. In some cases you might want other mappings of tasks to cores; for example, you might want:

  • Only 2 or 4 MPI tasks on a node
  • Different numbers of tasks on each node
  • Different MPI source programs running on different cores (MPMD)
  • A hybrid MPI/OpenMP program with less than N MPI tasks per node

This type of operation is supported, but doing the mappings from within a script can be a bit tricky. We have created a script, match, which makes such mappings of tasks to cores easier; see its separate documentation for details.
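As a rough sketch of doing one such mapping by hand inside a batch script, without the match script (the -machinefile option is assumed to be supported by the installed mpiexec, and my_mpi_program is a placeholder):

# Hypothetical sketch: place only 2 MPI tasks on each of 2 requested nodes
sort -u $PBS_NODEFILE > uniquenodes.$PBS_JOBID                              # one line per node
awk '{print $0; print $0}' uniquenodes.$PBS_JOBID > twopernode.$PBS_JOBID   # two slots per node
mpiexec -np 4 -machinefile twopernode.$PBS_JOBID ./my_mpi_program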

Additional advanced scripting techniques are discussed in the next section.

Some Advanced Scripts

TBD

We have a presentation on some advanced scripting techniques; it will be expanded shortly.

Requesting specific types of nodes

Ra has 184 nodes with 16 Gbytes of memory per node and 84 nodes with 32 Gbytes. To run only on the nodes that have 32 Gbytes, add the option :fat to the line in your script that contains the number of nodes you are requesting. For example, if you want two nodes with 32 Gbytes, the line in your batch script:

#PBS -l nodes=2:ppn=8

becomes

#PBS -l nodes=2:ppn=8:fat

Also, there are two types of "fat" nodes, pe1950 and pe6850, which have different processor types. (All of the thin nodes are pe1950 nodes but with less memory.) Most programs will run on either the pe1950 or pe6850 nodes. If you receive an error message similar to:

Fatal Error: This program was not built to run on the processor in your system. The allowed processors are: Intel(R) Core(TM) Duo processors and compatible Intel processors with supplemental Streaming SIMD Extensions 3(SSSE3) instruction support.

then the program must be run on the pe1950 nodes. To force your program to run on pe1950 nodes the line in your batch script would be of the form:

#PBS -l nodes=2:ppn=8:pe1950

or

#PBS -l nodes=2:ppn=8:pe1950:fat

Local Disk Space

Users should not write to the /tmp directory. Each of the compute nodes has a local disk, /state/partition1, which is writable by all users. Please use /state/partition1 instead of /tmp for temporary files. These temporary files should be deleted at the end of your job as part of your PBS script. Note that you cannot see these temporary files from Ra, so if you want to keep them they must be copied from the compute nodes back to Ra as part of your PBS script. There is a linked example of a program and PBS script that creates files in /state/partition1 and then moves them to the working directory; a rough sketch of the pattern is shown below.
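A minimal sketch of that pattern inside a PBS script (my_program is a placeholder executable):

# Create a per-job directory on the node-local disk
TMP=/state/partition1/$USER.$PBS_JOBID
mkdir -p $TMP
cd $TMP
$PBS_O_WORKDIR/my_program > results.txt     # my_program is a placeholder
# Copy anything you want to keep back to the submission directory, then clean up
cp results.txt $PBS_O_WORKDIR/
cd $PBS_O_WORKDIR
rm -rf $TMP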

The amount of space in /state/partition1 depends on the type of node as shown in the chart below.

Node type    Size (Gbytes) of /state/partition1    PBS option to select node
thin 1950    37                                    #PBS -l nodes=2:ppn=8:pe1950:thin
fat 1950     21                                    #PBS -l nodes=2:ppn=8:pe1950:fat

Queue Times

There are several available queues on Ra. You do not normally specify a particular queue in which to run your jobs; the queue is selected automatically based on the amount of memory and the time limit you request. The limits are set in your run script. The maximum time you can request for a job is 6 days or 144 hours. However, there is a limit on the number of jobs that can be running on the machine with a requested time over 2 days or 48 hours. So if you submit a job for over 48 hours and there are already a number of jobs running with requested times over 48 hours, your job may not run until the other jobs finish. It is normally better not to submit jobs for over 48 hours.
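For example, a request that stays within the 48-hour guideline would use lines like the following in the batch script (the node count here is arbitrary):

#PBS -l nodes=4:ppn=8
#PBS -l walltime=48:00:00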

Queue Related Commands

The linked web pages show the state of the nodes on RA and the jobs in the queue.

Command Description
qsub submit jobs
canceljob cancel job
qdel delete/cancel batch jobs
checkjob provide detailed status report for specified job
checkjob -v show why a job will not run on specific nodes
releasehold release job defers and holds
releaseres release reservations
sethold set job holds
showq show queued jobs
showres show existing reservations
showstart show estimates of when job can/will start
showstate show current state of resources
tracejob trace job actions and states recorded in batch logs
pbsnodes view/modify batch status of compute nodes
qalter modify queued batch jobs
qhold hold batch jobs
qrls release batch job holds
qsig send a signal to a batch job
qstat view queues and jobs
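As a rough sketch of typical usage of these commands (the job number 12345 is hypothetical):

showq               # list queued and running jobs
qstat               # view queues and your jobs
checkjob -v 12345   # detailed status, including why the job will not run on specific nodes
showstart 12345     # estimate of when the job will start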

Compilers Documentation

Compiler                  Command    Turn off Optimization    "Good" Optimization    Turn on OpenMP
Intel Fortran             ifort      -O0                      -O3                    -openmp
Intel C                   icc        -O0                      -O3                    -openmp
Intel C++                 icpc       -O0                      -O3                    -openmp
Portland Group Fortran    pgf90      -O0                      -fast                  -mp
Portland Group C          pgcc       -O0                      -fast                  -mp
Portland Group C++        pgCC       -O0                      -fast                  -mp

Click on the "command" to see an HTML version of the man page.
Intel Compilers Full Documentation
Portland Group Compilers Full Documentation

The default Intel and Portland Group compilers on RA are rather old. You can set your environment to use the newest compiler versions by adding the following lines to your .bashrc file or .cshrc file.

Bash shell users add to .bashrc
source /opt/pgi/linux86-64/2012/pgi.sh
source /lustre/home/apps/compilers/intel/bin/compilervars.sh intel64
C shell users add to .cshrc
source /opt/pgi/linux86-64/2012/pgi.csh
source /lustre/home/apps/compilers/intel/bin/compilervars.csh intel64

Command Line Options for Debugging

There is an article that describes a number of options that are available for debugging programs without using debuggers. This includes compiler and subroutine options for tracebacks and run time checking.

Source Level Debugging

The standard Unix debugger gdb is available in /lustre/home/apps/gdb-6.8.

The Intel and Portland Group debuggers are also available on Ra. The Portland Group debugger is pgdbg and the Intel debugger is idb. See the compiler documentation pages for more information.
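As a minimal sketch (myprog is a placeholder), you would typically recompile with debugging symbols and optimization turned off, then launch the program under the debugger:

ifort -g -O0 myprog.f90 -o myprog    # -g adds source-level debug information
gdb ./myprog                         # assumes the gdb from /lustre/home/apps/gdb-6.8 is on your PATH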

Runtime Error Messages

We also have a direct link to the Intel Fortran run-time error messages.

A direct link to the OpenMPI runtime error codes is also available.

Exclusive access to nodes

The default behavior of the queueing system on Ra is to fill unused processors. If you submit a job that uses fewer than 8 processors per node, additional jobs might be scheduled on your nodes. To force exclusive access to your nodes, add one of the following options to your batch script:

#PBS -W x=NACCESSPOLICY:SINGLEJOB
Allows only one job on a node
#PBS -W x=NACCESSPOLICY:SINGLEUSER
Allows more than one job on a node but only by a single user

Running Parallel Interactive jobs

Running a parallel interactive job is a two step process. You first run a qsub command to request interactive nodes. After some time you will be connected (logged in) to an interactive node. You then "cd" to the directory that contains your executable and run it with an mpiexec command.

We have an example below. The commands typed into the terminal window are shown in the transcript. We will run a simple "Hello World" MPI example. The source for the example can be obtained from geco.mines.edu/guide/guideFiles/c_ex00.c

In our qsub command we request 1 node. This gives us 8 computational cores to use while running our parallel program. After we enter the qsub command we will get back a ready message. Next, we enter an mpiexec command, specifying the number of MPI tasks using the "-n" option. After our job finishes we are free to run additional jobs. Note that in the second case we specified 16 MPI tasks, even though we only asked for 8 computational cores. This is legal, and it might be useful in cases where you are just checking the correctness of an algorithm and don't care about performance.

After we are done with our runs we type exit to logout and release the nodes.

[tkaiser@ra ~/guide]$ qsub -q INTERACTIVE -I -V -l nodes=1:ppn=8 \
    -W x=NACCESSPOLICY:SINGLEUSER -l walltime=00:15:00
qsub: waiting for job 1280.ra.mines.edu to start
qsub: job 1280.ra.mines.edu ready
[tkaiser@compute-9-8 ~]$ cd guide
[tkaiser@compute-9-8 ~/guide]$ mpiexec -n 4 c_ex00
Hello from 0 of 4 on compute-9-8.local
Hello from 1 of 4 on compute-9-8.local
Hello from 2 of 4 on compute-9-8.local
Hello from 3 of 4 on compute-9-8.local
[tkaiser@compute-9-8 ~/guide]$ mpiexec -n 16 c_ex00
Hello from 1 of 16 on compute-9-8.local
Hello from 0 of 16 on compute-9-8.local
Hello from 3 of 16 on compute-9-8.local
Hello from 4 of 16 on compute-9-8.local
Hello from 5 of 16 on compute-9-8.local
Hello from 8 of 16 on compute-9-8.local
Hello from 9 of 16 on compute-9-8.local
Hello from 10 of 16 on compute-9-8.local
Hello from 7 of 16 on compute-9-8.local
Hello from 11 of 16 on compute-9-8.local
Hello from 12 of 16 on compute-9-8.local
Hello from 13 of 16 on compute-9-8.local
Hello from 14 of 16 on compute-9-8.local
Hello from 15 of 16 on compute-9-8.local
Hello from 2 of 16 on compute-9-8.local
Hello from 6 of 16 on compute-9-8.local
[tkaiser@compute-9-8 ~/guide]$ exit
logout
qsub: job 1280.ra.mines.edu completed
[tkaiser@ra ~/guide]$

You can also request more than one node. Then cat $PBS_NODEFILE to see which nodes you were given. For example:

[tkaiser@ra ~/guide]$ qsub -q INTERACTIVE -I -V -l nodes=2:ppn=8 \
    -W x=NACCESSPOLICY:SINGLEUSER -l walltime=00:15:00
qsub: waiting for job 91420.ra5.local to start
qsub: job 91420.ra5.local ready
[tkaiser@fatcompute-12-2 ~]$ cat $PBS_NODEFILE | sort -u
fatcompute-12-1
fatcompute-12-2
[tkaiser@fatcompute-12-2 ~]$ exit

The options -W x=NACCESSPOLICY:SINGLEUSER -l walltime=00:15:00 in the above commands ensure that you have sole access to the node for 15 minutes.

Tutorials Link

The link leads to a number of recent tutorials on HPC and on running on RA.

Changing your login shell

The default login shell on Ra is /bin/bash. If you would like to change your shell you can use the command chsh. To see a list of the available shells type chsh --list-shells. To change your shell type chsh. You will be prompted for your password and the path to your new shell.
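A minimal sketch of the two commands (run them on the RA head node):

chsh --list-shells    # show the shells you may choose from
chsh                  # prompts for your password and the path to the new shell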

Common Problems and Questions

To be completed

Changing Your MPI Version

There are several versions of MPI available on RA. The default version, OpenMPI 1.4.1, built with the Intel 11.1 compiler, is rather old. You can expect better performance using the newer version, OpenMPI 1.6, built with version 12.1 of the Intel compiler.

The command mpi-selector can be used to set your MPI version. Using the mpi-selector command is a simple, but multistep process. First, edit your .bashrc file to remove any reference to old versions of MPI. Then run the command mpi-selector --list. You should see a list similar to the one shown below.

[joeuser@ra5 bin]$ mpi-selector --list
intel_4.0.3_intel_12.1
mvapich2_gnu-1.4.1
mvapich2_intel-1.4.1
mvapich2_pgi-1.4.1
openmpi-1.3.2-gcc-i386
openmpi-1.3.2-gcc-x86_64
openmpi_1.6_intel_12.1
ra5_openmpi_gnu-1.4.1
ra5_openmpi_intel-1.4.1
ra5_openmpi_intel_debug-1.4.1
ra5_openmpi_pgi-1.4.1
ra5_openmpi_pgi_debug-1.4.1
[joeuser@ra5 bin]$

This gives a list of the versions of MPI available. Select one. Unless there is a reason not to do so, the version should be ra5_openmpi_intel-1.4.1 or openmpi_1.6_intel_12.1. To select the new version of MPI, run the command mpi-selector --set openmpi_1.6_intel_12.1 as shown below.

[joeuser@ra5 bin]$ mpi-selector --set openmpi_1.6_intel_12.1
Defaults already exist; overwrite them? (y/N) y
[joeuser@ra5 bin]$

Then log out. The next time you log in, run the command which mpicc to check that you now have the new MPI environment available. Your parallel programs should be rebuilt using your new version of MPI.

If you select openmpi_1.6_intel_12.1 as your MPI version you will automatically get the Intel version 12 compiler. You can check your MPI and base compiler versions as shown below.

[joeuser@ra5 ~]$ mpicc --showme:version
mpicc: Open MPI 1.6 (Language: C)
[joeuser@ra5 ~]$ icc -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64,
Version 12.1.4.319 Build 20120410
Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.
[joeuser@ra5 ~]$ mpif90 --showme:version
mpif90: Open MPI 1.6 (Language: Fortran 90)
[joeuser@ra5 ~]$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64,
Version 12.1.4.319 Build 20120410
Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.

Programs linked with the old version of MPI will not work with the new version of mpiexec. You can run old programs if you specify the full path to the old mpiexec command, /opt/ra5_openmpi_intel/1.4.1/bin/mpiexec, as sketched below.
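For example, a hypothetical run of an old binary (old_program is a placeholder) would look like:

/opt/ra5_openmpi_intel/1.4.1/bin/mpiexec -np 8 ./old_program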
