Oscar User Manual

Welcome to CCV's user manual!

This manual is primarily a guide to using Oscar, a large compute cluster maintained by CCV for use by Brown researchers. We recommend that all new users read through the "Getting Started" page.

Conventions

We use angle brackets to denote command-line placeholders that you should replace with an appropriate value. For example, the placeholders <user> and <group> should be replaced with your own user name and group name. The $ sign at the beginning of commands represents the command prompt; do not include it when copying commands into your shell.

Getting Started

This guide assumes you have an Oscar username and password. To request an account, see Create an Account.

Oscar

Oscar is the CCV cluster. It has two login nodes and several hundred compute nodes. The login nodes are shared between all users of the system. Running computationally intensive or memory intensive programs on the login node slows down the system for all users. Any processes taking up too much CPU or memory on a login node will be killed.

You can use the login nodes to compile your code, manage files, and launch jobs on the compute nodes. Do not run Matlab on the login nodes.

Oscar runs the Linux operating system. General Linux documentation is available from The Linux Documentation Project. We recommend you read up on basic Linux commands before using Oscar. CCV users can mount the Oscar filesystem on their own Windows, Mac, or Linux system using CIFS.

Connecting to Oscar for the first time

To log in to Oscar you need Secure Shell (SSH) on your computer. Mac and Linux machines normally have SSH available; Windows users need to install an SSH client. We recommend PuTTY, a free SSH client for Windows. To log in to Oscar, open a terminal and type

ssh <username>@ssh.ccv.brown.edu

where <username> is your CCV account user name. The first time you connect to Oscar you will see a message like:

The authenticity of host 'ssh.ccv.brown.edu (138.16.172.8)' can't be established.
RSA key fingerprint is SHA256:Nt***************vL3cH7A.
Are you sure you want to continue connecting (yes/no)? 

You can type yes. You will be prompted for your password (the one you received via text message when you set up the CCV account). Note that nothing will show up on the screen when you type your password; just type it in and press Enter. You will now be in your home directory on Oscar. In your terminal you will see something like this:

mycomputer:~ mhamilton$ ssh mhamilton@ssh.ccv.brown.edu
mhamilton@ssh.ccv.brown.edu's password: 
Last login: Thu Nov  3 15:41:04 2016 from ssh1-int.oscar.ccv.brown.edu
Welcome to Oscar! This login node is shared among many users: please be
courteous and DO NOT RUN large-memory or compute-intensive programs here!
In particular, do not run MATLAB jobs here. They will be automatically killed.
Instead, submit a batch job or start an interactive session with 'interact'.

For help using this system, please search our documentation at
http://www.brown.edu/ccv/doc  (or contact 'support@ccv.brown.edu')

The per-user limit for the Oscar queue is currently:
 192 cores for premium accounts
  16 cores for exploratory accounts



module: loading 'centos-libs/6.5'
module: loading 'centos-updates/6.3'
module: loading 'intel/2013.1.106'
module: loading 'java/7u5'
module: loading 'python/2.7.3'
module: loading 'perl/5.18.2'
module: loading 'matlab/R2014a'
[mhamilton@login001 ~]$ 

Congratulations, you are now on one of the Oscar login nodes.

Note: Please do not run computations or simulations on the login nodes, because they are shared with other users. You can use the login nodes to compile your code, manage files, and launch jobs on the compute nodes.

Passwords

The password texted to you when the account is created is for both your Oscar login (SSH) and CIFS. You can use the commands below to change your passwords. Note: when you change one of your passwords, the other password is not affected.

To change your Oscar login password, use the command:

$ passwd

You will be asked to enter your old password, then your new password twice.

To change your CIFS password, use the command:

$ smbpasswd

Note: if you ask for a password reset from CCV, both the SSH and CIFS passwords will be reset.

Password reset rules:

  • minimum length: 8 characters
  • should have characters from all 4 classes: upper-case letters, lower-case letters, numbers and special characters
  • a character cannot appear more than twice in a row
  • cannot have more than 3 upper-case, lower-case, or number characters in a row
  • at least 3 characters should be different from the previous password
  • cannot be the same as username
  • should not include any of the words in the user's "full name"

File system

Users on Oscar have three places to store files.

  • home
  • scratch
  • data

Note guest and class accounts may not have a data directory. Users who are members of more than one research group may have access to multiple data directories.

To see how much space you have, use the command myquota. Below is an example of the output:

                   Block Limits                              |           File Limits              
Type    Filesystem           Used    Quota   HLIMIT    Grace |    Files    Quota   HLIMIT    Grace
-------------------------------------------------------------|--------------------------------------
USR     home               8.401G      10G      20G        - |    61832   524288  1048576        -
USR     scratch              332G     512G      12T        - |    14523   323539  4194304        -
FILESET data+apollo        11.05T      20T      24T        - |   459764  4194304  8388608        -

A good practice is to configure your application to read any initial input data from ~/data and write all output into ~/scratch. Then, when the application has finished, move or copy the data you would like to save from ~/scratch to ~/data. For more information on which directories are backed up and best practices for reading/writing files, see Managing Files. You can go over your quota, up to the hard limit, for a grace period (14 days). This grace period gives you time to manage your files. When the grace period expires you will be unable to write any files until you are back under quota.
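As a minimal sketch of this workflow (the program and file names here are hypothetical):

$ cd ~/scratch
$ ./my_program --input ~/data/input.dat --output results.dat   # read from data, write to scratch
$ cp results.dat ~/data/                                        # keep what you need in data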


Software modules

CCV uses the PyModules package for managing the software environment on Oscar. To see the software available on Oscar, use the command module avail. The command module list shows which modules you have loaded. Below is an example of checking which versions of the module 'workshop' are available and loading a given version.

[mhamilton@login001 ~]$ module avail workshop
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ name: workshop*/* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
workshop/1.0  workshop/2.0  
[mhamilton@login001 ~]$ module load workshop/2.0
module: loading 'workshop/2.0'
[mhamilton@login001 ~]$ 

For a list of all PyModule commands see Software Modules. If you have a request for software to be installed on Oscar, email support@ccv.brown.edu.


Using a Desktop on Oscar

You can connect remotely to a graphical desktop environment on Oscar using CCV's VNC client. The CCV VNC client integrates with the scheduling system on Oscar to create dedicated, persistent VNC sessions that are tied to a single user.

Using VNC, you can run applications and scripting languages like Matlab, Mathematica, Maple, python, perl, or the R statistical package on our high-performance and large-memory systems. You also have fast access to CCV's local high-performance file system, so you can access and analyze large data files without having to copy them to your own system.

For download and installation instructions, click here.


Running Jobs

You are on Oscar's login nodes when you log in through SSH. You should not (and would not want to) run your programs on these nodes, as they are shared by all active users for tasks like managing files and compiling programs.

With so many active users, an HPC cluster has to use software called a "job scheduler" to assign compute resources to users for running programs on the compute nodes. When you submit a job (a set of commands) to the scheduler along with the resources you need, it puts your job in a queue. The job runs when the required resources (cores, memory, etc.) become available. Note that because Oscar is a shared resource, you must be prepared to wait: your job cannot be expected to start running straight away.

Oscar uses the SLURM job scheduler. Batch jobs are the preferred way to run jobs: all commands are listed in a "batch script" along with the required resources (number of cores, wall-time, etc.). However, you can also run programs interactively. The more resources you request, the longer your wait time in the queue.

For all the information on how to submit jobs on Oscar, see Running Jobs. There is also extensive documentation on the web on using SLURM. For instance, here is a quick start guide.


Where to get help

CIFS

CCV users can access their home, data and scratch directories as a local mount on their own Windows, Mac, or Linux system using the Common Internet File System (CIFS) protocol (also called Samba). There are two requirements for using this service:

  • An Oscar account with CIFS access enabled (accounts created since 2010 are automatically enabled).
  • Local campus connectivity. Off-campus users can connect after obtaining a campus IP with Brown's Virtual Private Network client, but performance may be degraded.

First, users should ensure that the time and date are set correctly on their machine, e.g. on a Mac, enable 'Set date and time automatically'.

Use SSH to connect to Oscar to set your CIFS password. Once logged in, run the command:

$ smbpasswd

You will first be prompted for your "old" password, which is the temporary password you were given by CCV when your account was created. Then, enter a new CIFS password twice. You may choose to use the same password here as for your Oscar account.

Note: this command does not change your SSH login password, and changing the SSH login password does not change the CIFS password. The password texted to you when the account is created or reset applies to both SSH login and CIFS, but thereafter the two must be changed separately.

Now you are ready to mount your CCV directories locally using the following instructions based on your operating system:


Windows

  • Right-click "Computer" and select "Map Network Drive".
  • Select an unassigned drive letter.
  • Enter \\oscarcifs.ccv.brown.edu\<user> as the Folder.
  • Check "Connect using different credentials"
  • Click "Finish"
  • Enter your CCV user name as "ccv\username" (no quotes)
  • Enter your CIFS password and click "OK".

You can now access your home directory through Windows Explorer with the assigned drive letter. Your data and scratch directories are available as the subdirectories (~/data and ~/scratch) of your home directory.


Mac OS X

  • In the Finder, press "Command + K" or select "Connect to Server..." from the "Go" menu.
  • For "Server Address", enter smb://oscarcifs.ccv.brown.edu/<user> and click "Connect".
  • Enter your username and password.
  • You may choose to add your login credentials to your keychain so you will not need to enter this again.

Optional. If you would like to automatically connect to the share at startup:

  • Open "System Preferences" (leave the Finder window open).
  • Go to "Accounts" > "(your account name)".
  • Select "Login Items".
  • Drag your data share from the "Finder" window to the "Login Items" window.

Linux

  • Install the cifs-utils package:

    CentOS/RHEL:   $ sudo yum install cifs-utils
    Ubuntu:        $ sudo apt-get install cifs-utils
    
  • Make a directory to mount the share into:

    $ sudo mkdir /mnt/rdata
    
  • Create a credentials file and add your CCV account information (use the CIFS password):

    $ sudo gedit /etc/cifspw
    
    username=<user>
    password=<password>
    
  • Allow only root access to the credentials files:

    $ sudo chmod 0600 /etc/cifspw
    
  • Add an entry to the fstab:

    $ sudo gedit /etc/fstab
    

    The fstab entry is the single line:

     //oscarcifs.ccv.brown.edu/<user> /mnt/rdata cifs credentials=/etc/cifspw,vers=1.0,nounix,uid=<localUser> 0 0
    

    Change <localUser> to the login used on your Linux workstation.

  • Mount the share:

    $ sudo mount -a
    

X Forwarding

If you have an installation of X11 on your local system, you can access Oscar with X forwarding enabled, so that the windows, menus, cursor, etc. of any X applications running on Oscar are all forwarded to your local X11 server. Here are some resources for setting up X11:

Once your X11 server is running locally, open a terminal and use

$ ssh -Y <user>@ssh.ccv.brown.edu

to establish the X forwarding connection. Then, you can launch GUI applications from Oscar and they will be displayed locally on your X11 server.

For Windows users using PuTTY, enable X forwarding under Connection->SSH->X11.

Note: the login nodes are shared resources and are provided for debugging, programming, and managing files. Please do not use them for production runs (for example, executing a long-running script in a GUI instance of Matlab). You can use the batch system to submit production runs if your application can be run without a GUI (for example, with matlab -nodisplay).

One limitation of X forwarding is its sensitivity to your network connection's latency. We advise against using X forwarding from a connection outside of the Brown campus network, since you will likely experience lag between your actions and their response in the GUI.

Common problems


VNC problems

Solution:

  1. Note that you have to use the same username and password that you use for SSH'ing to Oscar.
  2. Make sure you are using the latest version from the VNC page: https://web1.ccv.brown.edu/technologies/vnc
  3. Check the Java version on your computer. We recommend Java 8; the VNC client does not work with Java 9.
  4. SSH to Oscar (not through VNC) and run the command "myq" to see if there are any VNC jobs running. If there are, cancel those jobs using the command "scancel <jobID>". Now try using VNC.
  5. SSH to Oscar and run the command "vnc-cleanup" to delete stale VNC files. Now try running VNC.
  6. Make sure you are not over disk quota using the command "myquota".
  7. As a last resort, move your configuration files to a "backup" directory using the commands below, and then try using VNC again.
mkdir ~/backup                 # Create a new directory to move the files
mv ~/.config/gtk* ~/backup     # Move these config files to the new directory
mv ~/.config/xfce* ~/backup
mv ~/.vnc ~/backup             # They can be moved one at a time while trying to connect through VNC
rm ~/.ICEauthority             # These files can be removed
rm ~/.Xauthority

Disk quota exceeded

This means that your home, data, or scratch directory is over its disk quota limit and the grace period has expired (or you have hit the hard limit).

The command to monitor your disk quota usage is:

myquota

Example output:

                   Block Limits                              |           File Limits              
Type    Filesystem           Used    Quota   HLIMIT    Grace |    Files    Quota   HLIMIT    Grace
-------------------------------------------------------------|--------------------------------------
USR     home               8.401G      10G      20G        - |    61832   524288  1048576        -
USR     scratch              332G     512G      12T        - |    14523   323539  4194304        -
FILESET data+apollo        11.05T      20T      24T        - |   459764  4194304  8388608        -

There are limits on the amount of data as well as the number of files. "Grace" is the grace period you have left after you exceed the quota. After the grace period expires, you will not be allowed to write files and a "disk quota exceeded" error will be shown. "HLIMIT" is the hard limit, which cannot be exceeded even during the grace period.

If you find you are above the quota limits, you can delete or compress files to bring down your usage. If your home directory is over its limits, you can move files to your data directory. You can also ask your PI about purchasing additional storage allocation. Here are the rates.
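For example, to see which items in your home directory use the most space and to compress a large file (the file name here is hypothetical):

$ du -sh ~/* | sort -h      # sizes of everything in your home directory, smallest first
$ gzip big_logfile.txt      # compress a large file in place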


Your account has expired

We set an expiry date on all student accounts and guest accounts. This is to avoid misuse of those accounts and prevent cluttering up of the system after those users have become inactive or are no longer at Brown.

In case you want to continue using CCV resources and face this error while logging in, simply send us an e-mail at support@ccv.brown.edu and we will reactivate your account. If you have an estimated graduation date, let us know so that we can set the expiry around that time.


GLIBCXX_<version> not found OR cc1plus: error: unrecognized command line option "-std=c++11"

Solution: Run your program after loading a more recent gcc module. For example,

 module load gcc/4.9.2

The default version of gcc on Oscar is 4.4.7, which comes installed as part of the OS.


GLIBC_2.14 not found

Solution: this can happen if you are using pre-compiled binaries that were built against glibc version 2.14. Because the operating system on Oscar (CentOS 6.7) ships an older glibc, you will have to compile the program again on Oscar, or ask the developers to provide binaries for CentOS 6.x. If this happens with a program we installed as a module, contact us and we will update the installation.


Prompted twice for password - Deleted files in ~/.ssh directory by mistake

Solution: Run setup_ssh from the command line when logged in to Oscar. This will restore your SSH key files.


How to restore files deleted by mistake?

Solution: See the "Restoring Files" section on the file systems page.


Error while loading shared libraries - Cannot open shared object file - Library not found

Solution: many programs are dynamically linked against the libraries they depend on. The runtime dynamic linker (ld.so) looks for these dependencies when the program starts and complains if it cannot find them.

Example:

libmpichcxx.so not found

Here, the program is looking for a library that is part of an MPI implementation. You can load a module like mvapich2 so that the path to this library is added to LD_LIBRARY_PATH and the runtime linker can find it. Sometimes only a particular version of the library will work. A web search on the library name may also reveal which software package it belongs to.

If you want to know which shared libraries an executable requires at run time, you can use the ldd utility. For example:

    $ ldd /gpfs/runtime/opt/paraview/4.2.0/bin/paraview
  linux-vdso.so.1 =>  (0x00007fffecfff000)
  /usr/local/lib/libslurm.so (0x00007f70f8348000)
  libc.so.6 => /lib64/libc.so.6 (0x0000003e18800000)
  libdl.so.2 => /lib64/libdl.so.2 (0x0000003e19000000)
  libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003e19400000)
  /lib64/ld-linux-x86-64.so.2 (0x0000003e18400000)

We generally add information about these dependencies to the modules so that it is displayed when the module is loaded. If you find a package whose dependencies are not displayed, or you cannot figure out where to find the library file, contact us at support@ccv.brown.edu and we will correct it.


Perl module not found

Solution: In most cases, you can check whether a module exists using the command:

 perldoc -l Module::Name

For example,

$ perldoc -l Math::CDF
/gpfs/runtime/opt/perl/5.18.2/lib/site_perl/5.18.2/x86_64-linux-thread-multi/Math/CDF.pm

If it does not exist, you can email us.

Also, chances are that you are using the "wrong" Perl to execute your script. Check the first line of the script. Does it say:

 #!/usr/bin/perl

If yes, it is using the Perl executable /usr/bin/perl instead of the one under "perl/5.18.2" (the module for this is loaded by default on Oscar). This line is called a "shebang". This should usually be fine, but sometimes it produces errors due to conflicts between the Perl versions.

Change this to:

 #!/usr/bin/env perl

so that it uses the first Perl executable in your PATH environment variable which should be the 5.18.2 version.


All other errors

Generally, a web search on the error will help you identify the issue, if not solve it. Search on the actual error message (usually starting with "Error:"). Other information displayed with the message is also important, but including it in the web search might make the query too specific and filter out useful results.

Finally, feel free to send us an email at support@ccv.brown.edu if you are not able to resolve the issue.

Managing Files

CCV offers a high-performance storage system for research data called RData, which is accessible as the /gpfs/data file system on all CCV systems. It can also be mounted from any computer on Brown’s campus network using CIFS.

You can transfer files to Oscar and RData through a CIFS mount, or by using command-line tools like scp or rsync.

There are also GUI programs for transferring files using the scp protocol, like WinSCP for Windows and Fugu or Cyberduck for Mac.

Oscar has a Globus endpoint brownccv#transfer. To use Globus on Oscar, see Globus Online.

You can transfer files between Department File Servers (Isilon) and Oscar with smbclient. To use it, see Copying files from Department File Servers.

Note: RData is not designed to store confidential data (information about an individual or entity). If you have confidential data that needs to be stored please contact support@ccv.brown.edu.


File systems

CCV uses IBM's General Parallel File System (GPFS) for users' home directories, data storage, scratch/temporary space, and runtime libraries and executables. A separate GPFS file system exists for each of these uses, in order to provide tuned performance. These file systems are mounted as:

~ → /gpfs/home/<user>
Your home directory:
  • optimized for many small files (<1MB)
  • nightly backups (30 days)
  • 10GB quota

~/data → /gpfs/data/<group>
Your data directory:
  • optimized for reading large files (>1MB)
  • nightly backups (30 days)
  • quota is by group (usually >=256GB)

~/scratch → /gpfs/scratch/<user>
Your scratch directory:
  • optimized for reading/writing large files (>1MB)
  • NO BACKUPS
  • purging: files older than 30 days may be deleted
  • 512GB quota: contact us to increase on a temporary basis

A good practice is to configure your application to read any initial input data from ~/data and write all output into ~/scratch. Then, when the application has finished, move or copy data you would like to save from ~/scratch to ~/data.

Note: class or temporary accounts may not have a ~/data directory!

To see how much space you have on Oscar, use the command myquota. Below is an example of the output:

                   Block Limits                              |           File Limits              
Type    Filesystem           Used    Quota   HLIMIT    Grace |    Files    Quota   HLIMIT    Grace
-------------------------------------------------------------|--------------------------------------
USR     home               8.401G      10G      20G        - |    61832   524288  1048576        -
USR     scratch              332G     512G      12T        - |    14523   323539  4194304        -
FILESET data+apollo        11.05T      20T      24T        - |   459764  4194304  8388608        -

You can go over your quota, up to the hard limit, for a grace period (14 days). This grace period gives you time to manage your files. When the grace period expires you will be unable to write any files until you are back under quota.


File transfer

To transfer files from your computer to Oscar, you can use:

  1. command line functions like scp and rsync, or
  2. GUI software

If you need to transfer large amounts of data, use the dedicated transfer nodes on Oscar, which will speed up the process. To do so, use transfer.ccv.brown.edu instead of ssh.ccv.brown.edu as the host address.

If you have access to a terminal, as on a Mac or Linux computer, you can use scp to transfer files. For example, to copy a file from your computer to Oscar:

 scp /path/to/source/file <username>@transfer.ccv.brown.edu:/path/to/destination/file

To copy a file from Oscar to your computer:

 scp <username>@transfer.ccv.brown.edu:/path/to/source/file /path/to/destination/file
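rsync can also be used; it skips files that are already up to date and can resume interrupted transfers. A minimal sketch (the paths here are hypothetical):

 rsync -avP /path/to/source/dir <username>@transfer.ccv.brown.edu:/path/to/destination/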

On Windows, if you have PuTTY installed, you can use its pscp utility from the command prompt.

There are also GUI programs for transferring files using the scp or sftp protocols, like WinSCP for Windows and Fugu or Cyberduck for Mac. FileZilla is another GUI program for FTP/SFTP that is available on all platforms.

Globus Online provides a transfer service for moving data between institutions such as Brown and XSEDE facilities. You can transfer files using the Globus web interface or the command line interface.


Restoring files

Nightly snapshots of the file system are available for the trailing seven days.

Home directory snapshot

/gpfs_home/.snapshots/<date>/<username>/<path_to_file> 

Data directory snapshot

/gpfs/.snapshots/<date>/data/<groupname>/<path_to_file> 

Scratch directory snapshot

/gpfs/.snapshots/<date>/scratch/<username>/<path_to_file> 

Do not use the links in your home directory snapshot to try to retrieve snapshots of data and scratch. The links will always point to the current versions of those files. An easy way to check what a link points to is to use ls -l

e.g.

ls -l /gpfs_home/.snapshots/April_03/ghopper/data 
lrwxrwxrwx 1 ghopper navy 22 Mar  1  2016 /gpfs_home/.snapshots/April_03/ghopper/data -> /gpfs/data/navy 

If the files to be restored were modified or deleted more than 7 days (and less than 30 days) ago and were in the HOME or DATA directory, you may contact us to retrieve them from nightly backups by providing the full path. Note that home and data directory backups are kept for the trailing 30 days only.


Best Practices for I/O

Efficient I/O is essential for good performance in data-intensive applications. Often, the file system is a substantial bottleneck on HPC systems, because CPU and memory technology has improved much more drastically in the last few decades than I/O technology.

Parallel I/O libraries such as MPI-IO, HDF5 and netCDF can help parallelize, aggregate and efficiently manage I/O operations. HDF5 and netCDF also have the benefit of using self-describing binary file formats that support complex data models and provide system portability. However, some simple guidelines can be used for almost any type of I/O on Oscar:

  • Try to aggregate small chunks of data into larger reads and writes. For the GPFS file systems, reads and writes in multiples of 512KB provide the highest bandwidth.
  • Avoid using ASCII representations of your data. They will usually require much more space to store, and require conversion to/from binary when reading/writing.
  • Avoid creating directory hierarchies with thousands or millions of files in a directory. This causes a significant overhead in managing file metadata.

While it may seem convenient to use a directory hierarchy for managing large sets of very small files, this causes severe performance problems due to the large amount of file metadata. A better approach might be to implement the data hierarchy inside a single HDF5 file using HDF5's grouping and dataset mechanisms. This single data file would exhibit better I/O performance and would also be more portable than the directory approach.

Globus Online

Globus Online provides a transfer service for moving data between institutions such as Brown and XSEDE facilities. You can also use Globus to transfer files between these institutions and your local machine. Files can be transferred using the Globus web interface or the command line interface.


Using Globus

To use Globus, first create a personal Globus ID account. You can then use either the web or command line interface to move files.

The instructions below demonstrate using the web interface to perform a transfer:

After logging in to Globus you will see the transfer page. To set up the transfer, select the "to" and "from" endpoints. For Oscar the endpoint is called brownccv#transfer. You will need to use your Oscar username and password to connect to the Oscar endpoint. If you want to use Globus Online to move data to/from your own machine, you can install Globus Connect Personal, which creates an endpoint on your computer that you can use to transfer data to and from it. For installation details see: https://www.globus.org/globus-connect-personal

You can then select the files you want to transfer.

You start the transfer using the blue transfer button. You should see "Transfer request submitted successfully" and an ID for the transfer.

If you have started a transfer between two remote machines, you do not have to keep your computer connected to Globus; when the transfer is complete you will receive an email. Note that if you are using Globus Connect Personal for a transfer to/from your machine, Globus Connect Personal will need to stay running on your machine for the transfer to complete.

smbclient

Copying files from Department File Servers

The department file servers (also known as Isilon) are:

  • \\files.brown.edu\dfs (departmental/personal shares)
  • \\files.brown.edu\research (research shares)
  • \\files.brown.edu\{sharename} (miscellaneous shares)

You can transfer files between Department File Servers and Oscar using smbclient.

1) Log into our dedicated system for transferring files in/out of Oscar:

ssh transfer.ccv.brown.edu

2) Start a screen session. This will allow you to reattach to your terminal window if you disconnect.

screen

3) Connect to Department File Servers. Replace SHARE_NAME, DIRECTORY_NAME, and BROWN_ID. DIRECTORY_NAME is an optional parameter.

smbclient "//files.brown.edu/SHARE_NAME" -D DIRECTORY_NAME -U "ad\BROWN_ID"

4) Upload/download your data using the FTP "put"/"get" commands. Replace DIRECTORY_NAME with the folder you'd like to upload.

put DIRECTORY_NAME

5) You can detach from the screen session with a "CTRL+A D" keypress. To reattach to your session:

screen -r

smbclient basics

  • put is upload to Department File Servers

Usage: put <local_file> [remote file name]

Copy <local_file> from Oscar to the Department File Servers. The remote file name is optional (use it if you want to rename the file).

  • get is download to Oscar

Usage: get <remote_file> [local file name]

Copy <remote_file> from the Department File Servers to Oscar. The local file name is optional (use it if you want to rename the file).

Moving more than one file:

To move more than one file at once use mput or mget. By default:

recurse is OFF. smbclient will not recurse into any subdirectories when copying files

prompt is ON. smbclient will ask for confirmation for each file in the subdirectories

You can toggle recursion ON/OFF with:

recurse

You can toggle prompt OFF/ON with:

prompt
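For example, a session that downloads an entire directory tree from the share to Oscar might look like this (recursion on, prompting off, then a recursive download; the directory name is hypothetical):

smb: \> recurse
smb: \> prompt
smb: \> mget PROJECT_DIR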

Software

Many scientific and HPC software packages are already installed on Oscar, and additional packages can be requested by submitting a ticket to support@ccv.brown.edu. If you want a particular version of the software, mention it in the email along with a link to the web page from which it can be downloaded.

CCV cannot, however, supply funding for the purchase of commercial software. This is normally attributed as a direct cost of research, and should be purchased with research funding. CCV can help identify other potential users of the software to share the cost of purchase and maintenance. However, several commercial software products that are licensed campus-wide at Brown are available on Oscar.

For software that requires a Graphical User Interface (GUI) we recommend running on the VNC nodes using CCV's VNC client rather than X-Forwarding.

All programs are installed under /gpfs/runtime/opt/<software-name>/<version>. Example files and other files can be copied to your home, scratch or data directory if needed.


Software modules

CCV uses the PyModules package for managing the software environment on Oscar. The advantage of the modules approach is that it allows multiple versions of the same software to be installed at the same time. With modules, you can "load" and "unload" packages to dynamically control your environment. You can also customize the default environment that is loaded when you log in by putting the appropriate module commands in the .modules file in your home directory. For instance, if you edited your .modules file to contain

module load matlab

then the default module for Matlab will be available every time you log in.

Module commands

module list Lists all modules that are currently loaded in your software environment.
module avail Lists all available modules on the system. Note that a module can have multiple version numbers: this allows us to maintain legacy versions of software, or to try out beta or preview versions, without disrupting the stable versions.
module help package Prints additional information about the given package.
module load package Adds a module to your current environment. If you load the generic name of a module, you will get the default version. To load a specific version, load the module using its full name with the version: $ module load gcc/4.7.2
module unload package Removes a module from your current environment.

Note that the module avail command allows searching modules based on partial names. For example:

 $ module avail bo

will list all available modules whose name starts with "bo".

Output:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ name: bo*/* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
boost/1.40.0         bowtie/0.12.8        bowtie2/2.1.0        
boost/1.49.0         bowtie/0.12.9        bowtie2/2.2.1        
boost/1.52.0         bowtie/1.1.2         bowtie2/2.2.3        
boost/1.55.0         bowtie2/2.0.0-beta7  bowtie2/2.2.5        
boost/1.55.0-intel   bowtie2/2.0.5        bowtie2/2.2.9        
boost/1.62.0         bowtie2/2.0.6        

This feature can be used for finding what versions of a module are available.

Moreover, the module load command supports auto-completion of the module name using the Tab key. For example, typing "module load bo" at the shell prompt and hitting Tab a couple of times will show results similar to those above. Similarly, the module unload command auto-completes using the names of currently loaded modules.

What modules actually do: they simply set the relevant environment variables such as PATH, LD_LIBRARY_PATH, and CPATH. For example, PATH contains all the directory paths (colon-separated) that are searched for executable programs. By setting PATH through a module, you can run a program from anywhere in the file system; otherwise, you would have to type the full path to the executable, which is inconvenient. Similarly, LD_LIBRARY_PATH lists the directories where the runtime linker searches for libraries when running a program, and so on. To see the value of an environment variable, use the echo command. For instance, to see what's in PATH:

$ echo $PATH
/gpfs/runtime/opt/perl/5.18.2/bin:/gpfs/runtime/opt/python/2.7.3/bin:/gpfs/runtime/opt/java/7u5/bin:
/gpfs/runtime/opt/intel/2013.1.106/bin:/gpfs/runtime/opt/centos-updates/6.3/bin:/usr/lib64/qt-3.3/bin:
/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/gpfs/runtime/bin

MATLAB

MATLAB is available as a software module on Oscar. The default version of MATLAB is loaded automatically when you log in. Make sure you do not run Matlab on a login node.

We have a separate page on using Matlab on Oscar for more detailed info.

The command matlab is actually a wrapper that sets up MATLAB to run as a single-threaded, command-line program, which is the optimal way to pack multiple MATLAB scripts onto the Oscar compute nodes.

If you will only be running one MATLAB script per compute node, you can instead run MATLAB in threaded mode with:

$ matlab-threaded

MATLAB GUI

The VNC client provided by CCV is the best way to launch GUI applications on Oscar, including Matlab. You can also run the MATLAB GUI in an X-forwarded interactive session. For launching the GUI, you need to use the matlab-threaded command, which enables the display and JVM.

Example Batch Scripts

You can find an example batch script for running Matlab on an Oscar compute node in your home directory:

~/batch_scripts/matlab-serial.sh
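As a rough sketch of what such a serial script contains (this is not the actual file; the function name is hypothetical):

#!/bin/bash
#SBATCH -J matlab-serial
#SBATCH -n 1
#SBATCH -t 1:00:00

# The 'matlab' wrapper already runs single-threaded without a display
matlab -r "MyMatlabFunction; quit;"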

SLURM job arrays can be used to submit multiple jobs using a single batch script, e.g. when a single Matlab script is used to run analyses on multiple input files or with different input parameters. An example batch script for submitting a Matlab job array can be found at:

~/batch_scripts/matlab-array.sh

Python packages

If you need a particular Python package, chances are that it is already installed. The command pip list will list all installed packages. Try importing the package to see if it works.

However, if a particular package has many dependencies or many versions are required at the same time, we install it as a separate environment module instead of installing directly under python. So, use the module avail command to look for the package. You can contact us to have a particular package installed if it is not available.

Users cannot install packages globally using pip install. However, you can always install them locally in your home directory and then set the PYTHONPATH environment variable accordingly.
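For example, a minimal sketch of a local install (the package name and directory are hypothetical):

# Install the package into a directory in your home directory
pip install --target=$HOME/mypython some_package

# Tell Python where to find it (add this line to your ~/.bashrc to make it permanent)
export PYTHONPATH=$HOME/mypython:$PYTHONPATH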

Running Jobs

A "job" refers to a program running on the compute nodes of the Oscar cluster. Jobs can be run on Oscar in two different ways:

  • An interactive job allows you to interact with a program by typing input, using a GUI, etc. But if your connection is interrupted, the job will abort. These are best for small, short-running jobs where you need to test out a program, or where you need to use the program's GUI.
  • A batch job allows you to submit a script that tells the cluster how to run your program. Your program can run for long periods of time in the background, so you don't need to be connected to Oscar. The output of your program is continuously written to an output file that you can view both during and after your program runs.

Jobs are scheduled to run on the cluster according to your account priority and the resources you request (e.g. cores, memory, and runtime). For batch jobs, these resources are specified in a script referred to as a batch script, which is passed to the scheduler using a command. When you submit a job, it is placed in a queue where it waits until the required compute nodes become available.

NOTE: please do not run CPU-intensive or long-running programs directly on the login nodes! The login nodes are shared by many users, and you will interrupt other users' work.

We use the Simple Linux Utility for Resource Management (SLURM) from Lawrence Livermore National Laboratory as the job scheduler on Oscar. With SLURM, jobs that only need part of a node can share the node with other jobs (this is called "job packing"). When your program runs through SLURM, it runs in its own container, similar to a virtual machine, that isolates it from the other jobs running on the same node. By default, this container has 1 core and a portion of the node's memory.

The following sections have more details on how to run interactive and batch jobs through SLURM, and how to request more resources (either more cores or more memory).


Interactive jobs

To start an interactive session for running serial or threaded programs on an Oscar compute node, simply run the command interact from the login node:

$ interact

By default, this will create an interactive session that reserves 1 core, 4GB of memory, and 30 minutes of runtime.

You can change these default limits with the following command line options:

usage: interact [-n cores] [-t walltime] [-m memory] [-q queue]
                [-o outfile] [-X] [-f featurelist] [-h hostname] [-g ngpus]

Starts an interactive job by wrapping the SLURM 'salloc' and 'srun' commands.

options:
  -n cores        (default: 1)
  -t walltime     as hh:mm:ss (default: 30:00)
  -m memory       as #[k|m|g] (default: 4g)
  -q queue        (default: 'batch')
  -o outfile      save a copy of the session's output to outfile (default: off)
  -X              enable X forwarding (default: no)
  -f featurelist  CCV-defined node features (e.g., 'e5-2600'),
                  combined with '&' and '|' (default: none)
  -h hostname     only run on the specific node 'hostname'
                  (default: none, use any available node)
  -a account     user SLURM accounting account name

For example:

$ interact -n 20 -t 01:00:00 -m 10g

This will request 20 cores, 1 hour of time and 10 GB of memory (per node).

If you need access to GPUs, see https://www.ccv.brown.edu/doc/gpu.


MPI programs

To run an MPI program interactively, first create an allocation from the login nodes using the salloc command:

$ salloc -N <# nodes> -n <# MPI tasks> -p <partition> -t <minutes>

Once the allocation is fulfilled, it will place you in a new shell where you can run MPI programs with the srun command:

$ srun ./my-mpi-program ...

When you are finished running MPI commands, you can release the allocation by exiting the shell:

$ exit
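For example, a short two-node interactive MPI session might look like this (the program name is hypothetical):

$ salloc -N 2 -n 32 -p batch -t 30
$ srun ./my-mpi-program
$ exit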

For more info on MPI programs, see https://www.ccv.brown.edu/doc/mpi.


Batch jobs

To run a batch job on the Oscar cluster, you first have to write a script that describes what resources you need and how your program will run. Example batch scripts are available in your home directory on Oscar, in the directory:

~/batch_scripts

To submit a batch job to the queue, use the sbatch command:

$ sbatch <jobscript>

A batch script starts by specifying the bash shell as its interpreter, with the line:

#!/bin/bash

Next, a series of lines starting with #SBATCH define the resources you need, for example:

#SBATCH -n 4
#SBATCH -t 1:00:00
#SBATCH --mem=16G

Note that all the #SBATCH instructions must come before the commands you want to run. The lines above request 4 cores (-n), an hour of runtime (-t), and 16GB of memory per node, shared by all the cores (--mem). By default, a batch job will reserve 1 core and a proportional amount of memory on a single node.
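Putting these pieces together, a minimal batch script might look like this (the module and program names are examples; substitute your own):

#!/bin/bash
#SBATCH -J my-job
#SBATCH -n 4
#SBATCH -t 1:00:00
#SBATCH --mem=16G

# Load any modules your program needs, then run it
module load gcc/4.9.2
./my-program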

Alternatively, you could set the resources as command-line options to sbatch:

$ sbatch -n 4 -t 1:00:00 --mem=16G <jobscript>

The command-line options will override the resources specified in the script, so this is a handy way to reuse an existing batch script when you just want to change a few of the resource values.

The sbatch command will return a number, which is your job ID. You can view the output of your job in the file slurm-<jobid>.out in the directory where you ran the sbatch command. For instance, you can view the last 10 lines of output with:

$ tail -10 slurm-<jobid>.out

Alternatively, you can specify the files where the standard output and standard error streams should be written using the -o and -e flags.
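For example (the file names here are hypothetical):

#SBATCH -o myjob.out
#SBATCH -e myjob.err

or, equivalently, on the command line:

$ sbatch -o myjob.out -e myjob.err <jobscript>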

Useful sbatch options:

-J Specify the job name that will be displayed when listing the job.
-n Number of cores.
-N Number of nodes.
-t Runtime, as HH:MM:SS.
--mem= Requested memory per node.
-p Request a specific partition.
-C Add a feature constraint (a tag that describes a type of node). You can view the available features on Oscar with the nodes command.

--mail-type= Specify the events that you should be notified of by email: BEGIN, END, FAIL, REQUEUE, and ALL.
--mail-user= Email address to which notifications should be sent.

You can read the full list of options at http://slurm.schedmd.com/sbatch.html or with the command:

$ man sbatch

Managing jobs

Canceling a job:

$ scancel <jobid>

Listing running and queued jobs:

The squeue command will list all jobs scheduled in the cluster. We have also written wrappers for squeue on Oscar that you may find more convenient:

myq List only your own jobs.
myq <user> List another user's jobs.
allq List all jobs, but organized by partition, and a summary of the nodes in use in the partition.
allq <partition> List all jobs in a single partition.
myjobinfo Get the time and memory used for your jobs.

Listing completed jobs

The sacct command will list all of your running, queued and completed jobs since midnight of the previous day. To pick an earlier start date, specify it with the -S option:

$ sacct -S 2012-01-01

To find out more information about a specific job, such as its exit status or the amount of runtime or memory it used, specify the -l ("long" format) and -j options with the job ID:

$ sacct -lj <jobid>

The myjobinfo command uses the sacct command to display "Elapsed Time", "Requested Memory" and "Maximum Memory used on any one Node" for your jobs. This can be used to optimize the requested time and memory to have the job started as early as possible. Make sure you request a conservative amount based on how much was used.

$ myjobinfo

Info about jobs for user 'mdave' submitted since 2017-05-19T00:00:00
Use option '-S' for a different date
 or option '-j' for a specific Job ID.

       JobID    JobName              Submit      State    Elapsed     ReqMem     MaxRSS
------------ ---------- ------------------- ---------- ---------- ---------- ----------
1861                ior 2017-05-19T08:31:01  COMPLETED   00:00:09     2800Mc      1744K
1862                ior 2017-05-19T08:31:11  COMPLETED   00:00:54     2800Mc     22908K
1911                ior 2017-05-19T15:02:01  COMPLETED   00:00:06     2800Mc      1748K
1912                ior 2017-05-19T15:02:07  COMPLETED   00:00:21     2800Mc      1744K

'ReqMem' shows the requested memory:
 A 'c' at the end of number represents Memory Per CPU, a 'n' represents Memory Per Node.
'MaxRSS' is the maximum memory used on any one node.
Note that memory specified to sbatch using '--mem' is Per Node.

Partitions

When submitting a job to the Oscar compute cluster, you can choose different partitions depending on the nature of your job. You can specify one of the partitions listed below either in your sbatch command:

$ sbatch -p <partition> <batch_script>

or as an SBATCH option at the top of your batch script:

#SBATCH -p <partition>

Partitions available on Oscar:

batch Default partition with most of the compute nodes: 8-, 12-, 16-, 20-core or SMP; 64GB to 128GB of memory (505GB on SMP); all Intel based except the SMP nodes.
gpu Specialized compute nodes (8-core, 24GB, Intel) each with 2 NVIDIA GPU accelerators.
debug Dedicated nodes for fast turn-around, but with a short time limit of 40 node-minutes.

You can view a list of all the Oscar compute nodes broken down by type with the command:

$ nodes

Job priority

The scheduler considers many factors when determining the run order of jobs in the queue. These include the:

  • size of the job;
  • requested walltime;
  • amount of resources you have used recently (e.g., "fair sharing");
  • priority of your account type.

The account priority has three tiers:

  • Low (Exploratory)
  • Medium (Premium)
  • High (Condo)

Both Exploratory and Premium accounts can be affiliated with a Condo, and the Condo priority only applies to a portion of the cluster equivalent in size to the Condo. Once the Condo affiliates have requested more nodes than available in the Condo, their priority drops down to either medium or low, depending on whether they are a Premium or Exploratory account.

Backfilling: When a large or long-running job is near the top of the queue, the scheduler begins reserving nodes for it. If you queue a smaller job with a walltime shorter than the time required for the scheduler to finish reserving resources, the scheduler can backfill the reserved resources with your job to better utilize the system. Here is an example:

  • User1 has a 64-node job with a 24 hour walltime waiting at the top of the queue.
  • The scheduler can't reserve all 64 nodes until other currently running jobs finish, but it has already reserved 38 nodes and will need another 10 hours to reserve the final 26 nodes.
  • User2 submits a 16-node job with an 8 hour walltime, which is backfilled into the pool of 38 reserved nodes and runs immediately.

By requesting a shorter walltime for your job, you increase its chances of being backfilled and running sooner. In general, the more accurately you can predict the walltime, the sooner your job will run and the better the system will be utilized for all users.


Condo priority

Users who are affiliated with a Condo group will automatically use that Condo's priority when submitting jobs with sbatch.

Users who are Condo members and also have Premium accounts will by default use their Premium priority when submitting jobs. This is because the core limit for a Premium account is per user, while the limit for a Condo is per group. Submitting jobs under the Premium account therefore leaves more cores available to the Condo group.

Since Premium accounts have slightly lower priority, a user in this situation may want to instead use the Condo priority. This can be accomplished with the --qos option, which stands for "Quality of Service" (the mechanism in SLURM that CCV uses to assign queue priority).

Condo QOS names are typically <groupname>-condo, and you can view a full list with the condos command on Oscar. The command to submit a job with Condo priority is:

$ sbatch --qos=<groupname>-condo ...

Alternatively, you could place the following line in your batch script:

#SBATCH --qos=<groupname>-condo

For completeness, you can also select the Premium priority QOS explicitly with:

$ sbatch --qos=pri-<username> ...

although this is unnecessary, since it is the default QOS for all Premium accounts.


Job arrays

A job array is a collection of jobs that all run the same program, but on different values of a parameter. It is very useful for running parameter sweeps, since you don't have to write a separate batch script for each parameter setting.

To use a job array, add the option:

#SBATCH --array=<range>

in your batch script. The range can be a comma-separated list of integers and/or ranges of integers written with a dash. For example:

1-20
1-10,12,14,16-20

A job will be submitted for each value in the range. The values in the range will be substituted for the variable $SLURM_ARRAY_TASK_ID in the remainder of the script. Here is an example of a script for running a serial Matlab script on 16 different parameters by submitting 16 different jobs as an array:

#!/bin/bash
#SBATCH -J MATLAB
#SBATCH -t 1:00:00
#SBATCH --array=1-16

# Use '%A' for array-job ID, '%J' for job ID and '%a' for task ID
#SBATCH -e arrayjob-%a.err
#SBATCH -o arrayjob-%a.out

echo "Starting job $SLURM_ARRAY_TASK_ID on $HOSTNAME"
matlab -r "MyMatlabFunction($SLURM_ARRAY_TASK_ID); quit;"

You can then submit the multiple jobs using a single sbatch command:

$ sbatch <jobscript>

For more info: https://slurm.schedmd.com/job_array.html


Common Questions

  • How is a job identified?
    By a unique JobID, e.g. 13180139

  • Which of my jobs are running/pending?
    Use the command myq

  • How do I check the progress of my running job?
    You can look at the output file. The default output file is slurm-%j.out, where %j is the JobID. If you specified an output file using #SBATCH -o output_filename and/or an error file using #SBATCH -e error_filename, you can check these files for any output from your job. You can view the contents of a text file using the program less, e.g.

    less output_filename
    

    Use the spacebar to move down the file, b to move back up the file, and q to quit.

  • My job is not running how I intended it to. How do I cancel the job?
    scancel <JobID> where <JobID> is the job allocation number, e.g. 13180139

  • How do I save a copy of an interactive session?
    You can use interact -o outfile to save a copy of the session's output to "outfile"

  • I've submitted a bunch of jobs. How do I tell which one is which?
    myq will list the running and pending jobs with their JobID and the name of the job. The name of the job is set in the batch script with #SBATCH -J jobname. For jobs that are in the queue (running or pending) you can use the command
    scontrol show job <JobID>, where <JobID> is the job allocation number, e.g. 13180139, to get more detail about what was submitted.

  • How do I ask for a haswell node?

    Use the --constraint (or -C) option:

    #SBATCH --constraint=haswell
    
  • What are the nodes with names starting with "smp", e.g. "smp013"?

    SMP stands for symmetric multiprocessing. These nodes are meant for jobs that use a large number of CPUs on the same node for shared-memory parallelism. However, for sequential work they can be much slower because their architecture is quite old.

  • How do I avoid running on the SMP nodes?

    The SMP nodes are all AMD nodes. All others are Intel architecture. Hence, you can avoid SMP nodes by asking for just Intel nodes:

    #SBATCH --constraint=intel
    
  • Why won't my job start?
    When your job is pending (PD) in the queue, SLURM will display a reason why it is pending. The table below shows some common reasons why jobs are kept pending.

Reason What this means
(None) You may see this for a short time when you first submit a job
(Resources) There are not enough free resources to fulfill your request
(QOSGrpCpuLimit) All your condo cores are currently in use
(JobHeldUser) You have put a hold on the job. The job will not run until you lift the hold.
(Priority) Jobs with higher priority are using the resources
(ReqNodeNotAvail) The resources you have requested are not available. Note this normally means you have requested something impossible, e.g. 100 cores on 1 node, or a 24 core sandy bridge node. Double check your batch script for any errors. Your job will never run if you are requesting something that does not exist on Oscar.
(PartitionNodeLimit) You have asked for more nodes than exist in the partition. For example if you make a typo and have specified -N (nodes) but meant -n (tasks) and have asked for more than 64 nodes. Your job will never run. Double check your batch script.

GPU Computing

Oscar has 44 GPU nodes, which are regular compute nodes with two NVIDIA Tesla M2050 GPUs (Fermi architecture) added. Each M2050 GPU has 448 CUDA cores and 3GB of GDDR5 memory. To gain access to these nodes, please submit a support ticket and ask to be added to the 'gpu' group. Please note that these GPU nodes can only be used for single-node jobs.


Interactive Use

To start an interactive session on a GPU node, use the interact command and specify the gpu partition. You also need to specify the requested number of GPUs using the -g option:

$ interact -q gpu -g 1

GPU Batch Job

For production runs with exclusive access to GPU nodes, please submit a batch job to the gpu partition. E.g. for using 1 GPU:

$ sbatch -p gpu --gres=gpu:1 <jobscript>

This can also be mentioned inside the batch script:

#SBATCH -p gpu --gres=gpu:1

You can view the status of the gpu partition with:

$ allq gpu

Sample batch script for CUDA program:

~/batch_scripts/cuda.sh
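As a rough sketch, a minimal GPU batch script might look like this (this is not the contents of the sample script above; the executable name is hypothetical):

#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH -t 1:00:00

# Load the CUDA module and run the GPU executable
module load cuda
./my-cuda-program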

Getting started with GPUs

While you can program GPUs directly with CUDA, a language and runtime library from NVIDIA, this can be daunting for programmers who do not have experience with C or with the details of computer architecture.

You may find the easiest way to tap the computation power of GPUs is to link your existing CPU program against numerical libraries that target the GPU:

  • CUBLAS is a drop-in replacement for BLAS libraries that runs BLAS routines on the GPU instead of the CPU.
  • CULA is a similar library for LAPACK routines.
  • CUFFT, CUSPARSE, and CURAND provide FFT, sparse matrix, and random number generation routines that run on the GPU.
  • MAGMA combines custom GPU kernels, CUBLAS, and a CPU BLAS library to use both the GPU and CPU simultaneously; it is available in the 'magma' module on Oscar.
  • Matlab has a GPUArray feature, available through the Parallel Computing Toolkit, for creating arrays on the GPU and operating on them with many built-in Matlab functions. The PCT toolkit is licensed by CIS and is available to any Matlab session running on Oscar or workstations on the Brown campus network.
  • PyCUDA is an interface to CUDA from Python. It also has a GPUArray feature and is available in the cuda module on Oscar.

Introduction to CUDA

CUDA is an extension of the C language, as well as a runtime library, to facilitate general-purpose programming of NVIDIA GPUs. If you already program in C, you will probably find the syntax of CUDA programs familiar. If you are more comfortable with C++, you may consider instead using the higher-level Thrust library, which resembles the Standard Template Library and is included with CUDA.

In either case, you will probably find that because of the differences between GPU and CPU architectures, there are several new concepts you will encounter that do not arise when programming serial or threaded programs for CPUs. These are mainly to do with how CUDA uses threads and how memory is arranged on the GPU, both described in more detail below.

There are several useful documents from NVIDIA that you will want to consult as you become more proficient with CUDA, and there are also many CUDA tutorials available online.

Threads in CUDA

CUDA uses a data-parallel programming model, which allows you to program at the level of what operations an individual thread performs on the data that it owns. This model works best for problems that can be expressed as a few operations that all threads apply in parallel to an array of data. CUDA allows you to define a thread-level function, then execute this function by mapping threads to the elements of your data array.

A thread-level function in CUDA is called a kernel. To launch a kernel on the GPU, you must specify a grid, and a decomposition of the grid into smaller thread blocks. A thread block usually has around 32 to 512 threads, and the grid may have many thread blocks totalling thousands of threads. The GPU uses this high thread count to help it hide the latency of memory references, which can take 100s of clock cycles.

Conceptually, it can be useful to map the grid onto the data you are processing in some meaningful way. For instance, if you have a 2D image, you can create a 2D grid where each thread in the grid corresponds to a pixel in the image. For example, you may have a 512x512 pixel image, on which you impose a grid of 512x512 threads that are subdivided into thread blocks with 8x8 threads each, for a total of 64x64 thread blocks. If your data does not allow for a clean mapping like this, you can always use a flat 1D array for the grid.
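To make this concrete, the fragment below is a minimal sketch (the kernel name scale_pixels and the device pointer d_image are assumptions made for this example) that launches one thread per pixel of a 512x512 image using 8x8 thread blocks:

/* Each thread scales the single pixel it owns. */
__global__ void scale_pixels(float *image, int width, int height, float s)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;   /* this thread's column */
    int y = blockIdx.y * blockDim.y + threadIdx.y;   /* this thread's row    */
    if (x < width && y < height)
        image[y * width + x] *= s;
}

/* 8x8 threads per block, 64x64 blocks: a 512x512 grid of threads.     */
/* d_image is assumed to be device memory allocated with cudaMalloc(). */
dim3 block(8, 8);
dim3 grid(512 / 8, 512 / 8);
scale_pixels<<<grid, block>>>(d_image, 512, 512, 0.5f);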

The CUDA runtime dynamically schedules the thread blocks to run on the multiprocessors of the GPU. The M2050 GPUs available on Oscar each have 14 multiprocessors. By adjusting the size of the thread block, you can control how much work is done concurrently on each multiprocessor.

Memory on the GPU

The GPU has a separate memory subsystem from the CPU. The M2050 GPUs have GDDR5 memory, which is a higher bandwidth memory than the DDR2 or DDR3 memory used by the CPU. The M2050 can deliver a peak memory bandwidth of almost 150 GB/sec, while a multi-core Nehalem CPU is limited to more like 25 GB/sec.

The trade-off is that there is usually less memory available on a GPU. For instance, on the Oscar GPU nodes, each M2050 has only 3 GB of memory shared by 14 multiprocessors (219 MB per multiprocessor), while the dual quad-core Nehalem CPUs have 24 GB shared by 8 cores (3 GB per core).

Another bottleneck is transferring data between the GPU and CPU, which happens over the PCI Express bus. For a CUDA program that must process a large dataset residing in CPU memory, it may take longer to transfer that data to the GPU than to perform the actual computation. The GPU offers the largest benefit over the CPU for programs where the input data is small, or there is a large amount of computation relative to the size of the input data.

CUDA kernels can access memory from three different locations with very different latencies: global GDDR5 memory (100s of cycles), shared memory (1-2 cycles), and constant memory (1 cycle). Global memory is available to all threads across all thread blocks, and can be transferred to and from CPU memory. Shared memory can only be shared by threads within a thread block and is only accessible on the GPU. Constant memory is accessible to all threads and the CPU, but is limited in size (64KB).
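To make these memory spaces concrete, the following fragment is a minimal sketch (the kernel and variable names are illustrative, not from an Oscar example); it stages values from global memory into shared memory and reads a coefficient from constant memory. It assumes the kernel is launched with 256 threads per block:

/* Constant memory: small, read-only on the device, visible to all threads */
__constant__ float coeffs[16];

/* 'in' and 'out' point to global memory allocated with cudaMalloc() */
__global__ void smooth(const float *in, float *out, int n)
{
    __shared__ float tile[256];      /* shared memory: private to this thread block */
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < n)
        tile[threadIdx.x] = in[i];   /* stage the global value in fast shared memory */
    __syncthreads();                 /* make staged values visible to the whole block */

    if (i < n)
        out[i] = coeffs[0] * tile[threadIdx.x];
}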


Compiling with CUDA

To compile a CUDA program on Oscar, first load the CUDA module with:

$ module load cuda

The CUDA compiler is called nvcc, and for compiling a simple CUDA program it uses syntax similar to gcc:

$ nvcc -o program source.cu

Optimizations for Fermi

The Oscar GPU nodes feature NVIDIA M2050 cards with the Fermi architecture, which supports CUDA's "compute capability" 2.0. To fully utilize the hardware optimizations available in this architecture, add the -arch=sm_20 flag to your compile line:

$ nvcc -arch=sm_20 -o program source.cu

This means that the resulting executable will not be backward-compatible with earlier GPU architectures, but this should not be a problem since the CCV GPU nodes only use the M2050.

Memory caching

The Fermi architecture has two levels of memory cache similar to the L1 and L2 caches of a CPU. The 768KB L2 cache is shared by all multiprocessors, while the L1 cache by default uses only 16KB of the available 64KB shared memory on each multiprocessor.

You can increase the amount of L1 cache to 48KB at compile time by adding the flags -Xptxas -dlcm=ca to your compile line:

$ nvcc -Xptxas -dlcm=ca -o program source.cu

If your kernel primarily accesses global memory and uses less than 16KB of shared memory, you may see a benefit by increasing the L1 cache size.
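Note that the preferred L1/shared split can also be requested per kernel at run time through the CUDA runtime API; this is an additional option not covered by the flags above, and mykernel below is only a placeholder name:

/* Ask for the 48KB L1 / 16KB shared configuration before launching the kernel */
cudaFuncSetCacheConfig(mykernel, cudaFuncCachePreferL1);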

If your kernel has a simple memory access pattern, you may have better results by explicitly caching global memory into shared memory from within your kernel. You can turn off the L1 cache using the flags -Xptxas -dlcm=cg.


Mixing MPI and CUDA

Mixing MPI (C) and CUDA (C++) code requires some care during linking because of differences between the C and C++ calling conventions and runtimes. One option is to compile and link all source files with a C++ compiler, which will enforce additional restrictions on C code. Alternatively, if you wish to compile your MPI/C code with a C compiler and call CUDA kernels from within an MPI task, you can wrap the appropriate CUDA-compiled functions with extern "C", as in the following example.

The two source files shown below can be compiled and linked into a single executable on Oscar using:

$ module load mvapich2 cuda
$ mpicc -c main.c -o main.o
$ nvcc -c multiply.cu -o multiply.o
$ mpicc main.o multiply.o -lcudart

The CUDA/C++ compiler nvcc is used only to compile the CUDA source file, and the MPI C compiler mpicc is used to compile the C code and to perform the linking.

/* multiply.cu */

#include <cuda.h>
#include <cuda_runtime.h>

__global__ void __multiply__ (const float *a, float *b)
{
    const int i = threadIdx.x + blockIdx.x * blockDim.x;
    b[i] *= a[i];
}

extern "C" void launch_multiply(const float *a, float *b)
{
    /* ... load CPU data into GPU buffers a_gpu and b_gpu */

    __multiply__ <<< ...block configuration... >>> (a_gpu, b_gpu);

    /* safecall() is assumed to be a user-defined error-checking macro */
    safecall(cudaThreadSynchronize());
    safecall(cudaGetLastError());

    /* ... transfer data from GPU to CPU */
}
Note the use of extern "C" around the function launch_multiply, which instructs the C++ compiler (nvcc in this case) to give the function C linkage so that it can be called from C code. The following C code shows how the function could be called from an MPI task.

/* main.c */

#include <mpi.h>

void launch_multiply(const float *a, float *b);

int main (int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &nprocs);

    float *a = NULL, *b = NULL;   /* ... prepare arrays a and b */

    launch_multiply (a, b);

    MPI_Finalize();
    return 0;
}

OpenACC

OpenACC is a portable, directive-based parallel programming model. You can parallelize loops and code segments simply by inserting directives, which are ignored as comments if OpenACC is not enabled at compile time. It works on CPUs as well as GPUs. The PGI compiler suite installed on Oscar has support for compiling OpenACC directives; a short example is sketched below to get you started.
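As a flavor of what this looks like, here is a minimal sketch (the file name, values, and compile line are illustrative, not an official Oscar recipe); a simple loop is offloaded with a single directive:

/* saxpy.c - minimal OpenACC sketch */
#include <stdio.h>

int main(void)
{
    static float x[1000000], y[1000000];
    int i, n = 1000000;

    for (i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* Ask the compiler to parallelize this loop; if OpenACC is not
       enabled, the directive is ignored and the loop runs serially. */
    #pragma acc parallel loop
    for (i = 0; i < n; i++)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);
    return 0;
}

With the PGI module loaded, this can typically be compiled with:

$ pgcc -acc -Minfo=accel -o saxpy saxpy.c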


MATLAB

MATLAB is very popular as a scientific computing tool because of its IDE, ease of programming, and comprehensive library of high-level functions. It is used extensively on clusters for post-processing of simulation results, analysis of large amounts of experimental data, etc.

Matlab is available as a software module on Oscar. The default version of Matlab is loaded automatically when you log in.

Please make sure you do not run Matlab on the login nodes.


matlab-threaded command

On Oscar, the command matlab is actually a wrapper that sets up MATLAB to run as a single-threaded, command-line program, which is the optimal way to pack multiple Matlab scripts onto the Oscar compute nodes.

To run the actual multi-threaded version with the JVM and display enabled, use:

$ matlab-threaded

Similarly, to run this without the display enabled:

$ matlab-threaded -nodisplay

MATLAB GUI

VNC

The VNC client provided by CCV is the best way to launch GUI applications on Oscar, including Matlab. From the terminal emulator in VNC, first load the module corresponding to the intended version of Matlab. Then use the matlab-threaded command to launch the Matlab GUI. For example,

$ module load matlab/R2016a
$ matlab-threaded


X11 Forwarding

You can also run the MATLAB GUI in an X-forwarded interactive session. This requires installing an X server on your workstation/PC and logging in to Oscar with X forwarding enabled - https://www.ccv.brown.edu/doc/x-forwarding. Use the interact command to get interactive access to a compute node. Again, to launch the GUI you need to use the matlab-threaded command, which enables the display and the JVM. You may, however, experience some lag in the Matlab GUI's response in an X-forwarded session. Note that if Matlab does not find the X window system available, it will launch in command-line mode (next section).

CIFS

In some situations a workaround may be to use CIFS to mount the Oscar filesystem on your PC and use the Matlab installation on your own computer. For example, if your simulation results reside on Oscar, this can be a quick way to do post-processing on the data instead of moving the data to your computer or using the Matlab GUI on Oscar. Note that users can connect to CIFS only from Brown computers or on Brown WiFi.


Matlab Command Line

Instead of the GUI, Matlab's interpreter can be launched interactively on the command line (a text-based interface):

$ matlab-threaded -nodisplay

This way, you do not have to worry about launching the display or a sluggish response from the GUI, and the startup time is much shorter. It might take some time to get used to the command-line interface. Unless you need tools such as the debugger or profiler, which are more convenient in the GUI, or need to see live plots, we recommend the command-line version. Ultimately, it is a personal choice.

Notes:

Set the $EDITOR environment variable prior to launching Matlab to be able to use the edit command, e.g.

$ export EDITOR=nano

nano is a basic command line editor. There are other command line editors like vim and emacs that users can choose.

From the Matlab command line (represented by the >> symbol below), you can directly type the command to run a script or function after changing the directory to where it is located:

>> cd path/to/work/dir
>> myscript

To check the Matlab version, license info, and the versions of all available toolboxes:

>> ver

To run a Matlab function myfunc.m from the shell:

$ matlab-threaded -nodisplay -r "myfunc(arg1,arg2)"

Batch Jobs

The GUI and the command-line interpreter are suitable mainly for visualization, debugging, and optimization. Batch jobs should be the preferred way of running programs (actual production runs) on a cluster, both because production runs typically require substantial resources and have long run times (and hence long wait times), and because batch jobs are much more convenient for running many programs simultaneously. Batch scripts are used for submitting jobs to the scheduler (SLURM) on Oscar and are described in detail elsewhere in this manual.

Example Batch Script

Here is an example batch script for running a serial Matlab program on an Oscar compute node:

#!/bin/bash

# Request an hour of runtime:
#SBATCH --time=1:00:00

# Default resources are 1 core with 2.8GB of memory.

# Use more memory (4GB):
#SBATCH --mem=4G

# Specify a job name:
#SBATCH -J MyMatlabJob

# Specify an output file
#SBATCH -o MyMatlabJob-%j.out
#SBATCH -e MyMatlabJob-%j.out

# Run a matlab function called 'foo.m' in the same directory as this batch script.
matlab -r "foo(1), exit"

This is also available in your home directory as the file:

~/batch_scripts/matlab-serial.sh

Note the exit command at the end; it is important to include it either there or in the Matlab function/script itself. If you do not make Matlab exit the interpreter, it will keep waiting for the next command until SLURM cancels the job when the requested walltime runs out. For example, if you requested 4 hours of walltime and your program completes in 1 hour, the job will still occupy its resources for the full 4 hours, leaving cores idle, wasting resources, and holding up your other jobs.

If the name of your batch script file is matlab-serial.sh, the batch job can be submitted using the following command:

$ sbatch matlab-serial.sh

Job Arrays

SLURM job arrays can be used to submit multiple jobs using a single batch script, for example when a single Matlab script is used to run analyses on multiple input files or with different input parameters. An example batch script for submitting a Matlab job array:

#!/bin/bash

# Job Name
#SBATCH -J arrayjob

# Walltime requested
#SBATCH -t 0:10:00

# Provide index values (TASK IDs)
#SBATCH --array=1-4

# Use '%A' for array-job ID, '%J' for job ID and '%a' for task ID
#SBATCH -e arrayjob-%a.err
#SBATCH -o arrayjob-%a.out

# single core
#SBATCH -n 1

# Use the $SLURM_ARRAY_TASK_ID variable to provide different inputs for each job

echo "Running job array number: "$SLURM_ARRAY_TASK_ID

module load matlab/R2016a

matlab-threaded -nodisplay -nojvm -r "foo($SLURM_ARRAY_TASK_ID), exit"

Index values are assigned to each job in the array. The $SLURM_ARRAY_TASK_ID variable represents these values and can be used to provide a different input to each job in the array. Note that this variable can also be accessed from within Matlab using the getenv function (it is returned as a string, so convert it with str2double if you need a number):

getenv('SLURM_ARRAY_TASK_ID')

The above script can be found in your home directory as the file:

~/batch_scripts/matlab-array.sh

Improving Performance & Memory Management

Matlab programs often suffer from poor performance or run out of memory. The first step to speeding up a Matlab application is identifying the part of the code that takes up most of the run time; Matlab's Profiler tool can be very helpful for this. MathWorks provides further documentation on best practices for writing efficient code and managing memory.


Parallel Programming in Matlab

You can explore GPU computing through Matlab if you think your program can benefit from massively parallel computations.

Finally, parallel computing features like parfor and spmd can be used by launching a pool of workers on a node, as in the sketch below. Note that the Parallel Computing Toolbox by itself cannot span multiple nodes, so requesting more than one node for such a job will only waste resources.
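A minimal sketch of this pattern follows (the pool size, the use of the SLURM_CPUS_ON_NODE variable, and the toy computation are assumptions for illustration; match the number of workers to the cores you requested from SLURM):

% Open a pool of workers on the current node and run a parallel loop.
ncores = str2double(getenv('SLURM_CPUS_ON_NODE'));   % cores allocated to this job
parpool('local', ncores);

s = zeros(1, 100);
parfor i = 1:100
    s(i) = sum(rand(1000, 1));   % each iteration is handled by one worker
end

delete(gcp);                     % shut the pool down when finished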

HPCmatlab is a framework for fast prototyping of parallel applications in Matlab. If your application has enough parallelism to use multiple nodes, you can use the Message Passing Interface (MPI) through HPCmatlab to send and receive messages among different Matlab processes. It uses MEX functions to wrap the C language MPI functions. It is installed as a module on Oscar:

module load hpcmatlab/1.0

Version Control

Version control refers to managing changes made to source code, or any similarly large body of information, in a robust manner by multiple collaborators. It is a crucial part of software development: it ensures data integrity when sharing code, acts as a safeguard against accidental loss of changes, and allows multiple people to contribute to a code base in a transparent manner. The two main types of version control systems are:

  1. Centralized Version Control System (CVCS) - eg. SVN (Subversion)
  2. Distributed Version Control system (DVCS) - eg. Git, Mercurial

In the CVCS model, a single central repository is maintained on a server. All clients (contributors) check out files from this server and submit changes to it. SVN (Subversion) is a very popular tool for centralized version control and is used by organizations such as Facebook and the Apache Software Foundation.

In the DVCS model, every client has their own copy of the repository. Clients can maintain changes in their local repository and periodically sync with a central repository. Git is a very popular DVCS tool and has become more popular of late because of the emphasis on open-source software. Online hosting sites like GitHub and Bitbucket, which are built around Git, make it easier to distribute open-source code and also allow finer access control for different contributors.

Which model or tool to choose depends on your requirements. Both SVN and Git are installed on Oscar as modules.


SVN at Brown

CIS at Brown maintains a server where you can host your SVN repository. See this page for more information: https://it.brown.edu/services/type/version-control-subversion


Git Configuration

While using Git on Oscar, make sure you configure Git with your correct name and email address to avoid confusion when working with remote repositories (e.g. GitHub, Bitbucket):

$ git config --global user.name "John Smith"
$ git config --global user.email john@example.com

XSEDE

XSEDE is an NSF-funded, nationwide collection of supercomputing systems that are available to researchers through merit-based allocations. It replaces what used to be called the TeraGrid.

A user may apply for one of the following allocation types:

  • Startup: The fastest way to get started on XSEDE, Startup allocations require minimum documentation, are reviewed all year long, and are valid for one year.
  • Education: Also lasting one year, an Education allocation provides time for academic or training classes.
  • Research: Research allocation requests are reviewed quarterly and require more formal documentation. Research allocations will be granted for one year and may be renewed or extended.

In addition, Brown is a member of the XSEDE Campus Champions program, which gives us a small allocation on each of the XSEDE machines. If you would like help getting started with XSEDE, or would like to discuss efficient use of resources, please contact support@ccv.brown.edu.

Globus can be used to transfer files between Oscar and XSEDE machines. The Oscar Globus endpoint is brownccv#transfer.