Using the Totient Cluster

Overview


Minimalist Bash Scripting


Bash scripting can get somewhat complicated if you want to use it as a tool for processing. However, if you just need a simple script to submit jobs to a cluster, there are only a few key items you need to know.

  1. You can create local variables in your script. Like other scripting languages, types of variables are implicit. You can use a variable as a string, and then later set it to a number.

    # Create a string variable
    MY_STR="Some thing"
    # Create an integer
    MY_INT=12
    

    Note: in Bash, you only have integers. If you create a ‘real number’, it will be treated as a string.

  2. You can dereference the value of a variable by prefixing a $ to the name of the variable. The following will all print the value of each variable to the console:

    # Printing the local variables define above
    echo "The string is: $MY_STR"
    echo "The integer is: $MY_INT"
    # You also have access to the many 'environment variables'
    echo "The current working directory is: $PWD"
    echo "The C compiler you have set is: $CC"
    echo "The shell you are using is: $SHELL"
    

    Note: if the C compiler line did not print anything for $CC, then you have not set this environment variable. That is not a problem.

  3. Bash is particularly sensitive about whitespace.

    # Whitespace around the equals sign will break
    MY_VAR = "Some value"
    
  4. There is a difference between single and double quotes. In single quotes, special characters (\, $, !, ?, and * are some) are used literally. In double quotes, they keep their special meaning.

    Using double quotes:

    VAR="The string $MY_STR is not equal to $MY_INT."
    echo "$VAR"
    

    will print The string Some thing is not equal to 12. to the console.

    Using single quotes:

    VAR='The string $MY_STR is not equal to $MY_INT.'
    echo "$VAR"
    

    will print The string $MY_STR is not equal to $MY_INT. to the console.

    Note 1:

    Even the syntax highlighting for the above is different, most text editors do this as well. Generally you will be fine using double quotes. When writing more complicated scripts, it is useful to know that you can bypass special characters (if you have used sed, you have fought this before).

    Note 2:

    Whitespace revisited. You have access to the same printf function as C in your shell. If we do

    VAR="The string $MY_STR is not equal to $MY_INT."
    printf "%s\n" "$VAR"
    

    it will work. However, if we change it to

    VAR="The string $MY_STR is not equal to $MY_INT."
    printf "%s\n" $VAR
    

    it will perform differently. This is another example of the sensitivity of whitespace in Bash. When you have "$VAR", putting it inside the double quotes says ‘form this as a single entity’. When you have $VAR, the contents are expanded as is. Since there were spaces in the original string, they are now treated as separate arguments. printf will complain because there are now more arguments being supplied than format specifiers (%s in this case).

Queuing jobs on Totient


qsub [opts] <job-script.pbs>

  • Usually, the majority of the work is done by job-script.pbs
  • Read the man page! There are a lot of important environment variables that qsub defines for the script.
    • The one you will use most frequently: $PBS_O_WORKDIR
    • E.g. cd $PBS_O_WORKDIR in the script so the executable can be found
  • The other common argument you will use is the -v flag, which allows you to supply arguments to the job script.

The best way to understand how this all works is by running through some examples! To start, we will run through examples 1, 2, and 3 in the example repository.

Bash Scripting Revisited


Now that you know how to use qsub, the next thing you will probably want to be able to do is supply arguments to your PBS script. Ordinarily, when you want to use command-line arguments in Bash you can refer to them using

Variable Value
$0 The name of the script.
$1 The first command-line argument.
$2 The second command-line argument.
... ...
$n The nth command-line argument.
$* All command line arguments concatenated.
$@ The command line arguments as an array.

However, with qsub, if you want to provide your script with command-line arguments you will need to use the -v flag in conjunction with default arguments.

In the fourth example provided in the example repository, we are scripting the number of processors requested. The main method reads this as an additional command-line argument. In the job script job-pi-parallel-v2.pbs, we now use a default argument:

P_NUM_THREADS=${NUM_THREADS:-1}

This will create a variable P_NUM_THREADS that has a default value of 1. The syntax is a little strange, noting that the - indicates the default is to follow (as in the default value here is 1, not -1). The way it works is it tries to read in the environment variable NUM_THREADS. If the variable is found, then this will perform

P_NUM_THREADS=$NUM_THREADS

if it is not found, then it will perform

P_NUM_THREADS=1

Now, in order to provide a value for NUM_THREADS, we call qsub with

qsub -v NUM_THREADS=<value> job-pi-parallel-v2.pbs

This will define the environment variable NUM_THREADS for the pbs script, and you would replace <value> with the number of threads you would like to use.

The corresponding script to call this job script with a few different numbers of threads is batch_jobs.sh. It is important to remember that on most clusters you will be billed by compute time, which is related to the number of nodes / processors per node that you request. In the example script, we are intentionally varying the number of threads used (e.g. for a scaling study). In general, you will not need to do this but instead will normally want to request all of the nodes / threads you can ;)

Multiplexing a Terminal


Terminal multiplexing sounds like an advanced topic, but it is actually quite simple. The idea is to be able to leave a session open on one computer, and return to it later (e.g. to see if your computation has finished). In conjunction with ssh, this is an immensely powerful tool that you are encouraged to use.

In addition to being able to join the session again from another computer via ssh, the majority of terminal multiplexers provide simple keyboard shortcuts to allow you to spawn new terminals with ease. For example, if you had a single terminal open in a terminal multiplexer, with a simple keyboard shortcut you could split the screen vertically and have another terminal session open. No need to open a new terminal, navigate to where you want to go, and potentially use your window manager to position the original terminal on the left half and the new terminal on the right half.

There are many situations in which this could expedite your workflow. For example, suppose you were writing a C program, working on foo.c with a terminal-based text editor (e.g. emacs or vim). You realize that one of the methods you are writing needs to have the method signature changed in both foo.c and foo.h. Simply split the screen, open foo.h with your text editor, save the changes, and close the new pane. You never even needed to close the terminal editing foo.c!

While there are many multiplexers available (Screen, iTerm2, …), we use tmux. tmux, like many programs, will look for a configuration file in your home directory. Specifically, tmux will look for the file ~/.tmux.conf. Remember that if your username was someUser, then the file would be /home/someUser/.tmux.conf.

tmux functions without a configuration file, but you will ultimately want to make one. You are welcome to use the .tmux.conf hosted here to get you started, read the various options customized in the file and change accordingly (e.g. if your default shell is zsh, you will need to change the first to configurations).

Warning: for compatibility with vim, the default ctrl+b action trigger for tmux has been remapped to ctrl+a. Honorable mentions:

Keyboard Sequence Action Performed
ctrl+a then v Split current terminal pane vertically.
ctrl+a then s Split current terminal pane horizontally.
ctrl+a then z Maximize (or un-maximize) current terminal pane.
ctrl+a then h Make terminal pane to the left the active pane.
ctrl+a then j Make terminal pane below the active pane.
ctrl+a then k Make terminal pane above the active pane.
ctrl+a then l Make terminal pane to the right the active pane.
ctrl+d Close active terminal pane. If only one, ends tmux session

Warning: if you use vim in tmux and accidentally hit ctrl+s, you have activated a feature of tmux you probably did not intend. Gain control of vim again with ctrl+q.

Lastly, if you want to

Shared Terminal Sessions


There are many ways to share terminal sessions, but we rather enjoy tmate. It will use all of your tmux configurations, but will enable you to allow others to view what you are typing in your terminal both via ssh or via a web browser.

Simply execute tmate show-messages to obtain the URLs for both, noting that you should clear your screen immediately! tmate allows users to join AND write to your terminal (if you give them the associated URL), so you will want to make sure to hide the URLs shown by tmate show-messages for read-only sessions.

Note: there is a known bug with tmate where if users ssh into a read-only session, the only way to leave the session is to close the terminal entirely. This can be problematic if a user was say already in a tmux session that was important to them, so make sure you warn your users ahead of time!

Checkout the samples at tmate’s website.