Amherst College I.T. I.T. home. Amherst College.
I.T. home.
ATS > Software > Scientific Programming with Python > Simulating a Magnet

Scientific Programming with Python

Part 3: Simulating a Magnet

Previous: Newton’s Laws of Motion

Current: Simulating a Magnet

 


NumPy provides a number of tools useful for simulations and for working with simple data types.

Topics


The Ising Model

The Ising model is a simple model of a ferromagnet, using a regular lattice of magnetic “spins” that influence each other to line up, fighting against entropy all the while.


Magnetic Spins

Magnets are atomic lattices with unpaired electrons, producing a spin magnetic moment at each position of the lattice:

↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑          ↑ ↑ ↑ ↑ ↑ ↓ ↓ ↑ ↑ ↑          ↑ → ← ↑ ↑ ↓ ← ↑ → →

↑ ↑ ↑ ↑ ↑ ↑ ↓ ↑ ↑ ↑          ↑ ↑ ↑ ↑ ↑ ↓ ↓ ↓ ↑ ↑          ↓ ← ↑ ↓ ← → ↓ ← → ↑

↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑          ↑ ↑ ← ← ↑ ↑ ↑ ↑ ↑ ↑          ← ↓ → ↓ ← ↑ ← ↑ ← ↓

↑ ↑ ↑ ← ↑ ↑ ↑ ↑ ↑ ↑          ↑ ← ← ← ↑ ↑ → → → →          ↓ ← → ← ↑ ← ↓ → ↓ →

↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑          ↑ ↑ ↑ ↑ ↑ ↑ → → → ↑          ← ↑ ↓ ↓ → ↑ ← → ↑ ↓

  Low Temperature             Medium Temperature            High Temperature

Bar Magnet and Magnetic FilingsThese spins are tiny magnets and like to align with each other, and collectively can add up to a macroscopically measurable magnetization, such as in the bar magnet at the right:

Ferromagnetic DomainsThe higher the temperature, however, the more the thermal energy randomizes the spins, and the uniform alignment is broken up by patches of spins in other directions called domains, as in the red and green regions in the micrograph at the left.

Above a critical temperature there is no longer a dominant magnetic direction, and the domain magnetizations cancel each other out.

At infinite temperature the spins are completely random in their orientation even over short distances.

We can describe the tendency of the spins to line up in terms of the (potential) energy of the spin interaction. Over the entire lattice of size N it is given by

E = Σij Jij si·sj

where Jij is a positive coupling constant that incorporates the magnetic moment. Then,

  • an alignment of the spins will produce a positive dot product and lower the energy, which is a more favorable state;
  • an anti-alignment of the spins will produce a negative dot product and increase the energy, which is a less favorable state.

Complete spin alignment will have a large negative energy, while thermally randomized spins will have a higher energy near zero.

To simplify the model we can:

  • establish a single z axis so that the spins in the vector dot product si·sj can be replaced by their z-component si = ±1 ;
  • limit the range of the interaction to only nearest-neighbor pairs, which can be written into the sum and allowJ to be a single value.

The energy thereby becomes

E = J Σnn sisj

And we can represent the atomic lattice by an array of ±1 values:

  1 -1 -1  1  1 -1  1  1  1 -1
 -1 -1 -1  1 -1 -1 -1  1  1  1
  1 -1  1  1 -1  1  1 -1 -1  1
 -1 -1  1 -1  1  1 -1 -1  1  1
 -1 -1 -1  1 -1  1  1  1  1 -1
  1 -1 -1  1 -1  1  1  1  1  1
  1 -1 -1 -1  1 -1 -1 -1 -1 -1
 -1 -1  1 -1 -1 -1  1 -1 -1 -1
 -1 -1  1  1 -1 -1 -1 -1 -1  1
 -1 -1  1  1 -1 -1 -1  1 -1  1

This is known as the Ising Model of magnets, and as we will see it retains their basic characteristics.

When the spins substantially align, there will be a measureable magnetization:

M = (1/N)∣Σ si

  • When T → 0, the spins completely align, and E → –2NJ and M → 1.
  • When T → ∞, the spins will be randomly oriented, and E → 0 and M → 0.

The magnetization is an example of an order parameter, which we can use to monitor the system order.


Spyns

Not surprisingly, we can implement an Ising system and calculate its magnetization using Python.

We’ll use a two-dimensional lattice of size N = L × L:

L = 10
N = L**2

NumPy can define arrays with preset values in a number of ways. For example, to create a system at zero temperature, all spins aligned, we use:

import numpy as np
spins = np.ones((L,L), np.int)
print(spins)


[[1 1 1 1 1 1 1 1 1 1] [1 1 1 1 1 1 1 1 1 1] [1 1 1 1 1 1 1 1 1 1] [1 1 1 1 1 1 1 1 1 1] [1 1 1 1 1 1 1 1 1 1] [1 1 1 1 1 1 1 1 1 1] [1 1 1 1 1 1 1 1 1 1] [1 1 1 1 1 1 1 1 1 1] [1 1 1 1 1 1 1 1 1 1] [1 1 1 1 1 1 1 1 1 1]]

The first argument to ones() is the size of the lattice, as an n-tuple for an n-dimensional lattice.

The second argument sets the data type, in this case NumPy integers (otherwise it would produce a float).

The magnetization will tell us the degree of order in the model:

M = lambda : np.absolute(np.sum(spins))/N
print(M())

⇒ 1.0

Perfectly ordered, by design!

Warning: in Python 2.x, you’ll want to use float(N) in the above expression to avoid integer truncation.

The keyword lambda is an alternative way to define a function, often used for short and one-line functions. It’s known as an anonymous function because no name is defined for it, though it can be assigned a name as above. More importantly, it’s an expression rather than a complete statement, so it can also be an argument to a function.

The function np.sum() simply adds together all elements of the array it’s handed as an argument.

To create a system at infinite temperature, with a random distribution of values, we can use a subset of NumPy tools, numpy.random, that provides random number generators:

import numpy.random as npr
spins = npr.random_integers(0, 1, (L,L)) * 2 - 1
print(spins)


[[ 1  1  1  1  1 -1 -1 -1 -1  1]
 [ 1 -1  1  1 -1  1 -1 -1  1 -1]
 [ 1 -1 -1  1  1 -1  1  1  1  1]
 [-1  1  1  1  1  1 -1 -1 -1 -1]
 [-1 -1 -1  1 -1 -1 -1  1  1 -1]
 [-1  1  1 -1 -1 -1  1  1 -1  1]
 [ 1  1 -1 -1  1  1  1 -1  1  1]
 [-1  1  1 -1  1 -1  1  1 -1 -1]
 [ 1 -1  1  1 -1 -1  1 -1  1 -1]
 [-1 -1  1  1 -1  1 -1  1  1 -1]]

The first two arguments to random_integers() provide the range of integer values, minimum to maximum inclusive, here producing values in the range (0, 1).

The third argument to random_integers() is the size of the lattice, again as an n-tuple for an n-dimensional lattice, here (L, L).

A broadcast multiplication of the array of integers (0, 1) by 2 and then subtracting 1 produces the desired values of (–1, 1).

The magnetization is close to zero, as it should be for a completely disordered state:

print('M = ', M())

⇒ M = 0.06

It will never be precisely zero because it’s defined here, essentially, as the standard deviation of the average spin, but the larger the lattice we choose, the closer to zero it will be:

M = (1/N)∣Σ si~ 1/N1/2

So for N = 100, M ~ 0.1, as seen in the code snippet above.


Spyn Statistics

For a given temperature, how can we put this system in an equilibrium state so that we can calculate its order parameter? First we better define what we mean by “equilibrium”.

Statistical mechanics tells us that, for a system in equilibrium at a temperature T, representing a bath of thermal energy that randomizes the spins, the probability of a particular configuration of the spins is proportional to its Boltzmann factor:

P({si}) ∝ exp(–E/kT)

where the Boltzmann constant k “converts” temperature to a corresponding energy.

Clearly the lowest energy state will have the highest probability.

For the Ising Model:

P({si}) ∝ exp((–J Σnn sisj)/kT)
        = exp(β Σnn sisj )

where β = J/kT varies between ∞ and 0 as the temperature T varies between 0 and ∞.

In particular,

  • When T → ∞ and β → 0, the exponential is equal to 1 for every pair of spins no matter what their orientation, meaning all states are equally likely, and there will be just as many “up” spins as “down” spins, on average.
  • When T → 0 and β → ∞, the exponential is largest when the spins are all +1 or all –1, i.e. they all line up in one of those two directions.

Now consider an individual spin si and its neighbors sinn; it has an energy relative to its neighbors of

Ei = –(J /2) si Σnnsinn

where the factor of 1/2 avoids double-counting the energy once we include the neighboring spins. It’s helpful to define a spin sum Si as:

Si = siΣnnsinn

Ei = –(J /2)Si

The probability of a particular microstate si given neighbors sinn is then:

P(si|sinn) ∝ exp(–Ei/kT)
          = exp((β/2)Si)

Distinct examples of the individual spin sum, energy, and probability are:

Configuration     1
 1 –1  1
    1
   -1
-1  1 -1
   -1
    1
-1  1 -1
   -1
   1
1  1 -1
  -1
   1
1  1 -1
   1
    1
 1  1  1
    1
   –1
–1 –1 –1
   –1
Si = –4 -4 -2 0 +2 +4 +4
Ei = +2J +2J +J 0 J –2J –2J
P(si|sinn)
exp(-2β)
< 1
exp(-2β)
< 1
exp(-β)
< 1
1 exp(+β)
> 1
exp(+2β)
> 1
exp(+2β)
> 1

We can again see that:

  • when the spins are less aligned, Si < 0 and Ei > 0 and P(si|sinn) < 1;
  • when the spins are more aligned Si > 0 and Ei < 0 and P(si|sinn) > 1.

In addition, note that flipping the central spin will always invert the signs of Si and Ei.

Now suppose we were to flip the central spin and this lowered the energy, e.g. we went from the second (blue) configuration in the table above to the last (pink) configuration. The spin sum is higher and the probability is also higher.

On the other hand, if flipping the central spin increased the energy, e.g. going from the last (pink) configuration to the second (blue) configuration, the spin sum will be lower and the probabilty will also be lower.

When in equilibrium the spins will flip back and forth over time, driven by the random energy of the temperature bath. However, on average, the system retains its state, in particular its degree of order. This means that the probability P of a microstate times the rate of transition W out of that microstate must be the same as the reverse process, i.e. over time the transition probabilities are equal:

P(si|sinn)W(si → –si) = P(–si|sinn)W(–sisi)

This condition is known as detailed balance, and is what we mean when we say a system is in equilibrium.

Detailed balance can also be used to drive a nonequilbrium system to equilibrium by flipping spins in a compatible manner, and this is the key to simulating such a system.

The ratio of the two transition rates is given by

W(–sisi)/ W(si → –si) = P(si|sinn)/P(–si|sinn)
                        = exp((β/2)Si) / exp(–(β/2)Si)
                        = exp(βSi)

Suppose flipping the spin si → –si lowers the energy; again this means that initially Ei > 0 and Si < 0. Because only the above ratio is important, we can choose this transition rate equal to 1, i.e. the spin will always flip:

W(Ehigh → Elow) = 1

Consequently, when flipping the spin si → –si increases the energy, initially Ei < 0 and Si > 0, we must have

W(Elow → Ehigh) = exp(–βSi) < 1

and the spin only has a fractional probability of flipping.

Increasing the energy is therefore less likely than decreasing it, and after repeated flipping attempts the higher energy state will become the most probable state, so that a nonequilbrium system must approach equilibrium over time.


Monte Carlo Simulations

Because of the randomizing effect of thermal energy and the probabilistic nature of the system energetics, we can use random numbers to equilibrate a system.


The Monte Carlo Procedure

Suppose we start with the spins at infinite temperature, which means they are randomly either ±1. We then instantaneously lower the temperature, a process known as quenching, and allow the system to equilibrate, or anneal.

Alternatively, for low temperatures, we can start with a system at zero temperature, with a uniform distribution of either +1 or –1, and then instantantaneously raise the temperature.

We can simulate such systems as follows:

  1. Generate a lattice with the known initial configuration;
  2. Randomly pick a spin;
  3. Calculate its spin sum Si = siΣnnsinn;
  4. If Si is less than or equal to zero, we flip it, since W(Ehigh → Elow) = 1;
  5. Otherwise, calculate W(Elow → Ehigh) = exp(–βSi);
    1. Calculate a uniformly distributed random number 0 ≤ R < 1 and compare it to W;
    2. If R < W, flip the spin (because a small transition rate W means a low probability of flipping);
  6. Return to (2) and repeat until the system equilibrates.

This Monte Carlo procedure is a standard practice in statistical science as a way to calculate the properties of complex systems.


The Monte Python Simulation

We’ll save the previous code in a file named monte.py!

import numpy as np
import
numpy.random as npr
L = 10
N = L**2
M = lambda : np.absolute(np.sum(spins))/N
spins = npr.random_integers(0, 1, (L,L)) * 2 - 1
# More code here.
print(spins)
print('M = ', M())

We’ll also set a large value of β that should produce some amount of ordering since it corresponds to a low temperature:

beta = 10

Now the next steps will take place in a loop after generating the spins, starting by randomly picking an initial spin, as sequential picking of spins would generate spatial correlations:

i = 0
while i < N :
    i += 1
    s = tuple(npr.randint(0, L, 2))

This last line is a slightly different random integer generator than random_integers() that returns two values in the range (0, L–1) (yes, here it’s inclusive-exclusive!). This NumPy array, converted to a tuple s=(sx, sy), will be indices into the array spins.

 1 -1 -1 sl  s sr  1 sb  1 -1
-1 -1 -1  1 sb -1 -1  1  1  1
 1 -1  1  1 -1  1  1 -1 -1  1
-1 -1  1 -1  1  1 -1 -1  1 st
sr -1 -1  1  1 st  1  1 sl  s
 1 -1 -1 -1 sl  s sr  1  1 sb
st -1 -1  1 -1 sb -1 -1 -1 -1
 s sr  1 -1 -1 -1  1 -1 -1 sl
sb -1  1  1 -1 -1 -1 st -1  1
-1 -1  1  1 st -1 sl  s sr  1

Now locate the indices of the adjacent spins, left, right, bottom, and top:

(sx, sy) = s
sl = (sx-1, sy)
sr = ((sx+1)%L, sy)
sb = (sx, sy-1)
st = (sx, (sy+1)%L)

When the spin s is at the left or bottom, one index is 0, and (0-1)-1. But Python defines slices so that negative indices count backward from the largest value, i.e. the index -1 is equivalent to the index L-1, again wrapping around to the right or top edge.

On the other hand, when the spin s is on the right or top edge, one index is L-1, so that (L-1+1)L, which Python treats as out-of-bounds. But we can use the modulus operator %, which provides the remainder when dividing two integers, to always produce a value within the range (0, L–1), specifically (L-1+1)%L0, wrapping around to the left edge or bottom edge.

These are known as circular boundary conditions and they help to reduce errors associated with finite lattice size.

Now calculate the spin sum Si = siΣnnsinn:

    S = spins[s] * ( spins[sl] + spins[sr] + spins[sb] + spins[st] )

Finally test for the energy condition:

    if S <= 0 :
        spins[s] *= -1
    else:
        W = np.exp(-beta*S)
        R = npr.rand()
        if R < W : spins[s] *= -1

There is another random number generator here, rand(), which returns a floating point number between 0 (inclusive) and 1 (exclusive).

Once i reaches N, this loop terminates. We can calculate the matrix and magnetization at this time to see how they have changed, with the following statements that are not indented so they are only calculated once:

print(spins)
print('M = ', M())

Passing through the lattice once, on average, represents a single unit of time, since the entire lattice should have the chance to update once, no matter what its size. To run the loop for multiple units of time, define a number of time steps tmax and multiply it by the lattice size:

i = 0; tmax = 100
while
i < tmax * N :
    i += 1
    s = tuple(npr.randint(0, L, 2))
    sl = (sx-1,sy)
    sr = ((sx+1)%L,sy)
    sb = (sx,sy-1)
    st = (sx,(sy+1)%L)
    S = spins[s] * ( spins[sl] + spins[sr] + spins[sb] + spins[st] )
    if S <= 0 :
        spins[s] *= -1
    else:
        W = np.exp(-beta*S)
        R = npr.rand()
        if R < W : spins[s] *= -1

Problem 5: Ensemble Average

The calculations above are for particular lattice sizes, and over particular periods of time. But they are also for particular configurations of spins. The most likely state is calculated by considering the probabilities of all configurations, known as an ensemble, and averaging them. Since a large lattice will have a huge number of configurations 2N, it’s impractical to calculate every one.

However, a random selection of spins such as above can be repeated multiple times, and that should sample the configuration space, providing a good approximation that can be averaged.

Add a loop to the above that repeats Z = 100 times, and calculate the average magnetization:

M⟩ = (1/Z) ΣZ M

Also calculate the average square magnetization:

M2⟩ = (1/Z) ΣZ M 2

and then the standard error of the mean value of M:

σM = sqrt[(⟨M2⟩ – ⟨M2)/Z]


High-Pyrformance Computing

Simulating a large number of variations of a physical model can take a lot of CPU time, and computing clusters provide a lot of CPUs.


The Amherst College Computing Cluster

Larger lattices can take a long time to process in the loop above, and as in Homework #5 they should then be averaged over a number of different configurations. While the script above should run well in Spyder, it will be very slow overall.

We can therefore turn to the Amherst College Computing Cluster, which provides 452 CPU cores on 76 Unix computers, and has Python already installed.

On the cluster we can run a number of calculations, or jobs, on these multiple cores and for longer periods of time.

Cluster jobs are prepared, tested, and submitted on an entry computer or head node, which you can access over the Internet. The College Computing Cluster’s head node has the domain name cluster.amherst.edu, or simply cluster while on campus.

Procedure 1: Connecting to the Computing Cluster

The Cluster’s computing environment requires the use of text commands rather than a graphical user interface, so you’ll need to open a command-line interpreter, also called a “terminal” or “shell” program, and log in to the head node of the cluster over the network.

  1. Connect to cluster:
    • On the Mac:
      1. Navigate to the folder Applications and then the folder Utilities, and open the application Terminal.
      2. After the command prompt <username>$, type in the “secure shell” command
      3. ssh amherstusername@cluster.amherst.edu

        and press the <enter> key. On campus you can leave off .amherst.edu.

      4. If you are asked if you want to continue connecting, type yes.
      5. When asked for your password, enter your Amherst password.
    • Putty Configuration DialogOn Windows:
      1. Obtain the application PuTTY from the Amherst Software drive or from the Internet (TTY is an abbreviation for a teletypewriter, an old type of computer terminal).
      2. When you open PuTTY, you’ll see the configuration dialog at the right; in the field Host Name, type in:
      3. cluster.amherst.edu

      4. In the field Saved Sessions, type in:
      5. Cluster

      6. Click the button Save.
      7. Click the button Open.

      In subsequent sesssions you can simply click on the saved session Cluster and then click on the buttons Load and Save to use this configuration.

  2. At the login: prompt, provide your Amherst username and password.

    The first time you log in to the cluster with a shell, your account will be configured for use (file sharing is not available until after that).

    Then you will see the command prompt [<username>@csc2-node01 ~]$. Here “csc2-node01” refers to the cluster head node, and ~ refers to your home folder, on Unix called a directory.
  3. Like the front-most window on Macs or Windows, Unix maintains a current directory, from which all file references are based. To see what the full path is to that directory, type the Unix “print working directory” command:
  4. pwd
    ⇒ /home/username

  5. To see the files present in your home directory, type the Unix “list” command:
  6. ls
    ⇒ cluster-archive cluster-scratch

    The first is a folder to store old projects; the second is a folder for current projects.

    Both of these are actually network drives that are mounted in your home directory; on Macs and Windows such disks are always mounted at the top level of your computer (or at least appear that way).

  7. “Change directory” into cluster-scratch:
  8. cd cluster-scratch

  9. It’s best to have a dedicated folder/directory for each project, so “make a new directory” in cluster-scratch to contain your script and associated files, e.g.:
  10. mkdir ising

  11.  Now “change directory” into this new folder, e.g.:
  12. cd ising

Procedure 2: Sharing Files with the Computing Cluster

To submit jobs to the cluster, you must copy your Python script over to the head node of the cluster.

  1. To copy files to cluster, you must use either file sharing or SFTP (follow these links for explicit instructions).

    File sharing will probably be more familiar to you, as the network drives appear as if they are local, like your U:/Userfiles drive.

    File sharing also allows you to continue to edit within Spyder and retain permissions you’ve set, so it’s recommended.

    You can connect to the same network drives as seen in Procedure 1:
    • Your home directory, referred to either by your username or homes;
    • The folder to store old projects, cluster-archive;
    • The folder for current projects, cluster-scratch.

    As each of these are distinct network drives, they must be mounted separately on your computer (note: the second two will also appear as subfolders of the first, but are not accessible that way, only when logged in to cluster itself, as in Procedure 1).

    • On Macs: Cluster Drive OptionsAfter you connect, you’ll see a dialog such as the one at the right; click on cluster-scratch and click the button OK.
    • On Windows: You must type in a full network drive path, viz. \\cluster.amherst.edu\cluster-scratch
  2. The folder cluster-scratch should open, and you should see any subfolders you might have created in Procedure 1, e.g. ising.

    You can now copy files into and out of it by dragging them, etc.

    Warning: Unix, Windows, and Macs all use different characters to indicate the end of line in a text file; usually they are successfully translated when copying between them, but occasionally they won’t be and error messages may appear in the following steps. This most commonly occurs when transferring from a Windows computer, and if that’s the case you can use the Unix utility dos2unix to ensure a file created on Windows with DOS line-endings has the correct format. Once you’ve converted the file Spyder will respect that end-of-line setting.

Python Scripts on the Computing Cluster

Once you’ve mapped a network drive to your local computer as in Procedure 2, you can also use the Spyder IDE to read and write files on it.

Procedure 3: Setting Up Spyder to Use the Computing Cluster

To work with your Python script on the computing cluster, it’s simplest to set up Spyder to reference your working folder in cluster-scratch.

  1. In Spyder, click on the button  Browse a working directory and navigate to the folder connected to in Procedure 2, e.g. ising. This folder will remain in the adjacent list of working directories for easy access.
  2. To have iPython see this same directory, using one of the following options:
    • Click on the IPython button  Set as current console’s working directory; however this doesn’t work in recent versions of IPython.
    • Restart IPython by menuing  Options >  Restart kernel. Unfortunately all previous assignments and definitions will be lost, though command history is not.
    • Type IPython’s built-in “change directory” command, though you’ll have to type out the full path:
      • Macs: %cd /Volumes/cluster-scratch/ising
      • Windows: %cd Z:\ising

      The % is only necessary if you have already defined a Python variable named cd.

    To verify that the directory change worked, type IPython’s built-in “print working directory” command:

    %pwd

    ⇒ '/Volumes/cluster-scratch/ising'

  3. Copy your Python script, e.g. monte.py, to the folder in step 1, from within Spyder by menuing File > Save as….

When running your script on the Computing Cluster, there are a few changes you’ll want to make so that it works more simply with Unix and the Cluster’s job management software.

Procedure 4: Making a Python Script Self-Executable

To simplify the execution of your script, you can make it run as a “self-executable” Unix application, rather than as a file that’s an argument for the Python interpreter.

  1. So that Unix knows which interpreter to use for your script, add the following statement as the first line of code in your script, with no preceding space, and save it:
  2. #! /usr/local/bin/python3

    The first two characters are known as a shebang, and tell Unix to run the script with the specified interpreter.

    The full path to the interpreter should be used, but be aware that the Python interpreter’s location can be different for different versions of Python, and can also vary between the Cluster and other Unix computers. In particular, Python 2 is currently the default version on the Cluster, and is located at /usr/bin/python, while Python 3 is located in in the “local binary” folder .

    Because this statement appears as a comment to Python, it’s safe to leave it in even when running the script on your local computer.

  3. To get Unix to recognize that your script can be executed directly you must now switch to your command-line program set up in Procedure 1 , make sure you are in the same directory as your script, and “change mode” to include executable status, e.g.:
  4. chmod +x monte.py

  5. To review the executable status of your script (and other information), you can use the Unix “list -long” command:
  6. ls -l
    ⇒ -rwxr--r-- 1 username username 1012 Dec 16 15:55 monte.py

    The three groups of permissions here, rwx, r--, and r-- refer to read, write, and execute permissions for you (the user), your group, and all others. It is followed by your username and your group name, and the last-modified date of the file.

  7. Now you can run your executable; for security reasons you must explicitly refer to its directory, and since your current directory has the abbreviation . the command will look something like:
  8. ./monte.py

    Mavg = 0.772
    sigmaM = 0.0367717282705

One additional useful capability is to be able to hand your executable one or more arguments that could have different values for different jobs in the cluster.

In Unix, such command-line arguments are typed following the initial command and separated by spaces, e.g.

./monte.py 10

where 10 might be a specific value of a variable such as beta.

If you import the “system” or sys module into your executable:

import sys

then you will have access to command-line arguments, passed to your executable as a list of strings:

sys.argv

⇒ ['./monte.py', '10']

beta = float(sys.argv[1])

⇒ 10.0

Note that item 0 in this list is set to the command you used to run the executable.

In general you will want to test to make sure that the expected number of arguments are provided, and either set a default value for missing arguments or exit the program with an error message (another function provided by the sys module):

if len(sys.argv) < 2 :
    sys.exit('Usage: ' + sys.argv[0] + ' beta')
else:
    beta = float(sys.argv[1])

Then when running the executable without arguments:

./monte.py

⇒ Usage: ./monte.py beta

One other important change for your executable is to format its output to make it easier to analyze large quantities of data; most simply, you can write comma-separated values so that the results of different jobs can all be concatenated together into a table:

print('L,beta,Mavg,sigmaM')
print('%d,%0.2f,%0.2f,%0.3f' % (L,beta,Mavg,sigmaM))


L,beta,Mavg,sigmaM
10,10.00,0.77,0.038


The Condor Job-Management System

The Condor job-management software provides an efficient means to control a project running on a large number of computers.

A set of related jobs are submitted as a cluster of jobs that are automatically distributed over any available computer cores.

By design CPU time is shared with other researchers; the general behavior is that newer clusters are given priority, and the longer they run the lower their priority becomes.

Along with your executable program, Condor will need a command file for each job cluster, which provides details on how to run each job, including any command-line arguments that might vary from job to job.

The basic structure of a command file is a series of key-value pairs, which describe either global job properties or individual job properties, for example:

## Global job properties
universe     = vanilla
notification = error
notify_user = username@amherst.edu initialdir = /mnt/scratch/username/ising executable = monte.py requirements = (((Arch=="INTEL") || (Arch=="x86_64")) && (OpSys=="LINUX")) priority = 5 ## Job properties arguments = 0.1 output = /mnt/scratch/username/ising/results/0/out error = /mnt/scratch/username/ising/results/0/err log = /mnt/scratch/username/ising/results/0/log queue ## Job properties arguments = 0.2 output = /mnt/scratch/username/ising/results/1/out error = /mnt/scratch/username/ising/results/1/err log = /mnt/scratch/username/ising/results/1/log queue ...

As with Python, a # introduces a comment.

The keyword universe describes possible modes of running your executable; when the jobs are independent of each other (don’t need to pass information amongst themselves), the proper value is vanilla. There are other possiblities for more specialized applications, e.g. for code compiled with parallel-processing libraries.

The keyword notification determines whether an e-mail will be sent for certain situations:

  • complete (the default): upon job completion, whether properly or due to errors
  • error: only when job errors occur
  • never

The complete option is not generally recommended, because it can generate hundreds of e-mails, one for each job, which look like:

This is an automated email from the Condor system
on machine "csc2-node01.amherst.edu". Do not reply.

Condor job 150601.0
/mnt/scratch/username/ising/monte.py 0.1
has exited normally with status 0
....

If you set notification to something other than never, you should include your e-mail address using the notify_user keyword.

The keywords initialdir and executable are the initial directory, where your files are located, and the application to run, which should be in that same directory.

The keyword requirements describes various node characteristics necessary for your application, in particular the hardware architecture (Arch) and operating system (OpSys), together known as the node platform. Since Condor runs on heterogeneous equipment, but many applications cannot, this keyword is important to set properly, especially since the default platform characteristics are the same as the head node. The requirements can both be limited with an AND statement (&&) or expanded with an OR statement (||). For the Amherst College Computing Cluster the value combination above describes all of its platforms, which all run the Linux operating system, but which are either AMD or Intel CPUs — and therefore run the same compiled binary code.

The keyword priority is any integer and is a relative priority for your own jobs — a lower priority, such as 0 (the default), will be run only after your other jobs with higher priority have run. This doesn’t effect job priority relative to other users, which is set by Condor using its own algorithm.

The keyword arguments is a list of text values that will be handed to your application, e.g. monte.py 0.1.

The keywords output, error, and log describe files to store, respectively, standard output from your application (what would appear as output if you ran it directly), any error messages that occur, and Condor’s log of running the job providing information similar to the e-mail above.

The keyword queue tells Condor to submit one job with the previous set of characteristics. You can then change a few of them for the next job, and queue that.

There are many more command-file keywords that might occasionally be useful and that are described in the Condor documentation starting on p. 717.

Once you’ve created your command file, use condor_submit to send your project to the Condor job queue.

Example: Creating a Condor Command File Using Python

A Condor command file is required to run a project on the Computing Cluster, to describe the characteristics of each job in the project cluster.

The command file can be very long and the arguments and files will vary from job to job, so it’s convenient to create it from a program, using almost any language, but the example here will be Python (of course). This is especially useful when job arguments can be generated algorithmically.

For the Monte Python simulation above, an example Condor command file generator “montecmd.py” is downloadable here, and is detailed below:

  1. As with your application, you’ll need to reference the full path to the location of the Python intrepreter with a shebang on the first line of the generator:
  2. #! /usr/local/bin/python3

  3. Because the command file references locations in the file directory, the generator can make use of the “operating system” module os to obtains its own “current working directory” using os.getcwd(), so that it can create the command file and the results directories in the same place:
import os, math

# Create file references for this project:
projectName = 'monte'
executable = projectName + '.py'
baseDirectory = os.getcwd()
commandFile = baseDirectory + '/' + projectName + '.cmd'

The generator first writes out global job parameters into the command file:

  • a “vanilla universe” (meaning no communication between the different jobs);
  • notification on error, but not upon job completion (you might want to change this to complete for a long-running job);
  • the locations of the initial directory and the project executable;
  • hardware requirements (here redundant since all of the cluster CPUs match);
  • the project running priority, which you can use to prioritize your own projects (your priority relative to others is beyond your control):
# Write the global parameters needed for all jobs into the command file: 
commands = open(commandFile, 'w')
commands.writelines(['## Global job properties\n',
	'universe =     vanilla\n',
	'notification = error\n',
	'notify_user  = username@amherst.edu\n',
'getenv = true\n', 'initialdir = ' + baseDirectory + '\n', 'executable = ' + executable + '\n', 'requirements = (((Arch=="INTEL") || (Arch=="x86_64")) && \ (OpSys=="LINUX"))\n', 'priority = 5\n'])

The results directory is created using more functions from the os module, with rwx permissions for yourself but not your group (---) or others (---), which can be written as a binary number 0b 111 000 000 or as an octal number 0o700 (the default is world-writable 0o777, which may be necessary for some Condor configurations):

# Create the results directory and limit accessibility to yourself with
# the permission rwx --- --- (0o700):
baseResultsDirectory = baseDirectory + '/results'
if not os.path.exists(baseResultsDirectory) : 
    os.mkdir(baseResultsDirectory, 0o700)

The job arguments are generated with a simple algorithm for the values of beta to be used, which are placed in a list and counted:

# Job arguments, here [0.1, 0.2, 0.3, …, 1.3, 1.4, 1.5]:
betaValues = [(x+1)/10 for x in range(10)] 
jobsCount = len(betaValues)

Each job will have its own results directory within the main results directory, and giving them a uniform-length filename will allow them to be sorted in numerical order:

# Create a format string that will produce results directory names that 
# have a standard length for sorting purposes, e.g. 'results/%02d'
#  => 'results/00', ..., 'results/09', 'results/10', ..., 'results/99'
resultsDirectoryFormat = baseResultsDirectory + '/%0' + \                          str(math.floor(math.log10(jobsCount - 1)) - 1) + 'd'

Now loop through the jobs and create their individual results directories by applying the string format operator:

# Loop through the parameters to be passed for each job:
for job in range(jobsCount) :

    # Create the results directory for this specific job.
    resultsDirectory = resultsDirectoryFormat % job
    if not os.path.exists(resultsDirectory) : 
        os.mkdir(resultsDirectory, 0o700)

In this loop the parameters for each job are also written into the command file:

  • the argument(s) for each job are the corresponding value of beta;
  • the output file has the data your executable writes;
  • the error file describes issues that occur as the executable runs, e.g. due to Python syntax errors;
  • the log file has more general details about running the job, and can tell you about non-Python issues such as file permission issues:
    # Write the arguments for this job.
    commands.writelines(['\n## Job properties\n',
    	'arguments = ' + str(betaValues[job]) + '\n',
    	'output = ' + resultsDirectory + '/out\n',
    	'error = ' + resultsDirectory + '/err\n',
    	'log = ' + resultsDirectory + '/log\n',
    	'queue\n'])

Finally, close the command file:

commands.close()

The command file generator should be saved on the computing cluster in the same location as your project executable.

As with your executable, the command file generator should be provided with executable permissions:

chmod +x montecmd.py

The command file generator can then be run to create the command file and the results directories:

./montecmd.py

And you should also verify the presence of the results directories:

ls results
⇒
00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15

 

Procedure 6: Running a Project on the Computing Cluster

Once you have an application and a Condor command file describing how to run it, such as described above, there are just a few commands to know to get it running on the cluster:

  1. To submit a project command file to Condor:
  2. condor_submit monte.cmd
    ⇒
    Submitting job(s)
    Logging submit event(s)..........
    15 job(s) submitted to cluster 149415.

    Note the cluster ID 149415, which allows you to reference this cluster of jobs.

  3. To monitor your Condor jobs:
  4. condor_q username
    ⇒
    -- Submitter: csc2-node01.amherst.edu : <148.85.78.21:46911>
       ID      OWNER        SUBMITTED     RUN_TIME ST PRI SIZE CMD               
    149415.0   username   12/18 16:56   0+00:03:34 R  5   9.8  monte.py 0.1      
    149415.1   username   12/18 16:56   0+00:03:34 R  5   9.8  monte.py 0.2      
    
    149415.14  username   12/18 16:56   0+00:03:34 R  5   9.8  monte.py 1.4      
    149415.15  username   12/18 16:56   0+00:03:34 R  5   9.8  monte.py 1.5      
    
    15 jobs; 0 idle, 15 running, 0 held

    The individual jobs are listed line-by-line and identified as <clusterID>.<processID>, e.g. 149415.0, 149415.1, etc.

    The actual command that Condor is running is shown at the end, and you should verify that it’s what you expect.

    The status of each job is listed in the ST column, with R for running, I for idle, and H for on hold. Idle jobs are waiting for their turn to run, and will appear most commonly for long-running jobs. Held jobs are generally put in that state manually with the command condor_hold, but can sometimes end up there due to errors in the job.

  5. To remove Condor jobs, particularly important if the results aren’t what you expect, your jobs are unexpectedly on hold, etc.:
  6. condor_rm 149415 149416.9

    The command arguments are a list of cluster IDs and/or cluster.process IDs, or your user ID to remove all of your jobs.

  7. If your job does not complete, or has unexpected results, check the err and log files in each result directory for messages that describe any issued with your code or the file itself.

There are many more commands that might occasionally be useful and which are described in the Condor documentation.

 

Procedure 7: Collecting Your Results

The results of your cluster job are now stored in the directories results/00/out, results/01/out, .

  1. To collect your output together in one file:
  2. cat results/*/out > monte.csv
    

    The wild card character * will match any of the job directories 0, 1, , and the command cat concatenates them together. The standard output redirect character > takes the result, which would otherwise be printed, and places it in the new file named monte.csv in your current directory.

  3. The content of monte.csv is in the format known as comma-separated values, which can be easily read with Excel and other programs:
  4. cat monte.csv
    ⇒
    L,beta,Mavg,sigmaM
    10,0.10,0.077,0.006
    L,beta,Mavg,sigmaM
    10,0.20,0.105,0.008
    …
    L,beta,Mavg,sigmaM
    10,1.40,0.978,0.011
    L,beta,Mavg,sigmaM
    10,1.50,0.978,0.013  
        
  5. The header lines, included with each file in case you need to look at them individually, can be removed by sorting with Excel, selecting them, and deleting them.

    More esoterically the extra lines can be removed with the Unix stream editor:
  6. cat results/*/out | sed -e 1n -e /L/d > monte.csv

    The pipe character | redirects the standard output of the cat command into the standard input of the next command, sed (a combination known as a pipeline).

    The sed command applies two edits with the option -e, the first 1n to write the first line and move on to the next, the second /L/d to match any other line that includes the text L and delete it:

    cat monte.csv
    ⇒
    L,beta,Mavg,sigmaM
    10,0.10,0.077,0.006
    10,0.20,0.105,0.008
    …
    10,1.40,0.98,0.011 10,1.50,0.98,0.013

Unix has a wealth of commands to process the content of files, you can find a list of them on Wikipedia.

 

Problem 6: The Full Monte

Using the computing cluster, repeat Problem 5 over larger arrays and longer times, by generalizing the command-line input of your executable to allow run-time choices of these values. Calculate in each case the magnetization:

M⟩ = (1/Z) ΣZ M

the average square magnetization:

M2⟩ = (1/Z) ΣZ M 2

and then the standard error of the mean value of M:

σM = sqrt[(⟨M2⟩ – ⟨M2)/Z]

Previous:
Newton’s Laws of Motion

Scientific Programming with Python:
Simulating a Magnet

 
 

Top of Page  | Using IT DocumentationIT Home  |  IT Site Map  |  Search IT