Using MPI Applications on the MS Cluster

Here are details on how to build and run an MPI application for the MS Cluster. Everything here was done on a Windows XP machine with Visual Studio 2005. I installed the MPICH2 Windows binaries for local testing (my debug target) and the MSMPI SDK libraries for building the cluster executable (my release target). There are more details in the Users Guide linked from the main course page. If you find any problems with the details in this guide, let Nathan know.

When logging into the MS cluster, make sure you use msftlabs\[username] as your username.

Here are the details for building this simple MPI application (C code from a C++ VS project):

Create a new C++ Win32 Console Application project. I removed the default headers and source files and added the above file to the project.
Add the MPI include directory under Configuration Properties->C/C++->General->Additional Include Directories:
- For Debug build, I pointed to the include directory of MPICH2
- For Release build, I pointed to the include directory of MSMPI (C:\Program Files\Microsoft Compute Cluster Pack\Include by default)
Add the MPI library under Configuration Properties->Linker->Input->Additional Dependencies:
- For Debug build, I pointed to the MPICH2 mpi.lib file
- For Release build, I pointed to the MSMPI library (C:\Program Files\Microsoft Compute Cluster Pack\Lib\i386\msmpi.lib)
After building the release executable, place the executable on the cluster. Command line details:
- ftp hpc.msftlabs.com
- log in with your user name prefixed with 'msftlabs\'
- cd user/[your user name]
- put [exe_name].exe
- Also place 3 empty files into your directory (in.txt, out.txt, err.txt)
- See the guide for more info.
I've experienced some problems using the interface to schedule a job, so I used the JavaScript version for submitting jobs with this jsdl file. The jsdl file contains the number of processors, the name of the executable, and your user name, all of which need to be changed as appropriate. After making those changes, schedule the job from the command line:
- job.js submit /jobfile:job.jsdl /user:msftlabs\[username] /password:[password]
- Provided everything works, you should get a couple of popups giving the job id for the job.
- Check job status on the cluster web interface; the details provided by the JavaScript are very terse. You can also see how many jobs are ahead of yours by changing the "Submitted by:" filter to "Show all".
If the job was successful, you can retrieve the output via the same ftp method given above. Once in your directory, do a 'get out.txt' and check the results. When I ran the reduction with 10 processors, the result was '66' rather than the expected '45'. It seems that it was run on 12 machines instead of 10.
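
The 66-versus-45 discrepancy is consistent with a reduction that sums each process's rank. That is an assumption here, since the test program's source is not shown on this page. Under that assumption, n processes produce 0 + 1 + ... + (n-1), so 10 processes give 45 and 12 give 66. A minimal C sketch of that check follows; the function names are hypothetical helpers, not part of MPI or the course code.

```c
/* Sum that a rank-sum reduction would produce on n processes:
   0 + 1 + ... + (n - 1).
   Assumption: every process contributes its own rank to the sum. */
int expected_rank_sum(int n)
{
    int sum = 0;
    int rank;
    for (rank = 0; rank < n; rank++) {
        sum += rank;
    }
    return sum;
}

/* Work backwards from an observed sum to the number of processes.
   Tries counts 1..max_n and returns the first match, or -1 if no
   count in that range produces the observed sum. */
int processes_for_sum(int observed, int max_n)
{
    int n;
    for (n = 1; n <= max_n; n++) {
        if (expected_rank_sum(n) == observed) {
            return n;
        }
    }
    return -1;
}
```

Calling processes_for_sum(66, 64) returns 12, which matches the guess that the job ran on 12 machines. Printing the value reported by MPI_Comm_size from the real program would confirm the process count directly.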