Using MPI Applications on the MS Cluster
This guide describes how to build and run an MPI application on the MS Cluster.
Everything here was done on a Windows XP machine with Visual Studio 2005. I installed the MPICH2 Windows binaries for local testing (my debug target) and the MSMPI SDK libraries to build the executable for the cluster (my release target). There are more details in the Users Guide linked from the main course page. If you find any problems with the details in this guide, let Nathan know.
When logging into the MS Cluster, make sure you use msftlabs\[username] as your username.
Here are the details for building this simple MPI application (C code from a C++ VS project):
- Create a new C++ Win32 Console Application project. I removed the default header and source files and added the above file to the project.
- Add the MPI include directory under Configuration Properties->C/C++->General->Additional Include Directories:
  - For the Debug build, I pointed to the include directory of MPICH2
  - For the Release build, I pointed to the include directory of MSMPI (C:\Program Files\Microsoft Compute Cluster Pack\Include by default)
- Add the MPI library under Configuration Properties->Linker->Input->Additional Dependencies:
  - For the Debug build, I pointed to the MPICH2 mpi.lib file
  - For the Release build, I pointed to the MSMPI library (C:\Program Files\Microsoft Compute Cluster Pack\Lib\i386\msmpi.lib)
- After building the release executable, place it on the cluster. From the command line:
  - ftp hpc.msftlabs.com
  - Log in with your user name prefixed with 'msftlabs\'
  - cd user/[your user name]
  - put [exe_name].exe
- Also place 3 empty files into your directory (in.txt, out.txt, err.txt)
- See the guide for more info.
- Submit the job: job.js submit /jobfile:job.jsdl /user:msftlabs\[username] /password:[password]
- If everything works, you should get a couple of popups giving the job id for the job.
If the job was successful, you can retrieve the output via the same ftp method given above. Once in your directory, do a 'get out.txt' and check the results. When I ran the reduction with 10 jobs, I noticed that the result was '66' instead of the expected '45'. It seems that it was run on 12 machines instead of 10.