50291: Developing High-performance Applications using Microsoft Windows HPC Server 2008 (5 Days)
About this Course
This five-day instructor-led course provides students with the knowledge and skills to develop high-performance computing (HPC) applications for Microsoft Windows HPC Server 2008. Students learn how to design, debug, tune, and run HPC applications under Windows HPC Server 2008, using the most compelling technologies for building them: parametric sweep, multi-threading, OpenMP, the .NET Task Parallel Library, MPI, MPI.NET, and HPC Server's SOA-based approach. Students program in Visual C++ as well as C#, and work with both managed and unmanaged code.
Audience Profile
This course is intended for software developers who need to develop long-running, compute-intensive, or data-intensive apps targeting multi-core and cluster-based hardware. No experience in the field of high-performance computing is required.
At Course Completion
After completing this course, students will be able to:
- Understand the goals of the high-performance computing (HPC) field.
- Measure and evaluate the performance of HPC apps.
- Design HPC apps using a variety of technologies: parametric sweep, threads, OpenMP, MPI, and SOA.
- Design HPC apps targeting a variety of hardware: from single-core to multi-core to cluster-based.
- Implement HPC apps using C++ or C#.
- Integrate HPC apps with Windows HPC Server 2008, including a client-friendly front-end.
- Performance tune HPC applications under Windows HPC Server 2008.
- Set up and configure a standard cluster running Windows HPC Server 2008.
Course Outline
Module 1: Introduction to High-Performance Computing and HPC Server 2008
This module introduces the field of high-performance computing, the product Microsoft Windows HPC Server 2008, and developing software for HPCS-based clusters.
Lessons
- Motivation for HPC
- Brief product history of CCS and HPCS
- Brief overview of HPC Server 2008 — components, job submission, scheduler
- Product differentiators
- Software development technologies: parametric sweep, threads, OpenMP, MPI, SOA, etc.
- Measuring performance — linear speedup
- Predicting performance — Amdahl’s law
Lab : Introduction to HPC and Windows HPC Server 2008
- Submitting and monitoring jobs
- Running an HPC app
- Measuring performance
- Measuring the importance of data locality
Module 2: Multi-threading for Performance
This module introduces explicit, do-it-yourself multi-threading in VC++ and C#.
Lessons
- Multi-threading for responsiveness and performance
- The costs of multi-threading
- Structured, fork-join parallelism
- Multi-threading in C# using the .NET Thread class
- Multi-threading in VC++ using the Windows API
- Load balancing
- Scheduling multi-threaded apps on Windows HPC Server
Lab : Multi-threading in VC++ and C#
- Creating a multi-threaded app
- Running and measuring performance locally
- Running and measuring performance on the cluster
Module 3: The Dangers of Multi-threading
This module discusses the risks of multi-threaded programming (and concurrent programming in general), and then presents strategies for solving the most common pitfalls.
Lessons
- Race conditions
- Critical sections
- Starvation
- Livelock
- Deadlock
- Compiler and language implications
- Memory models
- Locks
- Interlocking
- Lock-free designs
Module 4: The HPCS Job Scheduler
This module introduces the heart of HPCS-based clusters — the Job Scheduler.
Lessons
- Throughput vs. performance
- Nodes vs. sockets vs. cores
- Jobs vs. Tasks
- Job and task states
- Default scheduling policies
- The impact of job priorities and job preemption
- Job resources and dynamic growing / shrinking
- Submission and activation filters
Lab : Working with the Job Scheduler
- Environment variables in HPC Server 2008
- Exit codes and denoting success / failure
- Checkpointing in case of failure
- Multi-task jobs and task dependencies
Module 5: Parallel Application Design
This module discusses common design patterns for parallel apps, along with HPCS-specific design issues.
Lessons
- Two sample design problems…
- Foster’s method
- Common problem decompositions
- Common communication patterns
- Computation vs. communication
- Design patterns: master-worker, pipeline, map-reduce, SOA, parametric sweep, and more
Module 6: Introduction to OpenMP
This module introduces OpenMP — Open MultiProcessing — for shared-memory, multi-threaded programming in VC++.
Lessons
- What is OpenMP?
- Shared-memory programming
- Using OpenMP in Visual Studio with VC++
- Parallel regions
- Execution model
- Data parallelism
- Load balancing, static vs. dynamic scheduling
- Scheduling OpenMP apps on Windows HPC Server
Lab : Intro to OpenMP
- Creating a simple OpenMP app from scratch
- Using OpenMP to parallelize an existing application
- Running and measuring performance locally
- Running and measuring performance on the cluster
Module 7: Advanced OpenMP
This module continues with OpenMP, covering synchronization, data coherence, and the implementation of common design patterns.
Lessons
- Barriers
- Critical sections
- Synchronization approaches
- Implementing common design patterns — conditional, task, master-worker, nested
- Data coherence and flushing
- Environment variables
- Common pitfalls
Module 8: Introduction to the .NET Task Parallel Library
This module introduces the Task Parallel Library (TPL) for shared-memory, multi-threaded programming in .NET 4.0.
Lessons
- What is the TPL?
- Moving from threads to tasks
- Using the TPL in Visual Studio with C#
- Execution model
- Parallel.For
- Data and task parallelism
- Synchronization approaches
- Concurrent data structures
- Scheduling TPL-based apps on Windows HPC Server
Lab : Intro to the TPL
- Creating a simple TPL-based app from scratch
- Using the TPL to parallelize an existing application
- Running and measuring performance locally
- Running and measuring performance on the cluster
Module 9: Interfacing with HPCS-based Clusters
This module demonstrates the various ways you can interface with Windows HPC Server 2008, in particular using the HPC Server 2008 API.
Lessons
- Cluster Manager
- Job Manager
- Job Description Files
- clusrun
- Console window
- PowerShell
- Scripts
- Programmatic access via HPCS API v2.0
Lab : Interfacing with Windows HPC Server 2008
- Clusrun is your friend
- Scripting
- Using the HPCS API to submit and monitor a job
Module 10: Intro to SOA with HPC Server 2008
This module presents one of the most interesting and unique features of Windows HPC Server 2008 — service-oriented HPC.
Lessons
- Service-oriented architectures
- SOA and WCF
- Mapping SOA onto Jobs and the Job Scheduler
- Private vs. shared sessions
- Secure vs. insecure sessions
Module 11: Create SOA-based Apps with HPC Server 2008
This module presents the details of building a SOA-based HPC app, from start to finish.
Lessons
- Service-side programming
- Service configuration
- Client-side programming
- WCF configuration and tracing
Lab : SOA-based HPC with HPCS and WCF
- Creating an SOA-based HPC app from start to finish
- Service-side…
- Client-side…
Module 12: General Performance Tuning of Parallel Applications
This module discusses various performance tuning strategies on Windows for parallel apps.
Lessons
- Performance counters
- Heat map in Windows HPC Server 2008
- Customizing the heat map
- perfmon
- xperf (aka the Windows Performance Toolkit)
- SOA tuning
- What to look for…
- Other tools
Module 13: Introduction to MPI
This module introduces *the* most common approach to developing cluster-wide, high-performance applications: the Message-Passing Interface.
Lessons
- Shared-memory vs. distributed-memory
- The essence of MPI programming — message-passing SPMD
- Microsoft MPI
- Using MSMPI in Visual Studio with VC++
- Execution model
- MPI Send and Receive
- mpiexec
- Scheduling MPI apps on Windows HPC Server
Lab : Introduction to MPI
- Creating a simple MPI app using Send and Receive
- Running and measuring performance locally
- Running and measuring performance on the cluster
Module 14: Data Parallelism and MPI’s Collective Operations
This module discusses data parallelism in MPI, and how best to build data parallel MPI apps using its collective operations.
Lessons
- Data parallelism in MPI
- A real world example
- Broadcast
- Scatter
- Gather
- Barriers
- Reductions
- Defining your own reduction operator
- Common pitfalls
Lab : Data Parallelism and MPI’s Collective Operations
- Parallelizing an existing MPI application
- Mapping Sends and Receives to Broadcast, Scatter, Gather, and Allreduce
- Running and measuring performance locally
- Running and measuring performance on the cluster
Module 15: MPI.NET
This module overviews MPI.NET, a .NET wrapper around MSMPI.
Lessons
- Why MPI.NET?
- Using MPI.NET in Visual Studio with C#
- Type-safe Send and Receive
- Collective operations in MPI.NET
- Execution model
- Scheduling MPI.NET apps on Windows HPC Server
Module 16: Using MPI — Debugging, Tracing, and Other Tools
This module dives into the practical realities of using MPI and MPI.NET — debugging, tracing options, and other tools of interest.
Lessons
- Local debugging with Visual Studio
- Remote debugging with Visual Studio
- General MPI tracing
- Tracing with ETW (Event Tracing for Windows)
- Trace visualization
- Other tools for MPI developers
Lab : MPI Debugging and Tracing
- Debugging with Visual Studio
- Tracing with ETW
- Viewing traces with Jumpshot and Vampir
Module 17: Designing MPI Applications
This module presents the most common design issues facing MPI developers.
Lessons
- Hiding latency by overlapping computation and communication
- Avoiding deadlock
- Hybrid designs involving both MPI and OpenMP
- Buffering
- Error handling
- I/O and large datasets
Module 18: MPI-2
This module summarizes the advanced features of MPI-2 and MSMPI.
Lessons
- Groups
- Communicators
- Topologies
- Non-scalar data: packing/unpacking, non-contiguous arrays, and user-defined datatypes
- MPI I/O
- Remote memory access
- [ Dynamic process creation is not supported in MSMPI ]
Lab : Working with Advanced Features in MPI-2
- MPI Topologies
- MPI Data types
Module 19: Excel-based HPC Apps
This module presents techniques for bringing the potential of high-performance computing to the world of spreadsheets.
Lessons
- Excel as a computation engine
- Performing Excel computations on Windows HPC Server 2008
- Using Excel Services
- Using Excel UDFs
- Future versions of Excel and HPC Server
Module 20: Porting UNIX apps to Windows HPC Server 2008
This module discusses strategies for porting UNIX applications to Windows HPC Server 2008.
Lessons
- The most common porting issues
- 32-bit to 64-bit
- UNIX calls
- Manual porting of UNIX code
- Cygwin
- MinGW
- Microsoft SUA — Subsystem for UNIX-based Applications
Module 21: Open Grid Forum HPC Basic Profile
This module introduces the OGF’s HPC Basic Profile, and how to enable support in Windows HPC Server 2008.
Lessons
- What is the OGF HPC Basic Profile?
- Platform-neutral job submission
- JSDL — Job Submission Description Language
- Enabling in Windows HPC Server 2008
Module 22: Setup and Administration of Windows HPC Server 2008
This module overviews the basic setup and administration of an HPCS-based cluster.
Lessons
- Hardware requirements
- Software requirements
- Initial decisions
- Headnode setup
- Compute node setup
- Broker node setup
- Developer machine setup
- Diagnostics
- Maintenance — including performance
- Troubleshooting
Additional Reading
To help you prepare for this class, review the following resources:
- http://www.microsoft.com/hpc
- Any book in the field of HPC software development, such as M. Quinn’s “Parallel Programming in C with MPI and OpenMP” or P. Pacheco’s “Parallel Programming with MPI”.
Prerequisites
Before attending this course, students must have:
- Basic experience using the Windows platform.
- Basic programming experience on Windows using Visual Studio.
- 2 or more years of programming experience in C++ or C#.