High performance communications libraries for Microsoft Windows 2000

This page describes the port of the high-performance communication libraries BIP and MPI-BIP to Microsoft Windows 2000. This work is funded by Microsoft Research through a project with INRIA Rhône-Alpes. The research is conducted by members of the RESO Action (INRIA) within the RESAM Laboratory in Lyon, France.

People

Roland Westrelin (INRIA "Ingénieur Expert" funded by Microsoft Research)
Laurent Lefèvre (INRIA full-time researcher)

Microsoft contact

Pierre-Yves Saintoyant (in charge of University Relations for the "Europe, Middle East and Africa" area at Microsoft)

Acknowledgments

We would like to thank Loïc Prylli (LIP) for the help he provided with the GM driver.

Port of the BIP low-level communication layer to Microsoft Windows 2000

The BIP low-level communication layer is composed of three components. We describe them briefly here, along with the strategy we used to get each of them working on Windows.

A kernel module. It has several roles:
- First, it must act as a driver: it discovers the Myrinet board, registers itself with the operating system as the driver for this peripheral and properly initializes the hardware.
- Like any classical network driver, it may interface with the TCP/IP stack to handle communications over the Myrinet board.
- It provides some basic services to the BIP library. At initialization time, it gives the BIP library direct access to the Myrinet board. The BIP library also uses it to register and unregister memory (pin memory pages down in physical memory and provide the address translations).
We are not interested in the second role, since the main advantage of BIP is zero-copy communication with a lightweight protocol. To provide the first and third services to the library, we chose to rely on Myricom's GM driver: it already provides functionality close to what we need, and it is available for a wide range of platforms, including Windows 2000. We therefore modified the GM driver so that it provides a new set of services for the BIP library. This was not a piece of cake, but it was probably a lot easier than writing a new driver from scratch.
The BIP library. When it was written, it targeted only Linux. Thus, even though no fundamental limitation prevents a native port to Win32, we decided to use the freely available Cygwin porting layer. Using this library has several advantages. Maintenance of the code is easy: there is a single set of source files, with no ugly #ifdef/#endif blocks. It comes with a full environment that includes a set of handy tools: make to manage the project, gcc to compile the code, perl to run the scripts provided with BIP, and ssh to access the remote nodes. It is under very active development and is improving quickly. We see very few objections to using Cygwin. It is still possible to compile the application with a third-party compiler to get top performance. The BIP and MPI libraries themselves do not use system calls for any performance-critical task, and the application writer is free to call Win32 directly to avoid the extra overhead introduced by the Cygwin layer. Note that even though Cygwin is a very powerful tool, we still had to rewrite some parts of the BIP library using native Win32 calls.
The firmware: nothing to do here, since it is independent of the operating system.
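The memory registration service of the kernel module can be illustrated with a small model. The Myrinet board moves data by DMA to and from physical addresses, so every page touched by a communication buffer has to stay resident (pinned) and its physical address must be known to the library. The Python sketch below only models that bookkeeping: it assumes 4 KiB pages, does not actually pin anything, and uses hypothetical names (`pages_spanned`, `RegistrationTable`, `fake_physical`) that are not part of BIP or GM.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


def pages_spanned(vaddr: int, length: int, page_size: int = PAGE_SIZE) -> list[int]:
    """Return the base virtual address of every page touched by [vaddr, vaddr + length)."""
    if length <= 0:
        return []
    first = vaddr - (vaddr % page_size)
    end = vaddr + length - 1
    last = end - (end % page_size)
    return list(range(first, last + 1, page_size))


class RegistrationTable:
    """Hypothetical registration table: virtual page -> [physical page, reference count]."""

    def __init__(self, translate):
        # `translate` stands in for the driver's virtual-to-physical lookup.
        self._translate = translate
        self._entries: dict[int, list[int]] = {}

    def register(self, vaddr: int, length: int) -> list[int]:
        """Record each page as pinned (counting references) and return its physical address."""
        phys = []
        for page in pages_spanned(vaddr, length):
            entry = self._entries.setdefault(page, [self._translate(page), 0])
            entry[1] += 1
            phys.append(entry[0])
        return phys

    def unregister(self, vaddr: int, length: int) -> None:
        """Drop one reference per page; forget a page once no registration uses it."""
        for page in pages_spanned(vaddr, length):
            entry = self._entries[page]
            entry[1] -= 1
            if entry[1] == 0:
                del self._entries[page]

    def pinned_pages(self) -> int:
        return len(self._entries)


def fake_physical(page: int) -> int:
    """Hypothetical translation used only for this illustration."""
    return page + 0x40000000
```

For example, a 32-byte buffer starting at `0x1FF0` crosses a page boundary, so `pages_spanned(0x1FF0, 32)` returns the two pages `0x1000` and `0x2000`, and both must be registered. Reference counting is one reasonable design choice: two registered buffers may share a page, and releasing the first must not unpin memory the second still uses. Whether the modified GM driver works exactly this way is not stated on this page.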

Port of the MPI-BIP high-level communication layer to Microsoft Windows 2000

Since MPI-BIP is a higher-level layer, it is less dependent on the underlying operating system and hardware. Porting MPI on top of BIP was relatively easy, and it allowed us to run experiments and gather application results (NAS Parallel Benchmarks).

Performance results

The following experiments were run on a cluster of 8 dual-processor 933 MHz Pentium III nodes connected by Myrinet 2000 hardware (LANai9 at 133 MHz, serial links).

Micro-benchmarks: point to point experiments

Point-to-point latency of BIP and MPI-BIP (graph)

Point-to-point bandwidth of BIP and MPI-BIP (graph)
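This page does not say how the latency and bandwidth curves were measured. A common method is a ping-pong test: one side sends a message, the other echoes it back, one-way latency is half the averaged round-trip time, and bandwidth is the message size divided by that one-way time. The Python sketch below assumes that method and times a ping-pong over a local socket pair using only the standard `socket`, `threading` and `time` modules, so its numbers say nothing about Myrinet, BIP or MPI-BIP; the function names are hypothetical.

```python
import socket
import threading
import time


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes; a stream socket may deliver one message in pieces."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo(sock: socket.socket, size: int, iterations: int) -> None:
    """Peer side: receive each message completely, then send it straight back."""
    for _ in range(iterations):
        sock.sendall(recv_exact(sock, size))


def latency_and_bandwidth(size: int, elapsed: float, iterations: int) -> tuple[float, float]:
    """Convert a total ping-pong time into (one-way latency in us, bandwidth in MB/s)."""
    one_way = elapsed / iterations / 2  # half of the average round trip, in seconds
    return one_way * 1e6, size / one_way / 1e6  # MB = 10**6 bytes


def ping_pong(size: int, iterations: int = 1000) -> tuple[float, float]:
    """Time `iterations` round trips of a `size`-byte message over a local socket pair."""
    a, b = socket.socketpair()
    peer = threading.Thread(target=echo, args=(b, size, iterations))
    peer.start()
    message = b"x" * size
    start = time.perf_counter()
    for _ in range(iterations):
        a.sendall(message)
        recv_exact(a, size)
    elapsed = time.perf_counter() - start
    peer.join()
    a.close()
    b.close()
    return latency_and_bandwidth(size, elapsed, iterations)


if __name__ == "__main__":
    for size in (4, 1024, 65536):
        lat, bw = ping_pong(size)
        print(f"{size:6d} bytes: {lat:8.2f} us one-way, {bw:8.2f} MB/s")
```

The peer echoes only after receiving the whole message, so each iteration measures one complete round trip rather than overlapping transfers. For instance, 1000 round trips of a 1,000,000-byte message taking 2 seconds in total correspond to a one-way time of 1000 us and a bandwidth of 1000 MB/s.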

Application benchmarks: the NAS parallel benchmarks

IS

Execution time (s)	1 processor	4 processors	8 processors	8 x 2 processors
Class A	9.47	2.66	1.53	1.31
Class B	38.02	10.70	6.05	5.34

LU

Execution time (s)	1 processor	4 processors	8 processors	8 x 2 processors
Class A	1596.73	397.62	201.39	195.75
Class B	-	1647.42	862.58	536.09
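To read these tables in terms of scalability, the speedup on p processors is S_p = T_1 / T_p and the parallel efficiency is E_p = S_p / p. The Python sketch below computes both from the class A times above, counting the "8 x 2" configuration (8 dual-processor nodes) as p = 16; the class B rows are left out because the 1-processor LU time is missing. The names are hypothetical, and the results are derived from the published times, not new measurements.

```python
# Class A times in seconds, copied from the IS and LU tables above.
# Keys are processor counts; "8 x 2" (8 dual-processor nodes) is counted as 16.
TIMES_CLASS_A = {
    "IS": {1: 9.47, 4: 2.66, 8: 1.53, 16: 1.31},
    "LU": {1: 1596.73, 4: 397.62, 8: 201.39, 16: 195.75},
}


def speedup_and_efficiency(times: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map each processor count p to (speedup T1/Tp, efficiency speedup/p)."""
    t1 = times[1]
    return {p: (t1 / tp, t1 / tp / p) for p, tp in times.items()}


if __name__ == "__main__":
    for bench, times in TIMES_CLASS_A.items():
        for p, (s, e) in speedup_and_efficiency(times).items():
            print(f"{bench} p={p:2d}  speedup={s:5.2f}  efficiency={e:4.2f}")
```

On the class A data this gives a speedup of about 3.56 (IS) and 4.02 (LU) on 4 processors, 6.19 and 7.93 on 8 processors, and 7.23 and 8.16 on 8 x 2 processors, i.e. an efficiency of roughly 0.45 and 0.51 in the last configuration.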

These results are comparable to the ones we obtain under Linux on the same platform. The only difference is the performance of the Fortran compiler (g77): the version provided with Cygwin generates significantly slower code. We did not investigate the problem much, but this strange behaviour can probably be corrected.

Roland WESTRELIN

Last modified: Wed Mar 30 18:34:25 CEST 2005