PTP/designs/new sdm

Revision as of 08:59, 9 May 2008

Overview

This document describes the changes to the scalable debug manager (SDM) that are proposed for the PTP 2.1 release. It should be read in conjunction with the Scalable Debug Manager design document.

The major changes to the SDM for the 2.1 release are:

  • Remove dependency on OpenMPI for debugger startup
  • Remove dependency on MPI communication primitives
  • Allow communication infrastructure to be pluggable
  • Clean separation of protocol specific and protocol independent components
  • Support for I/O forwarding

Startup

The communications network comprises a master process and a number of server processes. To debug an N-process application, N+1 debugger processes are started (1 master and N server processes). SDM startup occurs in two phases: first the master process is started, then the server processes are started. When debugging an application using PTP, the resource manager is responsible for coordinating this startup.

Master Process Launch

The master process could be located anywhere, but since it needs to be able to communicate with both the debugger front-end and the server processes, it will normally be launched on the system login node (the location specified in the resource manager configuration). When it is launched, the master process is passed arguments specifying a TCP/IP address and port number that it uses to connect to the front-end. Once connected, it waits for commands from the front-end.
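The master's side of this connection can be sketched as follows. This is a minimal illustration, not the SDM's actual code: the option names `--host` and `--port`, the function names, and the newline-delimited command framing are all assumptions made for the example; the design only states that the master receives an address and port and connects to the front-end.

```python
import argparse
import socket


def parse_master_args(argv):
    # Hypothetical option names: the design only says the master is given a
    # TCP/IP address and port number for reaching the front-end.
    parser = argparse.ArgumentParser(prog="sdm-master")
    parser.add_argument("--host", required=True, help="front-end address")
    parser.add_argument("--port", type=int, required=True, help="front-end port")
    return parser.parse_args(argv)


def connect_to_frontend(host, port, timeout=10.0):
    # The master acts as the TCP client; the front-end is already listening.
    return socket.create_connection((host, port), timeout=timeout)


def read_command(sock):
    # Assumed framing: one command per newline-terminated line.
    data = bytearray()
    while not data.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("front-end closed the connection")
        data += chunk
    return data[:-1].decode()
```

After connecting, the master would loop on `read_command` and dispatch each command into the server tree.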

Server Process Launch

Because the server processes control the application, they need to be located on the same nodes as the application processes. It is assumed that the server processes will be launched by the same runtime system that is used to launch normal applications (typically MPI). Server processes are passed arguments specifying the executable being debugged, a flag indicating whether the debugger is to attach to a running process, and any application arguments. The servers are also passed a TCP/IP port number to bind to. If that port is in use, the server increments the port number and tries to bind again, repeating until it finds an available port.
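The port-probing rule can be sketched in a few lines. This is an illustrative sketch only: the function name `bind_first_free_port` and the `host` and `max_port` parameters are hypothetical, and errors other than "address in use" are deliberately re-raised rather than skipped.

```python
import errno
import socket


def bind_first_free_port(port, host="", max_port=65535):
    """Bind a TCP socket to `port`, moving to port+1, port+2, ... while busy."""
    while port <= max_port:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock, port          # caller will listen() for child connections
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE:
                raise                  # a different failure: do not mask it
            port += 1                  # port is taken, try the next one
    raise OSError(errno.EADDRINUSE, "no free port at or above the requested one")
```

Because each server may pick a different port than the one it was given, the actual port has to be made known to whichever process connects to it; how the SDM does this is not covered here.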

Once the debugger processes are started, they wait for messages from their parent and children. As soon as it starts, the master process informs the front-end that the debugger is ready for operation. Startup then proceeds as follows:

  1. The first command sent by the front-end is a global initialization command that supplies the name of the executable being debugged and its arguments.
  2. The master process broadcasts this command to all the server processes.
  3. Each server process initializes the debug backend engine.
  4. Depending on the startup options, a breakpoint will be automatically inserted in main() and the application process started.
  5. The event generated as a result of the breakpoint being reached will be sent back to the front-end to indicate that initialization is complete.
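Steps 1 to 4 can be sketched as a global initialization command whose destination bitmap has all N server bits set, plus the actions a server would derive from it. The class `InitCommand`, its `stop_in_main` field, and the action tuples are hypothetical names used only for illustration; the real message format is defined by the SDM protocol.

```python
from dataclasses import dataclass, field


@dataclass
class InitCommand:
    dest: int                                     # bit i set: server rank i is a target
    executable: str
    app_args: list = field(default_factory=list)
    stop_in_main: bool = True                     # hypothetical startup option


def make_init_command(num_servers, executable, app_args, stop_in_main=True):
    # Steps 1-2: the init command is broadcast, so every server bit is set.
    all_servers = (1 << num_servers) - 1
    return InitCommand(all_servers, executable, list(app_args), stop_in_main)


def server_init_actions(cmd, rank):
    """Steps 3-4 as seen by one server (action names are hypothetical)."""
    if not cmd.dest & (1 << rank):
        return []                                 # command not addressed to this server
    actions = [("start_backend", cmd.executable, cmd.app_args)]
    if cmd.stop_in_main:
        actions.append(("insert_breakpoint", "main"))
    actions.append(("run",))
    return actions
```

The breakpoint-hit event from step 5 would then travel back up the tree and be aggregated before reaching the front-end.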

The SDM does not place any restrictions on where the master and server processes are located. Since many cluster systems do not allow MPI processes to run on the head (login) node, the master process may end up running on one of the compute nodes, possibly co-located with a server process.

Communications Network

Each server process computes its location in the tree (including the location of its parent and the number and location of its children) from the MPI rank provided by the runtime. When debugging an N-process job, the server processes are always assumed to be ranks 0 through N-1, and the master process is assumed to be rank N.

Commands

Debug commands are sent from the front-end to the master process to perform a debug action on the target application. Each debug command contains a bitmap in which each bit corresponds to one of the server processes. When a server process receives a command, it computes the bitwise AND of this bitmap with a bitmap representing the processes in the subtree rooted at each of its children. If the result for a child is non-zero, the command is forwarded to that child. The server also checks whether its own rank is included in the bitmap and, if so, performs the debug operation on the application process it controls.
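The routing decision can be sketched with Python integers as bitmaps (bit r stands for server rank r). The tree layout here is an assumption: a breadth-first k-ary tree with the master (rank N) at position 0 and server rank r at position r + 1. The function names are hypothetical.

```python
def subtree_mask(rank, fanout, num_servers):
    """Bitmap of the server ranks in the subtree rooted at server `rank`."""
    mask = 0
    stack = [rank + 1]                       # tree positions still to visit
    while stack:
        pos = stack.pop()
        mask |= 1 << (pos - 1)               # position p holds server rank p - 1
        first = pos * fanout + 1
        stack.extend(range(first, min(first + fanout, num_servers + 1)))
    return mask


def route_command(dest, my_rank, children, num_servers, fanout=2):
    """Return (act_locally, children_to_forward_to) for a command bitmap."""
    # Forward only to children whose subtree contains at least one target.
    forward = [c for c in children
               if dest & subtree_mask(c, fanout, num_servers)]
    # The master (rank N) has no bit in the bitmap, so it never acts locally.
    act_locally = my_rank < num_servers and bool(dest & (1 << my_rank))
    return act_locally, forward
```

In a real implementation the subtree masks would be computed once at startup rather than on every message.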

Events

Each debug command generates a corresponding event in each of the target processes specified in the command bitmap. An event contains a bitmap representing the processes that generated it. When a server process receives an event from a child, it attempts to aggregate it with corresponding events received from its other children. Aggregation works by computing a hash over the body of the event message: if the hash matches that of an event already received from another child, the new event is discarded and its bits are set in the bitmap of the matching event. The server waits a predetermined time for events before forwarding the aggregated result; this time is specified as a parameter to the debug command that generated the event.
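Hash-based aggregation can be sketched as a dictionary keyed by the body hash, where identical bodies collapse into one entry and their bitmaps are ORed together. SHA-1 is an arbitrary choice here (the design only says "a hash"), the class name is hypothetical, and the wait timer is left out: `flush` stands in for what would happen when the command's timeout expires.

```python
import hashlib


class EventAggregator:
    """Merge identical events received from children (illustrative sketch)."""

    def __init__(self):
        self._groups = {}                       # body hash -> [body, bitmap]

    def add(self, body: bytes, bitmap: int):
        key = hashlib.sha1(body).digest()
        if key in self._groups:
            self._groups[key][1] |= bitmap      # same body: drop copy, merge senders
        else:
            self._groups[key] = [body, bitmap]

    def flush(self):
        """Return the aggregated (body, bitmap) pairs and reset for the next command."""
        events = [(body, bitmap) for body, bitmap in self._groups.values()]
        self._groups.clear()
        return events
```

A production version might also compare the bodies of events whose hashes match, to guard against collisions.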

GDB Backend

The current implementation uses GDB as the backend debug engine. GDB backend initialization proceeds as follows:

  1. The server creates pipes for stdin, stdout and stderr and forks a new process.
  2. The child process starts GDB.
  3. The parent process sends commands to GDB to load the application executable, set a breakpoint and start execution.
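These three steps can be sketched with `subprocess.Popen`, which creates the three pipes and starts GDB in a child process. The GDB/MI commands used (`-file-exec-and-symbols`, `-exec-arguments`, `-break-insert`, `-exec-run`) are standard GDB/MI commands, but the function names and the `stop_in_main` flag are hypothetical. For brevity the sketch writes all commands at once; a real backend would wait for each `^done` result record, and would keep reading stderr so that its pipe cannot fill.

```python
import subprocess


def gdb_startup_commands(executable, app_args, stop_in_main=True):
    """GDB/MI commands for step 3: load the program, set a breakpoint, run."""
    cmds = [f"-file-exec-and-symbols {executable}"]
    if app_args:
        cmds.append("-exec-arguments " + " ".join(app_args))
    if stop_in_main:
        cmds.append("-break-insert main")
    cmds.append("-exec-run")
    return cmds


def start_gdb(executable, app_args, gdb="gdb"):
    # Steps 1-2: Popen sets up stdin/stdout/stderr pipes and starts GDB
    # in a child process, using the machine interface.
    proc = subprocess.Popen([gdb, "--interpreter=mi", "--quiet"],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    # Step 3: drive GDB through its MI on the stdin pipe.
    for cmd in gdb_startup_commands(executable, app_args):
        proc.stdin.write(cmd + "\n")
    proc.stdin.flush()
    return proc
```

Paths or arguments containing spaces would need GDB/MI quoting, which this sketch does not handle.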

The GDB backend uses the GDB/MI Interface to communicate with GDB. Debugger commands are translated into the corresponding GDB/MI command syntax, and GDB/MI output is translated into debugger events.
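The output-to-event direction can be sketched for the most important case, the `*stopped` asynchronous record that GDB emits when the program hits a breakpoint or exits. The event dictionary layout and the function name are hypothetical, and the parser is deliberately simple: it extracts `key="value"` pairs and flattens nested tuples such as `frame={...}` instead of implementing the full GDB/MI output grammar.

```python
import re

# key="value" pairs; values may contain backslash-escaped characters.
_MI_PAIR = re.compile(r'([\w-]+)="((?:[^"\\]|\\.)*)"')


def mi_record_to_event(line):
    """Translate a GDB/MI '*stopped' async record into a flat event dict."""
    if not line.startswith("*stopped"):
        return None                         # other record types not handled here
    fields = dict(_MI_PAIR.findall(line))   # nested tuples are flattened
    event = {"type": fields.get("reason", "stopped")}
    for key in ("bkptno", "func", "file", "line", "exit-code"):
        if key in fields:
            event[key] = fields[key]
    return event
```

Each event produced this way would be serialized and sent up the tree, where identical bodies from different servers can be aggregated.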
