PTP/designs/2.x

NOTE: THIS DOCUMENT IS A DRAFT CURRENTLY UNDER DEVELOPMENT. THERE MAY BE SUBSTANTIAL CHANGES IN THE NEAR FUTURE.

Overview

The Parallel Tools Platform (PTP) is a portable, scalable, standards-based integrated development environment specifically suited for application development for parallel computer architectures. The PTP combines existing functionality in the Eclipse Platform, the C/C++ Development Tools, and new services specifically designed to interface with parallel computing systems, to enable the development of parallel programs suitable for a range of scientific, engineering and commercial applications.

This document provides a detailed design description of the major elements of the Parallel Tools Platform version 2.x.

Architecture

The Parallel Tools Platform provides an Eclipse-based environment for supporting the integration of development tools that interact with parallel computer systems. PTP provides pre-installed tools for launching, controlling, monitoring, and debugging parallel applications. A number of services and extension points are also provided to enable other tools to be integrated with Eclipse and fully utilize the PTP functionality.

Unlike on traditional computer systems, launching a program on a parallel system is a complicated process. Although there is some standardization in the way parallel codes are written (such as MPI), there is little standardization in how to launch, control and interact with a parallel program. To further complicate matters, many parallel systems employ some form of resource allocation system, such as a job scheduler, and in many cases execution of the parallel program must be managed by the resource allocation system, rather than by direct invocation by the user.

In most parallel computing environments, the parallel computer system is remote from the user's location. This necessitates that the parallel runtime environment be able to communicate with the parallel computer system remotely.

The PTP architecture has been designed to address these requirements. The following diagram provides an overview of the overall architecture.

[Figure: PTP architecture overview (Ptp20 arch.png)]

The architecture can be roughly divided into three major components: the runtime platform, the debug platform, and tool integration services. These components are defined in more detail in the following sections.

Runtime Platform

Launch Platform

Debug Platform

The debug platform comprises the elements that relate to debugging parallel applications. It consists of the following elements in the architecture diagram:

  • debug model
  • debug views
  • debug controller
  • debug launch
  • scalable debug manager

The debug model provides a representation of a job and its associated processes being debugged. The debug views allow the user to interact with the debug model, and control the operation of the debugger. The debug controller interacts with the job being debugged and is responsible for updating the state of the debug model. The debug model and controller are collectively known as the parallel debug interface (PDI), which is a set of abstract interfaces that can be implemented by different types of debuggers to provide the concrete classes necessary to support a variety of architectures and systems. PTP provides an implementation of PDI that communicates via a proxy running on a remote machine. The Java side of the implementation provides a set of commands and events that are used for communication with a remote debug agent. The proxy protocol is built on top of the same protocol used by the runtime services. The scalable debug manager is an external program that runs on the remote system and implements the server side of the proxy protocol. It manages the debugger communication between the proxy client and low-level debug engines that control the debug operations on the application processes.

Debug Model

The debug model represents objects in the target program that are of interest to the debugger. Model objects are updated to reflect the states of the corresponding objects in the target, and can be displayed in views in the user interface. They form the core of a model-view-controller design pattern. PDI provides a broad range of model objects, and these are summarized in the following table.

Model Object Description
IPDIAddressBreakpoint A breakpoint that will suspend execution when a particular address is reached
IPDIArgument A function or procedure argument
IPDIArgumentDescriptor Detailed information about an argument
IPDIExceptionpoint A breakpoint that will suspend execution when a particular exception is raised
IPDIExpression An expression that can be evaluated to produce a value
IPDIFunctionBreakpoint A breakpoint that will suspend execution when a particular function is called
IPDIGlobalVariable A variable that has global scope within the execution context
IPDIGlobalVariableDescriptor Detailed information about a global variable
IPDIInstruction A machine instruction
IPDILineBreakpoint A breakpoint that will suspend execution when a particular source line number is reached
IPDILocalVariable A variable with scope local to the current stack frame
IPDILocalVariableDescriptor Detailed information about a local variable
IPDIMemory An object representing a single memory location in an execution context
IPDIMemoryBlock A contiguous segment of memory in an execution context
IPDIMixedInstruction A machine instruction with source context information
IPDIMultiExpressions Expressions in multiple execution contexts
IPDIRegister A special purpose variable
IPDIRegisterDescriptor Detailed information about a register
IPDIRegisterGroup A group of registers in a target execution context
IPDIRuntimeOptions Configuration information about a debug session
IPDISharedLibrary A shared library that has been loaded into the debug target
IPDISharedLibraryManagement A shared library manager
IPDISignal A POSIX signal
IPDISignalDescriptor Detailed information about a signal
IPDISourceManagement Manages information about the source code
IPDIStackFrame A stack frame in a suspended execution context (thread)
IPDIStackFrameDescriptor Detailed information about a stack frame
IPDITarget A debuggable program. This is the root of the PDI model
IPDITargetExpression An expression with an associated variable
IPDIThread An execution context
IPDIThreadGroup A group of threads
IPDIThreadStorage Storage associated with a thread
IPDIThreadStorageDescriptor Detailed information about thread storage
IPDITracepoint A point in the program execution at which data will be collected
IPDIVariable A data structure in the program
IPDIVariableDescriptor Detailed information about a variable
IPDIWatchpoint A breakpoint that will suspend execution when a particular data structure is accessed

Debug Views

The debugger provides a number of views that display the state of objects in the debug model. The views provided by the PTP debugger are summarized in the following table.

View Name Description
Breakpoints View Displays the breakpoints for the program, and whether each breakpoint is active or inactive. Active breakpoints will trigger suspension of the execution context.
Debug View Displays the threads and stack frames associated with a process. Used to navigate to different stack frame locations.
Parallel Debug View Displays the parallel job and processes associated with the job. Allows the user to define groupings of processes for control and viewing purposes.
PTP Variable View Registers variables that will be displayed in tooltip popups in the Parallel Debug View.
Signals View Displays the state of signal handling in the debugger, and allows the user to change how signals will be handled.
Variable View Displays the values of all local variables in the current execution context.

Debug Controller

The debug controller is responsible for maintaining the state of the debug model, and for communicating with the backend debug engine. Commands to the debug engine are initiated by the user via interaction with views in the user interface, and cause the debug controller to invoke the appropriate method on the PDI implementation. PDI then translates the commands into the format required by the backend debug engine. Events that are generated by the backend debug engine are converted by the PDI debugger implementation into PDI model events. The controller then uses these events to update the debug model.

Debug API

The debug controller uses an API, part of the parallel debug interface (PDI), to communicate with a backend debug engine. A new PDI implementation is required only to support a debug protocol different from the one used by the scalable debug manager (SDM).

Addressing

Most debugger methods and events require knowledge of which processes they are directed to, or where they have originated from. The BitList class is used for this purpose. Debugger processes are numbered from 0 to N-1 (where N is the total number of processes) and correspond to bits in the BitList. For example, to send a command to processes 15 through 30, bits 15-30 would be set in the BitList. The PDI makes no assumptions about how the BitList is represented once it leaves the PDI classes.
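The addressing scheme can be illustrated with a minimal sketch. The BitList API itself is not shown in this document, so the example uses java.util.BitSet as a stand-in; the class and method names (AddressingSketch, processRange) are hypothetical.

```java
import java.util.BitSet;

// Sketch only: java.util.BitSet stands in for PTP's BitList, whose API is not shown here.
class AddressingSketch {
    /** Returns a set with bits first..last (inclusive) set, for a job of 'total' processes. */
    static BitSet processRange(int total, int first, int last) {
        if (first < 0 || last >= total || first > last) {
            throw new IllegalArgumentException("range must lie within 0.." + (total - 1));
        }
        BitSet procs = new BitSet(total);
        procs.set(first, last + 1); // BitSet.set(from, to) excludes 'to'
        return procs;
    }
}
```

Calling processRange(64, 15, 30) yields a set with 16 bits set, one for each of processes 15 through 30.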

Data Representation

Unlike most debuggers, a PDI debugger does not use strings to represent data values that have been obtained from the target processes. Instead, PDI provides a first-class data type that represents both the type and value of the target data in an architecture independent manner. This allows Eclipse to manipulate data that originates from any target architecture, regardless of word length, byte ordering, or other architectural issues. The following table shows the classes that are available for representing both simple and compound data types.

Type Class Value Class Description
IAIFTypeAddress IAIFValueAddress Represents a machine address.
IAIFTypeArray IAIFValueArray Represents a multi-dimensional array.
IAIFTypeBool IAIFValueBool Represents a boolean value.
IAIFTypeChar IAIFValueChar Represents a single character.
IAIFTypeCharPointer IAIFValueCharPointer Represents a pointer to a character string (i.e. a C string type).
IAIFTypeClass IAIFValueClass Represents a class.
IAIFTypeEnum IAIFValueEnum Represents an enumerated value.
IAIFTypeFloat IAIFValueFloat Represents a floating point number.
IAIFTypeFunction IAIFValueFunction Represents a function object.
IAIFTypeInt IAIFValueInt Represents an integer.
IAIFTypeLong IAIFValueLong Represents a long integer.
IAIFTypeLongLong IAIFValueLongLong Represents a long long integer.
IAIFTypeNamed IAIFValueNamed Represents a named object.
IAIFTypePointer IAIFValuePointer Represents a pointer to an object.
IAIFTypeRange IAIFValueRange Represents a value range.
IAIFTypeReference IAIFValueReference Represents a reference to a named object.
IAIFTypeShort IAIFValueShort Represents a short integer.
IAIFTypeString IAIFValueString Represents a string (not C string).
IAIFTypeStruct IAIFValueStruct Represents a structure.
IAIFTypeUnion IAIFValueUnion Represents a union.
IAIFTypeVoid IAIFValueVoid Represents a void type.
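The following sketch shows why carrying type information alongside the raw value makes the data independent of the target architecture. It is not the IAIFType/IAIFValue API; TypedValueSketch and its fields are hypothetical, and only integer widths of 2, 4, and 8 bytes are handled.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: not the IAIFType*/IAIFValue* API, only the idea behind it.
class TypedValueSketch {
    final int widthBytes;  // type information: integer width on the target
    final ByteOrder order; // byte ordering used by the target
    final byte[] raw;      // value bytes exactly as read from the target

    TypedValueSketch(int widthBytes, ByteOrder order, byte[] raw) {
        if (widthBytes != 2 && widthBytes != 4 && widthBytes != 8) {
            throw new IllegalArgumentException("unsupported width: " + widthBytes);
        }
        if (raw.length != widthBytes) {
            throw new IllegalArgumentException("value does not match declared width");
        }
        this.widthBytes = widthBytes;
        this.order = order;
        this.raw = raw.clone();
    }

    /** Decodes the signed integer value using the target's width and byte order. */
    long asLong() {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(order);
        switch (widthBytes) {
            case 2: return buf.getShort();
            case 4: return buf.getInt();
            default: return buf.getLong();
        }
    }
}
```

The same number read from a big-endian target and from a little-endian target arrives as different byte sequences, but both decode to the same value because the type description travels with the data.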

Command Requests

The PDI specifies a number of commands that are sent from the UI to the backend debug engine in order to perform debug operations. Commands are addressed to destination debug processes using the BitList class. The following table lists the interfaces that must be implemented to support debug commands.

Interface Description
IPDIDebugger This is the main interface for implementing a new debugger. The concrete implementation of this class must provide methods for each debugger command, as well as some utility methods for controlling debugger operation.
IPDIBreakpointManagement This interface provides methods for managing all types of breakpoints, including line, function, and address breakpoints, watchpoints, and exceptions.
IPDIExecuteManagement This interface provides methods for controlling the execution of the program being debugged, such as resuming, stepping, and termination.
IPDIMemoryBlockManagement This interface provides methods for managing direct access to process memory.
IPDISignalManagement This interface provides methods for managing signals.
IPDIStackframeManagement This interface provides methods for managing access to process stack frames.
IPDIThreadManagement This interface provides methods for managing process threads.
IPDIVariableManagement This interface provides methods for managing all types of variables (local, global, etc.) and expression evaluation.

Events

Every PDI command results in one or more events. An event contains a list of source addresses that are represented by a BitList class. Each event may also contain additional data that provides more detailed information about the event. The data in an event with multiple sources is assumed to be identical for each source. The following table provides a list of the available events.

Event Description
IPDIChangedEvent Notification that a PDI model object has changed.
IPDIConnectedEvent Notification that the debugger has started successfully.
IPDICreatedEvent Notification that a new PDI model object has been created.
IPDIDestroyedEvent Notification that a PDI model object has been destroyed.
IPDIDisconnectedEvent Notification that the debugger session has terminated.
IPDIErrorEvent Notification that an error condition has occurred.
IPDIRestartedEvent Not currently used.
IPDIResumedEvent Notification that the debugger target has resumed execution.
IPDIStartedEvent Notification that the debugger has successfully started.
IPDISuspendedEvent Notification that the debugger target has been suspended.

Many events also provide additional information associated with the event result. The following table provides a list of these interfaces.

Interface Description
IPDIBreakpointInfo Information about a process when it stops at a breakpoint.
IPDIDataReadMemoryInfo Result of an IPDIDataReadMemoryRequest.
IPDIEndSteppingRangeInfo Information about the process when a step command is completed.
IPDIErrorInfo Additional information about the cause of an error condition.
IPDIExitInfo Information about the reason that a target process exited.
IPDIFunctionFinishedInfo Not currently implemented.
IPDILocationReachedInfo Not currently implemented.
IPDIMemoryBlockInfo Represents a block of memory in the target process.
IPDIRegisterInfo Not currently implemented.
IPDISharedLibraryInfo Not currently implemented.
IPDISignalInfo Information about a signal on the target system.
IPDIThreadInfo Represents information about a thread.
IPDIVariableInfo Represents information about a variable.
IPDIWatchpointScopeInfo Not currently implemented.
IPDIWatchpointTriggerInfo Not currently implemented.
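As a rough illustration of how an event with multiple sources might be applied to a model, the sketch below uses java.util.BitSet in place of BitList and invents its own minimal event and model classes. None of these names (EventSketch, SuspendedEvent, Model) are part of PDI.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; these are not the IPDI* event or model classes.
class EventSketch {
    enum State { RUNNING, SUSPENDED }

    /** An event with a set of source processes and one payload shared by all of them. */
    static class SuspendedEvent {
        final BitSet sources;
        final String reason;

        SuspendedEvent(BitSet sources, String reason) {
            this.sources = sources;
            this.reason = reason;
        }
    }

    /** Minimal model: last known state and suspend reason per process. */
    static class Model {
        final Map<Integer, State> states = new HashMap<>();
        final Map<Integer, String> reasons = new HashMap<>();
    }

    /** Applies the event to every source process named in its bitmap. */
    static void apply(SuspendedEvent event, Model model) {
        for (int p = event.sources.nextSetBit(0); p >= 0; p = event.sources.nextSetBit(p + 1)) {
            model.states.put(p, State.SUSPENDED);
            model.reasons.put(p, event.reason);
        }
    }
}
```

Because the event data is assumed to be identical for each source, a single event object is enough to update every process it names.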

Scalable Debug Manager

The Scalable Debug Manager (SDM) is an implementation of the PDI. The SDM comprises four main parts: a Java PDI implementation, a scalable communications infrastructure, a debug wire protocol, and a pluggable debug backend.

Debugger Startup & Communication

The PTP debugger startup occurs in two phases. The first phase initializes the PDI debugger implementation, which then waits for an incoming connection. The second phase submits a debug job to the currently selected resource manager. A debug job contains additional information that allows the resource manager to establish the debug infrastructure necessary to debug the application program. The resource manager is responsible for starting both the debugger and the application program.

Communication between Eclipse and the external debug infrastructure is via TCP/IP sockets. When the PTP debugger is launched, the PDI debugger implementation listens on a randomly generated port number for incoming connections. The port number is passed to the resource manager as part of the debug launch. When the resource manager starts the debug infrastructure, it must pass this port number as a command-line option in order for the infrastructure to connect back to the PDI implementation. Once the connection is completed, all debugger communication occurs via this socket.
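A minimal sketch of the listening side follows. It lets the operating system pick a free port by binding to port 0, which is one common way to obtain an unused port and may differ from how PTP generates its port number. The "--port" option name is made up for illustration.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch only: shows the listen-then-advertise pattern, not PTP's actual code.
class ListenSketch {
    /** Opens a listening socket on a port chosen by the operating system. */
    static ServerSocket listenOnFreePort() throws IOException {
        return new ServerSocket(0); // port 0 asks the OS for any free port
    }

    /** Builds a command-line option carrying the port; "--port" is a hypothetical flag name. */
    static String portOption(ServerSocket socket) {
        return "--port=" + socket.getLocalPort();
    }
}
```

The caller would pass the resulting option along with the debug launch and then call accept() on the socket to wait for the debug infrastructure to connect back.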

SDM PDI implementation

The SDM PDI implementation is responsible for converting PDI commands and events into a wire protocol. Once the connection has been established, the PDI implementation receives commands that are created as a result of user interaction with the debug views (e.g. the user creates a breakpoint, or clicks a single step button). These PDI commands are converted into the wire protocol and sent via the socket. Events received on the socket are converted into PDI events and used to update the debug model state.

Scalable Communications Infrastructure

The SDM implements a scalable communications infrastructure using a C program that uses the MPI programming model. NOTE: future versions of the SDM will not rely on MPI. Typically, this infrastructure will be running on a machine (the parallel machine) that is remote from the machine running Eclipse (the workstation). When debugging a multi-process application, the SDM starts a server process for each application process, plus one additional process that acts as the master controller. Each of the server processes forms part of a binomial tree that broadcasts commands to minimize the message latency to each process. The infrastructure also performs message aggregation when receiving debugger events. Each command specifies a timeout period (possibly infinite) for message aggregation. A server process waits for this period before sending the aggregated message to its parent. Message aggregation is achieved by computing a hash over the message body, and aggregating messages with identical hashes.
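To show why a binomial tree keeps broadcast latency low, the sketch below computes parent and child ranks for the textbook binomial tree rooted at rank 0. How the SDM actually numbers its processes is not described here, so this layout is an assumption, and the SDM itself is written in C rather than Java.

```java
import java.util.ArrayList;
import java.util.List;

// Textbook binomial tree rooted at rank 0; the SDM's real layout may differ.
class BinomialTreeSketch {
    /** Parent of a non-root node: clear the highest set bit of its rank. */
    static int parent(int rank) {
        if (rank <= 0) {
            throw new IllegalArgumentException("the root has no parent");
        }
        return rank - Integer.highestOneBit(rank);
    }

    /** Children of a node in a tree of n nodes: rank + 2^k for each power of two above rank. */
    static List<Integer> children(int rank, int n) {
        List<Integer> kids = new ArrayList<>();
        // Start at the first power of two strictly greater than rank (1 for the root).
        int step = (rank == 0) ? 1 : Integer.highestOneBit(rank) << 1;
        for (; step > 0 && (long) rank + step < n; step <<= 1) {
            kids.add(rank + step);
        }
        return kids;
    }
}
```

With 8 nodes, rank 0 forwards to ranks 1, 2, and 4, rank 1 forwards to 3 and 5, and so on, so every node is reached in 3 forwarding steps rather than 7 sequential sends from the root.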

Debug Wire Protocol

The debugger uses a simple text-based protocol to communicate between the Eclipse-based debug controller and the SDM master process. The format of the protocol is described in more detail in Debugger Wire Protocol. The protocol consists of commands and events. A command instructs the SDM to perform some action on a group of processes. Each command results in one or more events. Commands are addressed to processes using a bitmap, where each bit represents a process in the binomial tree. Events resulting from commands are aggregated where possible, by waiting for a predetermined time (possibly infinite) before the event is sent. Events also carry a bitmap indicating which processes the event corresponds to. A command is completed when every process receiving a command has produced a corresponding event.
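The completion rule can be sketched with a bitmap of processes that have not yet replied. The class below is hypothetical (PendingCommandSketch is not a PTP class) and uses java.util.BitSet for the bitmap.

```java
import java.util.BitSet;

// Hypothetical sketch of tracking when a command has completed.
class PendingCommandSketch {
    private final BitSet pending; // processes that have not yet produced an event

    PendingCommandSketch(BitSet targets) {
        this.pending = (BitSet) targets.clone();
    }

    /** Records an event from the given sources; returns true once every target has replied. */
    boolean onEvent(BitSet sources) {
        pending.andNot(sources);
        return pending.isEmpty();
    }
}
```

An aggregated event clears several bits at once, so a command sent to many processes may complete after only a handful of events.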

On the Java side, each command and event is represented by a class. When a command is initiated, it is transformed into the text-based protocol, and sent to the SDM master process. Incoming events are transformed into the corresponding class, and forwarded to the debug controller. The following tables list the available commands and their corresponding events.

Debugger Commands
Command Class Command Description
ProxyDebugBreakpointAfterCommand Set attributes on a breakpoint so the execution context will be suspended after the breakpoint is reached a predetermined number of times.
ProxyDebugCLICommand Send a backend-specific command.
ProxyDebugConditionBreakpointCommand Set a breakpoint that will only trigger when an expression evaluates to true.
ProxyDebugDataReadMemoryCommand Read a block of memory.
ProxyDebugDataWriteMemoryCommand Write a block of memory.
ProxyDebugDeleteBreakpointCommand Remove a breakpoint.
ProxyDebugDisableBreakpointCommand Temporarily disable a breakpoint.
ProxyDebugEnableBreakpointCommand Re-enable a disabled breakpoint.
ProxyDebugEvaluateExpressionCommand Evaluate an expression in the execution context and return the result.
ProxyDebugGetPartialAIFCommand Return a partially evaluated AIF object.
ProxyDebugGetTypeCommand Get the type of a variable.
ProxyDebugGoCommand Resume (or begin) execution of a process.
ProxyDebugInterruptCommand Interrupt an executing process.
ProxyDebugListArgumentsCommand List the arguments of the current stack frame.
ProxyDebugListGlobalVariablesCommand List all the global variables.
ProxyDebugListInfoThreadsCommand List the threads of the process.
ProxyDebugListLocalVariablesCommand List all local variables in the current stack frame.
ProxyDebugListSignalsCommand Determine how signals are handled by the backend debugger.
ProxyDebugListStackframesCommand List all stack frames in the current execution context.
ProxyDebugSetCurrentStackframeCommand Set the current stack frame.
ProxyDebugSetFunctionBreakpointCommand Set a breakpoint on a function.
ProxyDebugSetLineBreakpointCommand Set a breakpoint on a source line.
ProxyDebugSetThreadSelectCommand Select the current execution context.
ProxyDebugSetWatchpointCommand Set a watchpoint on an expression.
ProxyDebugSignalInfoCommand Set how a signal should be handled.
ProxyDebugStackInfoDepthCommand Determine the stack depth in the execution context.
ProxyDebugStartSessionCommand Start the debug session.
ProxyDebugStepCommand Single step the execution context.
ProxyDebugTerminateCommand Terminate the process being debugged.
ProxyDebugVariableDeleteCommand Remove a debugger variable.


Debugger Events
Event Name Description
IProxyDebugArgsEvent Contains a list of arguments for the stack frame.
IProxyDebugBreakpointHitEvent Generated when a breakpoint is hit.
IProxyDebugBreakpointSetEvent Generated when a breakpoint has been successfully set.
IProxyDebugDataEvent Contains the type and contents of a target data structure.
IProxyDebugDataExpValueEvent Contains a string representation of an expression value.
IProxyDebugErrorEvent Indicates that a debug error has occurred.
IProxyDebugExitEvent Indicates that the target process has exited normally, and reports the exit value.
IProxyDebugInfoThreadsEvent Contains a list of threads.
IProxyDebugInitEvent Indicates that the debugger has successfully initialized.
IProxyDebugMemoryInfoEvent Contains data from a block of memory.
IProxyDebugOKEvent Acknowledges a successful command operation.
IProxyDebugPartialAIFEvent Contains a partially evaluated AIF object.
IProxyDebugSetThreadSelectEvent Indicates a thread was successfully selected.
IProxyDebugSignalEvent Generated in response to a process receiving a signal.
IProxyDebugSignalExitEvent Generated when a process exits due to a signal.
IProxyDebugSignalsEvent Contains a list of signals and how they are handled by the backend debugger.
IProxyDebugStackframeEvent Contains a list of stack frames.
IProxyDebugStackInfoDepthEvent Contains the stack frame depth.
IProxyDebugStepEvent Generated when an execution context completes a single step.
IProxyDebugSuspendEvent Generated when an execution context is suspended (e.g. when hitting a breakpoint).
IProxyDebugTypeEvent Contains the result of a request for a variable type.
IProxyDebugVarsEvent Contains the result of a request for the contents of a variable.

On the SDM side, the master process receives commands from the debug controller and calls the associated handler routine. This routine forwards the command to the processes specified by the command bitmap. Event responses are aggregated where possible, converted into the text-based protocol, and then forwarded to the debug controller.
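Event aggregation can be sketched as grouping responses by message body and merging the bitmaps of the processes that sent them. The SDM is a C program and its hash function is not specified in this document; the sketch below relies on a Java hash map keyed by the message body, which hashes the body and also compares it for equality. The class name AggregationSketch is hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: groups identical event bodies and merges their source bitmaps.
class AggregationSketch {
    private final Map<String, BitSet> byBody = new LinkedHashMap<>();

    /** Records that the given process produced an event with the given body. */
    void add(int process, String body) {
        BitSet sources = byBody.get(body);
        if (sources == null) {
            sources = new BitSet();
            byBody.put(body, sources);
        }
        sources.set(process);
    }

    /** Returns one entry per distinct body, mapped to the processes that sent it. */
    Map<String, BitSet> aggregated() {
        return byBody;
    }
}
```

If a thousand processes all report the same result, only one message with a thousand bits set needs to travel up the tree.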

Backend Debugger

The SDM uses a native debugger to perform debug actions on the target processes. The native debugger interfaces to the SDM via a backend interface that converts the debugger commands into commands that can be understood by the native debugger. Native debugger responses are converted back to debugger events. The SDM currently supports gdb as a native backend debugger, and communicates to gdb via the MI interface.
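The kind of translation performed by the backend can be sketched as follows. The example builds a GDB/MI -break-insert command for a line breakpoint and extracts the result class (such as done or error) from an MI result record. It is a Java illustration of the idea only; the SDM backend is written in C, and the class and method names here are hypothetical.

```java
// Illustration of command/response translation for GDB/MI; not the SDM backend code.
class GdbMiSketch {
    /** Builds the MI command that sets a breakpoint at file:line. */
    static String lineBreakpoint(String file, int line) {
        return "-break-insert " + file + ":" + line;
    }

    /**
     * Extracts the result class from an MI result record of the form
     * [token]^class[,results]. Returns null if the line is not a result record.
     */
    static String resultClass(String record) {
        int i = 0;
        while (i < record.length() && Character.isDigit(record.charAt(i))) {
            i++; // skip the optional numeric token
        }
        if (i >= record.length() || record.charAt(i) != '^') {
            return null; // stream or async output, not a result record
        }
        int comma = record.indexOf(',', i);
        return record.substring(i + 1, comma < 0 ? record.length() : comma);
    }
}
```

A backend built along these lines would map each result class to a debugger event, for example an acknowledgement for done and an error event for error.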

Remote Services

Overview

Although some users are lucky enough to have a parallel computer under their desk, most users will be developing and running parallel applications on remote systems. This means that PTP must support monitoring, running and debugging applications on computer systems that are geographically distant from the user's desktop. The traditional approach taken to this problem was to require users to log into a remote machine using a terminal emulation program, and then run commands directly on the remote system. Early versions of PTP also took this approach, requiring the use of the remote X11 protocol to display an Eclipse session that was actually running on the remote machine. However, this technique suffers from a number of performance and usability problems that are difficult to overcome. The approach taken in later versions of PTP (2.0 and higher) is to run Eclipse locally on the user's desktop, and provide a proxy protocol to communicate with a lightweight agent running on the remote system, as described in the preceding sections. In addition to the proxy protocol, there are a number of other remote activities that must also take place to transparently support remote development. PTP support for these activities is described in the following sections.

Remote Services Abstraction

PTP takes the approach that the user should be spared from needing to know about specific details of how to interact with a remote computer system. That is, the user should only need to supply enough information to establish a connection to a remote machine (typically the name of the system, and a username and password), then everything else should be taken care of automatically. Details such as the protocol used for communication, how files are accessed, or commands initiated, do not need to be exposed to the user.

An additional requirement is that PTP should not be dependent on other Eclipse projects (apart from the platform). Unfortunately, while the Eclipse File System (EFS) provides an abstraction for accessing non-local resources, it does not support other kinds of services, such as remote command execution. The only alternative to EFS is the RSE project, but introducing such a dependency into PTP is not desirable at this time.

In order to address these requirements, PTP provides a remote services abstraction layer that allows uniform access to local and remote resources using arbitrary remote service providers. The following diagram shows the architecture of the remote services abstraction.

[Figure: Remote services abstraction architecture (Remote services.png)]

The top two layers (shown in orange) comprise the abstraction layer. The lower (green) layer contains the actual service provider implementations. By providing this separation between the abstraction and the service providers, no additional dependencies are introduced into PTP.

Three primary service types are provided:

Connection management 
Create and manage connections to the remote system. Once a connection has been established, it can be used to support additional activities.
File management 
Provides services that allow browsing for and performing operations on remote resources, either files or directories.
Process management 
Provides services for running commands on a remote system.

There is also support available for discovering and managing service providers.

Implementation Details

The main plugin for remote services is org.eclipse.ptp.remote. This plugin must be included in order to access the remote abstraction layer. The plugin provides an extension point for adding remote services implementations. Each remote services implementation supplies a remote services ID which is used to identify the particular implementation. The following table lists the current remote services implementations and their IDs.

Plugin Name ID Remote Service Provider
org.eclipse.ptp.remote org.eclipse.ptp.remote.LocalServices Local filesystem and process services
org.eclipse.ptp.remote.rse org.eclipse.ptp.remote.RSERemoteServices Remote System Explorer

In addition to providing a remote services ID, each plugin must provide implementations for the three main services types: connection management, file management, and process management. The plugin is also responsible for ensuring that it is initialized prior to any of the services being invoked.

Entry Point: PTPRemotePlugin

The activation class for the main remote services plugin is PTPRemotePlugin. This class provides two main methods for accessing the remote services:

IRemoteServices[] PTPRemotePlugin.getAllRemoteServices() 
This method returns an array of all remote service providers that are currently available. A service provider for accessing local services is guaranteed to be available, so this method will always return an array containing at least one element.
IRemoteServices PTPRemotePlugin.getRemoteServices(String id) 
This method returns the remote services provider that corresponds to the ID given by the id argument.

Typically, the remote service providers returned by getAllRemoteServices() will be used to populate a dropdown that allows the user to select the service they want to use. Once the provider has been selected, the getRemoteServices() method can be used to retrieve the remote services at a later date.
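The select-then-look-up pattern might look like the sketch below. Because the real classes are only available inside an Eclipse installation, the example uses stand-in types: RemoteServicesStub and RemoteRegistrySketch are hypothetical, the getId() and getName() accessors are assumptions not documented above, and returning null for an unknown ID is also an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for IRemoteServices; both accessors are assumed, not documented above.
interface RemoteServicesStub {
    String getId();
    String getName();
}

// Stand-in for the lookup methods offered by PTPRemotePlugin.
class RemoteRegistrySketch {
    private final Map<String, RemoteServicesStub> providers = new LinkedHashMap<>();

    void register(final String id, final String name) {
        providers.put(id, new RemoteServicesStub() {
            public String getId() { return id; }
            public String getName() { return name; }
        });
    }

    /** Mirrors getAllRemoteServices(): every provider currently available. */
    RemoteServicesStub[] getAllRemoteServices() {
        return providers.values().toArray(new RemoteServicesStub[0]);
    }

    /** Mirrors getRemoteServices(id); returns null for an unknown ID (an assumption). */
    RemoteServicesStub getRemoteServices(String id) {
        return providers.get(id);
    }

    /** Labels that could populate a provider dropdown. */
    List<String> dropdownLabels() {
        List<String> labels = new ArrayList<>();
        for (RemoteServicesStub s : getAllRemoteServices()) {
            labels.add(s.getName());
        }
        return labels;
    }
}
```

In this pattern only the ID of the chosen provider needs to be stored, for example with a launch configuration, and the provider is retrieved again by ID when it is needed.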

Obtaining Services: IRemoteServices

The IRemoteServices interface represents a particular set of remote services, and is the main interface for interacting with the remote service provider. There are four main methods available:

boolean isInitialized() 
This should be called to check that the service has been initialized. Other methods should only be called if this returns true.
IRemoteConnectionManager getConnectionManager() 
Returns a connection manager. This is used for creating and managing connections.
IRemoteFileManager getFileManager() 
Returns a file manager for a given connection. A file manager is responsible for managing operations on files and directories.
IRemoteProcessBuilder getProcessBuilder() 
Returns a process builder for a given connection. A process builder is responsible for running commands on a remote system.

Connection Management: IRemoteConnectionManager

Once the service provider has been selected, the first step is typically to create a new connection. This is done using the IRemoteConnectionManager.newConnection() method, which lets the user create a new connection using whatever mechanism the underlying service provider offers. Typically this will be some kind of dialog that allows the connection parameters (hostname, username, password, etc.) to be entered. The result is an IRemoteConnection object that represents the connection to the remote system.

File Management: IRemoteFileManager

If file-type operations are required, then the next step would be to call IRemoteServices.getFileManager() specifying the newly created connection. The resulting IRemoteFileManager object has a number of methods for selecting and manipulating files:

IPath browseFile() 
This will present the user with a dialog allowing them to select a file on the remote system. The returned path will be the path of the remote file relative to the remote system.
IPath browseDirectory() 
Similar to browseFile() but the user can select a directory.
IRemoteResource getResource() 
Given a path on the remote system, this will return an object that can be used to manipulate the remote file or directory.
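Putting the steps together, the overall flow (check initialization, create a connection, obtain a file manager, browse for a file) might look like the sketch below. The interfaces are stand-ins named after the ones described above, with String used in place of IPath so that the example is self-contained. The fake provider, the host name, and the convention of returning null when no connection is created are all assumptions.

```java
// Stand-in interfaces modelled on the descriptions above; not the real PTP types.
interface ConnectionStub { String host(); }
interface ConnectionManagerStub { ConnectionStub newConnection(); }
interface FileManagerStub { String browseFile(); }
interface ServicesStub {
    boolean isInitialized();
    ConnectionManagerStub getConnectionManager();
    FileManagerStub getFileManager(ConnectionStub connection);
}

class RemoteFlowSketch {
    /** Returns the chosen remote path, or null if services are unavailable or no connection is made. */
    static String chooseRemoteFile(ServicesStub services) {
        if (!services.isInitialized()) {
            return null; // other methods should only be called once initialized
        }
        ConnectionStub conn = services.getConnectionManager().newConnection();
        if (conn == null) {
            return null; // assumed: null when the user cancels
        }
        return services.getFileManager(conn).browseFile();
    }

    /** Builds a fake provider for demonstration; a real provider would open dialogs instead. */
    static ServicesStub fake(final boolean initialized, final String path) {
        return new ServicesStub() {
            public boolean isInitialized() { return initialized; }
            public ConnectionManagerStub getConnectionManager() {
                return new ConnectionManagerStub() {
                    public ConnectionStub newConnection() {
                        return new ConnectionStub() {
                            public String host() { return "example-host"; }
                        };
                    }
                };
            }
            public FileManagerStub getFileManager(ConnectionStub connection) {
                return new FileManagerStub() {
                    public String browseFile() { return path; }
                };
            }
        };
    }
}
```

The calling code never needs to know which protocol the provider uses; it only sees the connection, file manager, and returned path.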

Process Management: IRemoteProcessBuilder

Coming...

Other Services

Coming...

Tool Integration Services
