PTP/designs/3.x/rm proxy

Revision as of 14:53, 18 June 2007 by Unnamed Poltroon (Talk) (REMOVE_ALL)

Overview

This is a preliminary design for the PTP Resource Management proxy communication protocol. This protocol is used to communicate between the Resource Manager System in Eclipse and a lightweight proxy agent running on a target system. The primary purpose of the protocol is to support system monitoring, process launch, and process control activities.

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119].

Resource Manager System

The Resource Manager System (RMS) is an Eclipse plugin that manages interaction with arbitrary resource managers. A resource manager, in this context, is anything that provides program launch and monitoring services on a target system. Typically, a resource manager will be a job scheduler (e.g. LSF, LoadLeveler, PBS, etc.) running on a large multi-user system. Other types of resource managers include the Open Runtime Environment (ORTE), which is part of the OpenMPI distribution, and the MPICH2 runtime system. The RMS is responsible for populating an internal model in Eclipse which provides a cached representation of the system and program state. Various user interface views are available to inspect and interact with this model. Details of the RMS are provided in a separate document, PTP/designs/rms.

Proxy Agent

The RMS communicates with proxy agents to gather information about the state of a target system. The proxy agent may be located on either a local or remote machine.

Proxy Agent Launch

The RMS is responsible for launching the proxy agent. On a local machine, this just involves executing a local process. To launch on a remote machine, the RMS must use an authenticated command service, such as ssh. Current plans are to utilize the Remote System Explorer (RSE) system to provide this remote proxy launch capability.

Proxy Session

One instance of a communication channel between the RMS and a proxy agent is known as a session. A session only supports communication with a single proxy agent at a time. The mechanism used to effect communication between the RMS and a proxy agent is not defined in this document, but can be any bi-directional communications channel (e.g. TCP/IP sockets).

Proxy Protocol

The communication protocol used between the RMS and the proxy agent is a simple text-based asynchronous command/event protocol. The RMS sends one or more commands to the proxy agent, which in turn will generate events that are returned to the RMS.

Some generic properties of the protocol include:

  • One command MAY generate multiple events.
  • Commands and events are matched using a transaction ID (tid). The tid in an event MUST match a corresponding command.
  • Completion of a command is indicated by either an ERROR or OK event with matching tid.
  • Tids need only be unique for uncompleted commands. Once a command is completed, its tid can be reused.
  • Any events received with an invalid tid SHOULD be discarded.
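
The tid rules above can be sketched as a small bookkeeping table on the RMS side. This is only an illustration: the class name TransactionTable and its methods are hypothetical, and choosing the lowest free tid is an assumption made here to show reuse, not something the protocol requires.

```python
class TransactionTable:
    """Hypothetical RMS-side bookkeeping for tids of uncompleted commands."""

    # Events that complete a command, per the list above.
    COMPLETION_EVENTS = ("OK", "ERROR")

    def __init__(self):
        self._pending = {}  # tid -> name of the uncompleted command

    def allocate(self, command):
        """Return the lowest tid not in use by an uncompleted command."""
        tid = 1
        while tid in self._pending:
            tid += 1
        self._pending[tid] = command
        return tid

    def on_event(self, tid, event):
        """Return the command this event belongs to, or None if the tid is invalid."""
        command = self._pending.get(tid)
        if command is None:
            return None  # invalid tid: the caller SHOULD discard the event
        if event in self.COMPLETION_EVENTS:
            del self._pending[tid]  # command completed; its tid may be reused
        return command
```

A command that generates several events stays in the table until its OK or ERROR event arrives; only then does its tid become available again.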

Message Format

Commands and events consist of sequences of ASCII characters formatted into a message. A message is transmitted in the following format:

LENGTH COMMAND_OR_EVENT

LENGTH and COMMAND_OR_EVENT are separated by a space (hex 20). The LENGTH is the length of the COMMAND_OR_EVENT portion of the message including the space. COMMAND_OR_EVENT is the actual text of the command or event.
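
As a rough sketch of this framing, the helpers below wrap and unwrap one message. The text above does not state how LENGTH itself is written, so this sketch assumes 8 uppercase hexadecimal digits (the same width used for string lengths) and assumes LENGTH counts the separating space plus the COMMAND_OR_EVENT text. The function names are hypothetical.

```python
LENGTH_DIGITS = 8  # assumption: LENGTH is written as 8 hexadecimal digits


def frame_message(payload):
    """Prefix payload with its length; the count includes the separating space."""
    length = len(payload) + 1
    return f"{length:0{LENGTH_DIGITS}X} {payload}"


def unframe_message(data):
    """Split one message off the front of data and return (payload, rest)."""
    if len(data) < LENGTH_DIGITS + 1:
        raise ValueError("incomplete message header")
    length = int(data[:LENGTH_DIGITS], 16)
    end = LENGTH_DIGITS + length
    if length < 1 or len(data) < end or data[LENGTH_DIGITS] != " ":
        raise ValueError("incomplete or malformed message")
    return data[LENGTH_DIGITS + 1:end], data[end:]
```

Because the length is known before the payload is read, a receiver can split a stream of back-to-back messages without scanning for a delimiter.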

The COMMAND_OR_EVENT portion of the message will contain a sequence of numbers and strings, formatted according to the rules described in the #Commands or #Events sections below.

Numbers are always formatted as fixed length sequences of hexadecimal characters. Strings are formatted as follows:

LENGTH:CHARACTERS

where LENGTH is the number of characters in the string (formatted as 8 hexadecimal characters), ':' is a colon character (hex 3A), and CHARACTERS are the actual ASCII characters in the string.

For example, the string "A String" would be formatted as:

00000008:A String

A zero length string would be formatted as:

00000000:
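
The string format can be illustrated with a pair of small helpers. This is a minimal sketch: the function names are hypothetical, and writing the hexadecimal length in uppercase is an assumption (the document does not say which case is used).

```python
def encode_string(text):
    """Format text as LENGTH:CHARACTERS with an 8 digit hexadecimal length."""
    if not text.isascii():
        raise ValueError("protocol strings are ASCII only")
    return f"{len(text):08X}:{text}"


def decode_string(data, pos=0):
    """Decode one string starting at pos and return (text, next_pos)."""
    length = int(data[pos:pos + 8], 16)
    if data[pos + 8:pos + 9] != ":":
        raise ValueError("expected ':' after the string length")
    start = pos + 9
    end = start + length
    if end > len(data):
        raise ValueError("string is truncated")
    return data[start:end], end
```

Since the length is given up front, a string may itself contain spaces or colons without confusing the decoder; decode_string returns the position just past the string so that the caller can continue with the next element.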

Protocol Phases

The proxy protocol is divided into a number of phases. A phase determines the legal commands that can be sent to the proxy agent. During a particular phase, illegal commands SHOULD be discarded. Note: this may be changed to SHOULD generate an ERROR event.

Phases follow a strict ordering. A transition from one phase to the next occurs when an OK event is received in response to a phase initiation command. A phase initiation command is a command that must be sent to initiate a particular phase. Once a phase has been initiated, any legal commands for that phase may be sent. The phase ordering is defined as follows:

INITIALIZE -> DISCOVERY -> {NORMAL -> SUSPENDED}

The phases are defined in more detail in the following sections.

INITIALIZE

This is the first phase, and is used to initiate a communication session between the RMS and proxy agent, and agree on any protocol parameters that apply to this session.

Phase initiation command:

  • INIT

Legal commands:

  • none

DISCOVERY

The discovery phase is used to allow the proxy agent to inform the RMS of any dynamic property information. This information currently consists of attribute definitions and filter definitions which are described in more detail below.

Phase initiation command:

  • MODEL_DEF

Legal commands:

  • none

NORMAL

The normal phase is entered once the initialize and discovery phases are completed. This is the normal command/event processing phase.

Phase initiation command:

  • START_EVENTS

Legal commands:

  • SUBMIT_JOB
  • TERMINATE_JOB
  • STOP_EVENTS
  • QUIT

SUSPENDED

The suspended phase is used when the RMS needs to prevent the proxy agent from sending additional events.

Phase initiation command:

  • STOP_EVENTS

Legal commands:

  • START_EVENTS
  • QUIT
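
The per-phase rules can be collected into lookup tables, as in the sketch below. The table contents are taken from the phase descriptions above; the function is_legal is hypothetical, and treating the initiation command of the following phase as acceptable in the current phase is an assumption (otherwise no phase with "none" as its legal commands could ever be left). The sketch deliberately does not model when a transition takes effect.

```python
# Command that initiates each phase, from the phase descriptions above.
INITIATION_COMMAND = {
    "INITIALIZE": "INIT",
    "DISCOVERY": "MODEL_DEF",
    "NORMAL": "START_EVENTS",
    "SUSPENDED": "STOP_EVENTS",
}

# Legal commands listed for each phase.
LEGAL_COMMANDS = {
    "INITIALIZE": set(),
    "DISCOVERY": set(),
    "NORMAL": {"SUBMIT_JOB", "TERMINATE_JOB", "STOP_EVENTS", "QUIT"},
    "SUSPENDED": {"START_EVENTS", "QUIT"},
}

# Phase that follows each phase; NORMAL and SUSPENDED alternate.
NEXT_PHASE = {
    "INITIALIZE": "DISCOVERY",
    "DISCOVERY": "NORMAL",
    "NORMAL": "SUSPENDED",
    "SUSPENDED": "NORMAL",
}


def is_legal(phase, command):
    """Return True if command may be sent while the session is in phase."""
    if command in LEGAL_COMMANDS[phase]:
        return True
    # Assumption: the command that initiates the next phase is also accepted.
    return command == INITIATION_COMMAND[NEXT_PHASE[phase]]
```

A proxy agent could use such a check to decide which commands to discard in a given phase.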

Phase Example

The following provides a simple example of phase transitions. Commands go from left to right, events from right to left. The tid is shown in parentheses after the command or event name.

-- initialize phase --
INIT(1)         ->
                <- OK(1)
-- discovery phase --
MODEL_DEF(2)    ->
                <- ATTR_DEF(2)
                <- ATTR_DEF(2)
                <- OK(2)
-- normal phase --
START_EVENTS(3) ->
                <- NEW_MACHINE(3)
                <- NEW_NODE(3)
STOP_EVENTS(4)  ->
                <- OK(3)
-- suspended phase --
START_EVENTS(5) ->
                <- OK(4)
-- normal phase --
                <- NEW_QUEUE(5)
QUIT(6)         ->
                <- OK(5)
                <- OK(6)

Note that the first suspended phase is not entered until the OK event corresponding to the START_EVENTS command (tid 3) is received. Similarly, the second normal phase is not entered until after the OK event corresponding to the STOP_EVENTS command (tid 4) is received.

Commands

Commands are formatted as simple ASCII text strings. A proxy command consists of a header and a body, separated by a space (hex 20), as follows:

COMMAND_HEADER COMMAND_BODY

The command header consists of three fixed length strings separated by colons (hex 3A), so it is itself fixed length. The format of the header is:

COMMAND_ID:TID:NUM_ARGS

where

COMMAND_ID is a 4 digit hexadecimal number representing the command to be performed

TID is the 8 digit hexadecimal transaction ID assigned to this command

NUM_ARGS is an 8 digit hexadecimal number giving the count of space separated elements in the command body
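
A command in this format could be assembled as in the following sketch. The function build_command is hypothetical; it assumes uppercase hexadecimal digits and expects each body element to have been formatted already, since the encoding of an individual element depends on the command.

```python
def build_command(command_id, tid, args=()):
    """Return COMMAND_ID:TID:NUM_ARGS followed by the space separated body."""
    if not 0 <= command_id <= 0xFFFF:
        raise ValueError("COMMAND_ID must fit in 4 hexadecimal digits")
    if not 0 <= tid <= 0xFFFFFFFF:
        raise ValueError("TID must fit in 8 hexadecimal digits")
    header = f"{command_id:04X}:{tid:08X}:{len(args):08X}"
    # Each element of args is assumed to be formatted already.
    return " ".join([header, *args])
```

With no arguments the result is just the header, which matches the way the argument-free commands are written in the sections that follow.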

The following sections describe the currently defined commands.

QUIT

Message Format

0000:TID:00000000
Description 
Terminate the proxy agent. This command will cause the proxy agent to terminate as soon as possible.
Events 
OK

INIT

Message Format

0001:TID:00000002 VERSION BASE_ID
Description 
Initialize proxy communication. VERSION is the wire protocol version number. BASE_ID is the base ID used by the proxy agent when allocating new element IDs. After this command has been received, the proxy is ready to receive and process other commands from the RMS. Initialization data may be passed on the command line when the proxy is run.
Events 
OK, ERROR

MODEL_DEF

Message Format

0002:TID:00000000
Description 
Start the proxy discovery phase. The proxy responds with a series of ATTR_DEF and FILTER_DEF events. Attributes (see below) are meta-data describing data from the proxy that the RMS is expected to receive and possibly display in the UI.
Events 
ATTR_DEF, FILTER_DEF, OK, ERROR

START_EVENTS

Message Format

0003:TID:00000000
Description 
Initiate normal event processing phase.
Events 
CHANGE_JOB, CHANGE_MACHINE, CHANGE_NODE, CHANGE_PROCESS, CHANGE_QUEUE, NEW_JOB, NEW_MACHINE, NEW_NODE, NEW_PROCESS, NEW_QUEUE, REMOVE_JOB, REMOVE_MACHINE, REMOVE_NODE, REMOVE_PROCESS, REMOVE_QUEUE, OK, ERROR

STOP_EVENTS

Message Format

0004:TID:00000000
Description 
Suspend normal event processing phase.
Events 
OK, ERROR

SUBMIT_JOB

Message Format

0005:TID:NUM_ARGS ATTR_1 ATTR_2 ... ATTR_N
Description 
Submit a job to the resource manager for execution.
Events 
OK, ERROR

TERMINATE_JOB

Message Format

0006:TID:00000001 JOB_ID
Description 
Request the termination of an existing job. The meaning of 'termination' depends on the state of the job.
Events 
OK, ERROR

MOVE_JOB

Message Format

0007:TID:00000002 JOB_ID QUEUE_ID
Description 
Not yet implemented. This command is intended to allow jobs to be moved between queues.
Events 
OK, ERROR

CHANGE_JOB

Message Format

0008:TID:NUM_ARGS JOB_ID ATTR_1 ATTR_2 ... ATTR_N - 1
Description 
Not yet implemented. This command is intended to allow a job's status to be changed (e.g. place a hold on a job).
Events 
OK, ERROR

LIST_FILTERS

Message Format

0009:TID:00000000
Description 
Not yet implemented. This command lists the filters that are currently enabled in the proxy agent.
Events 
OK, ERROR

SET_FILTERS

Message Format

000A:TID:NUM_ARGS ATTR_1 ATTR_2 ... ATTR_N
Description 
Not yet implemented. This command sets the filters in the proxy agent.
Events 
OK, ERROR
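
For reference, the command IDs defined in this section can be gathered into a single table. The sketch below uses a Python enumeration; the names CommandId, NOT_YET_IMPLEMENTED and command_id_field are hypothetical, while the numeric values are the ones given in the message formats above.

```python
from enum import IntEnum


class CommandId(IntEnum):
    """Command IDs as they appear in the COMMAND_ID header field."""
    QUIT = 0x0000
    INIT = 0x0001
    MODEL_DEF = 0x0002
    START_EVENTS = 0x0003
    STOP_EVENTS = 0x0004
    SUBMIT_JOB = 0x0005
    TERMINATE_JOB = 0x0006
    MOVE_JOB = 0x0007
    CHANGE_JOB = 0x0008
    LIST_FILTERS = 0x0009
    SET_FILTERS = 0x000A


# Commands described above as not yet implemented.
NOT_YET_IMPLEMENTED = {
    CommandId.MOVE_JOB,
    CommandId.CHANGE_JOB,
    CommandId.LIST_FILTERS,
    CommandId.SET_FILTERS,
}


def command_id_field(command):
    """Format a command ID as the 4 digit header field."""
    return f"{command.value:04X}"
```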

Events

Events are used by the proxy agent to communicate the results of commands or other information back to the RMS. As mentioned previously, each event MUST contain a tid of a corresponding command.

Events, like commands, are formatted as simple ASCII text strings. A proxy event consists of a header and a body, separated by a space (hex 20), as follows:

EVENT_HEADER EVENT_BODY

The event header consists of three fixed length strings separated by colons (hex 3A), so it is itself fixed length. The format of the header is:

EVENT_ID:TRANSACTION_ID:NUM_ARGS

where

EVENT_ID is a 4 digit hexadecimal number representing the event

TRANSACTION_ID is the 8 digit hexadecimal transaction ID of the command that generated this event

NUM_ARGS is an 8 digit hexadecimal number giving the count of space separated elements in the event body
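
Because the header is fixed length, it can be split off before the body is examined. The sketch below shows one way to do this; the function parse_event is hypothetical and leaves the body undecoded, since the layout of the body depends on the event.

```python
HEADER_LEN = 4 + 1 + 8 + 1 + 8  # EVENT_ID ':' TRANSACTION_ID ':' NUM_ARGS


def parse_event(text):
    """Return (event_id, tid, num_args, body) for one event string."""
    header, body = text[:HEADER_LEN], text[HEADER_LEN:]
    fields = header.split(":")
    if [len(f) for f in fields] != [4, 8, 8]:
        raise ValueError("malformed event header")
    event_id, tid, num_args = (int(f, 16) for f in fields)
    if body and not body.startswith(" "):
        raise ValueError("expected a space between header and body")
    return event_id, tid, num_args, body[1:]
```

The body is returned as raw text rather than split on spaces, because string elements are length-prefixed and may themselves contain spaces.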

The following sections describe the currently defined events.

OK

Message Format

0000:TID:00000000
Description 
Indicates that the command with corresponding TID has been completed successfully
Arguments 
none

ERROR

Message Format

0005:TID:00000002 ERROR_CODE_ATTRIBUTE ERROR_MSG_ATTRIBUTE
Description 
Indicates that the command with corresponding TID has not been completed successfully. The reasons for the failure are provided in the attributes.

Arguments

ERROR_CODE_ATTRIBUTE is a numeric attribute containing the error code
ERROR_MSG_ATTRIBUTE is a string attribute containing a textual representation of the error

MESSAGE

Message Format

00FA:TID:00000003 MSG_LEVEL_ATTRIBUTE MSG_CODE_ATTRIBUTE MSG_TEXT_ATTRIBUTE
Description 
A log message that will be displayed by the user interface.

Arguments

MSG_LEVEL_ATTRIBUTE is a string attribute containing the message level. Valid levels are "FATAL", "ERROR", "WARNING", and "INFO".
MSG_CODE_ATTRIBUTE is a numeric attribute containing the message code.
MSG_TEXT_ATTRIBUTE is a string attribute containing a textual representation of the message.

ATTR_DEF

Message Format

00FB:TID:NUM_ARGS NUM_DEFS ATTRIBUTE_DEF ... ATTRIBUTE_DEF
Description 
Used to create new attribute definitions.

Arguments

NUM_DEFS is the number of attribute definitions to follow.
ATTRIBUTE_DEF is an attribute definition.

All attribute definitions begin with the following elements:

NUM_ELEMENTS ID TYPE NAME DESCRIPTION DEFAULT

where:

NUM_ELEMENTS is the number of elements (separated by spaces) in this attribute definition.
ID is a unique definition ID string.
TYPE is a string representing the type of the attribute. Legal values are: "ARRAY", "BOOLEAN", "DATE", "DOUBLE", "ENUMERATED", "INTEGER", and "STRING".
NAME is a string representing the short name of the attribute. This is displayed in property views as the name of the attribute.
DESCRIPTION is a string representing a description of the attribute. This is displayed when more information about the attribute is requested.
DEFAULT is a string representing the default value of the attribute. There must be a legal conversion between this string and the actual attribute value.

A number of attribute types require additional elements to be supplied for the definition:

Date Attribute

ATTRIBUTE_DEFINITION DATE_STYLE TIME_STYLE LOCALE MIN_MAX

where:

DATE_STYLE is a string representing the date format. Legal values are: "SHORT", "MEDIUM", "LONG", and "FULL".
TIME_STYLE is a string representing the time format. Legal values are: "SHORT", "MEDIUM", "LONG", and "FULL".
LOCALE is a string representing a country code. See java.util.Locale for legal values.
MIN_MAX is, optionally, two strings representing the minimum and maximum dates supported by the attribute.

Double Attribute

ATTRIBUTE_DEFINITION MIN_MAX
MIN_MAX is, optionally, two strings representing the minimum and maximum values supported by the attribute.

Enumerated Attribute

ATTRIBUTE_DEFINITION VALUES
VALUES is a space separated list of strings representing the enumerated values.

Integer Attribute

ATTRIBUTE_DEFINITION MIN_MAX
MIN_MAX is, optionally, two strings representing the minimum and maximum values supported by the attribute.
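
The structure of an attribute definition can be sketched as a small record plus a validity check. This is an illustration under stated assumptions: AttributeDef and parse_attr_def are hypothetical names, the input is the list of already decoded strings that follow NUM_ELEMENTS, and no checks are made for types whose additional elements are not described above.

```python
from dataclasses import dataclass, field
from typing import List

LEGAL_TYPES = {"ARRAY", "BOOLEAN", "DATE", "DOUBLE",
               "ENUMERATED", "INTEGER", "STRING"}


@dataclass
class AttributeDef:
    id: str
    type: str
    name: str
    description: str
    default: str
    extra: List[str] = field(default_factory=list)  # type-specific elements


def parse_attr_def(elements):
    """Build an AttributeDef from the elements that follow NUM_ELEMENTS."""
    if len(elements) < 5:
        raise ValueError("ID, TYPE, NAME, DESCRIPTION and DEFAULT are required")
    attr = AttributeDef(*elements[:5], extra=list(elements[5:]))
    if attr.type not in LEGAL_TYPES:
        raise ValueError("unknown attribute type: " + attr.type)
    n = len(attr.extra)
    # DATE: DATE_STYLE TIME_STYLE LOCALE, optionally followed by MIN and MAX.
    if attr.type == "DATE" and n not in (3, 5):
        raise ValueError("DATE expects 3 or 5 additional elements")
    # DOUBLE and INTEGER: optionally MIN and MAX.
    if attr.type in ("DOUBLE", "INTEGER") and n not in (0, 2):
        raise ValueError(attr.type + " expects 0 or 2 additional elements")
    return attr
```

For an ENUMERATED attribute, the additional elements are simply the enumerated values, so any number of them is accepted.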

CHANGE_JOB

Message Format

00E6:TID:NUM_ARGS NUM_RANGES JOB_RANGE
Description 
Used to update attributes in a range of job model elements. Multiple attributes in multiple jobs can be updated simultaneously with this event.

Arguments

NUM_RANGES is the number of job ranges contained in this event.
JOB_RANGE represents the attributes that have changed for a given range of jobs.

A JOB_RANGE is formatted as follows:

ID_RANGE NUM_ATTRS ATTR_LIST

where:

ID_RANGE is a range of ID's in #Range Set Notation
NUM_ATTRS is the number of attributes associated with this range of jobs.
ATTR_LIST is a list of the attributes that will be updated for each job in the range.

The attribute list, ATTR_LIST, consists of a sequence of attributes separated by spaces.

CHANGE_MACHINE

Message Format

00E7:TID:NUM_ARGS NUM_RANGES MACHINE_RANGE
Description 
Used to update attributes in a range of machine model elements. Multiple attributes in multiple machines can be updated simultaneously with this event.

Arguments

NUM_RANGES is the number of machine ranges contained in this event.
MACHINE_RANGE represents the attributes that have changed for a given range of machines.

A MACHINE_RANGE is formatted as follows:

ID_RANGE NUM_ATTRS ATTR_LIST

where:

ID_RANGE is a range of ID's in #Range Set Notation
NUM_ATTRS is the number of attributes associated with this range of machines.
ATTR_LIST is a list of the attributes that will be updated for each machine in the range.

The attribute list, ATTR_LIST, consists of a sequence of attributes separated by spaces.

CHANGE_NODE

Message Format

00E8:TID:NUM_ARGS NUM_RANGES NODE_RANGE
Description 
Used to update attributes in a range of node model elements. Multiple attributes in multiple nodes can be updated simultaneously with this event.

Arguments

NUM_RANGES is the number of node ranges contained in this event.
NODE_RANGE represents the attributes that have changed for a given range of nodes.

A NODE_RANGE is formatted as follows:

ID_RANGE NUM_ATTRS ATTR_LIST

where:

ID_RANGE is a range of ID's in #Range Set Notation
NUM_ATTRS is the number of attributes associated with this range of nodes.
ATTR_LIST is a list of the attributes that will be updated for each node in the range.

The attribute list, ATTR_LIST, consists of a sequence of attributes separated by spaces.

CHANGE_PROCESS

Message Format

00E9:TID:NUM_ARGS NUM_RANGES PROCESS_RANGE
Description 
Used to update attributes in a range of process model elements. Multiple attributes in multiple processes can be updated simultaneously with this event.

Arguments

NUM_RANGES is the number of process ranges contained in this event.
PROCESS_RANGE represents the attributes that have changed for a given range of processes.

A PROCESS_RANGE is formatted as follows:

ID_RANGE NUM_ATTRS ATTR_LIST

where:

ID_RANGE is a range of ID's in #Range Set Notation
NUM_ATTRS is the number of attributes associated with this range of processes.
ATTR_LIST is a list of the attributes that will be updated for each process in the range.

The attribute list, ATTR_LIST, consists of a sequence of attributes separated by spaces.

CHANGE_QUEUE

Message Format

00EA:TID:NUM_ARGS NUM_RANGES QUEUE_RANGE
Description 
Used to update attributes in a range of queue model elements. Multiple attributes in multiple queues can be updated simultaneously with this event.

Arguments

NUM_RANGES is the number of queue ranges contained in this event.
QUEUE_RANGE represents the attributes that have changed for a given range of queues.

A QUEUE_RANGE is formatted as follows:

ID_RANGE NUM_ATTRS ATTR_LIST

where:

ID_RANGE is a range of ID's in #Range Set Notation
NUM_ATTRS is the number of attributes associated with this range of queues.
ATTR_LIST is a list of the attributes that will be updated for each queue in the range.

The attribute list, ATTR_LIST, consists of a sequence of attributes separated by spaces.

NEW_JOB

Message Format

00DC:TID:NUM_ARGS PARENT_ID NUM_JOB_DEFS JOB_DEF
Description 
Define new job model elements. Multiple jobs can be defined simultaneously with this event.

Arguments

PARENT_ID is a string representing the ID of the parent queue of this job.
NUM_JOB_DEFS is the number of job definitions contained in this event.
JOB_DEF represents the definition of a range of jobs and associated attributes.

A JOB_DEF is formatted as follows:

ID_RANGE NUM_ATTRS ATTR_LIST

where:

ID_RANGE is a range of ID's in #Range Set Notation
NUM_ATTRS is the number of attributes associated with this job definition.
ATTR_LIST is a list of the attributes that will be set for each newly created job.

The attribute list, ATTR_LIST, consists of a sequence of attributes separated by spaces.

NEW_MACHINE

Message Format

00DD:TID:NUM_ARGS PARENT_ID NUM_MACHINE_DEFS MACHINE_DEF
Description 
Define new machine model elements. Multiple machines can be defined simultaneously with this event.

Arguments

PARENT_ID is a string representing the ID of the parent resource manager of this machine.
NUM_MACHINE_DEFS is the number of machine definitions contained in this event.
MACHINE_DEF represents the definition of a range of machines and associated attributes.

A MACHINE_DEF is formatted as follows:

ID_RANGE NUM_ATTRS ATTR_LIST

where:

ID_RANGE is a range of ID's in #Range Set Notation
NUM_ATTRS is the number of attributes associated with this machine definition.
ATTR_LIST is a list of the attributes that will be set for each newly created machine.

The attribute list, ATTR_LIST, consists of a sequence of attributes separated by spaces.

NEW_NODE

Message Format

00DE:TID:NUM_ARGS PARENT_ID NUM_NODE_DEFS NODE_DEF
Description 
Define new node model elements. Multiple nodes can be defined simultaneously with this event.

Arguments

PARENT_ID is a string representing the ID of the parent machine of this node.
NUM_NODE_DEFS is the number of node definitions contained in this event.
NODE_DEF represents the definition of a range of nodes and associated attributes.

A NODE_DEF is formatted as follows:

ID_RANGE NUM_ATTRS ATTR_LIST

where:

ID_RANGE is a range of ID's in #Range Set Notation
NUM_ATTRS is the number of attributes associated with this node definition.
ATTR_LIST is a list of the attributes that will be set for each newly created node.

The attribute list, ATTR_LIST, consists of a sequence of attributes separated by spaces.

NEW_PROCESS

Message Format

00DF:TID:NUM_ARGS PARENT_ID NUM_PROCESS_DEFS PROCESS_DEF
Description 
Define new process model elements. Multiple processes can be defined simultaneously with this event.

Arguments

PARENT_ID is a string representing the ID of the parent job of this process.
NUM_PROCESS_DEFS is the number of process definitions contained in this event.
PROCESS_DEF represents the definition of a range of processes and associated attributes.

A PROCESS_DEF is formatted as follows:

ID_RANGE NUM_ATTRS ATTR_LIST

where:

ID_RANGE is a range of ID's in #Range Set Notation
NUM_ATTRS is the number of attributes associated with this process definition.
ATTR_LIST is a list of the attributes that will be set for each newly created process.

The attribute list, ATTR_LIST, consists of a sequence of attributes separated by spaces.

NEW_QUEUE

Message Format

00E0:TID:NUM_ARGS PARENT_ID NUM_QUEUE_DEFS QUEUE_DEF
Description 
Define new queue model elements. Multiple queues can be defined simultaneously with this event.

Arguments

PARENT_ID is a string representing the ID of the parent resource manager of this queue.
NUM_QUEUE_DEFS is the number of queue definitions contained in this event.
QUEUE_DEF represents the definition of a range of queues and associated attributes.

A QUEUE_DEF is formatted as follows:

ID_RANGE NUM_ATTRS ATTR_LIST

where:

ID_RANGE is a range of ID's in #Range Set Notation
NUM_ATTRS is the number of attributes associated with this queue definition.
ATTR_LIST is a list of the attributes that will be set for each newly created queue.

The attribute list, ATTR_LIST, consists of a sequence of attributes separated by spaces.

REMOVE_ALL

Message Format

00F0:TID:00000000
Description 
Remove all model elements known to the RMS for this session.

Arguments

none

REMOVE_JOB

Message Format

00F1:TID:00000001 ID_RANGE
Description 
Remove job model elements from the model. Multiple jobs can be removed simultaneously with this event. All children of the jobs will also be removed from the model.

Arguments

ID_RANGE is a range of ID's to be removed in #Range Set Notation

REMOVE_MACHINE

Message Format

00F2:TID:00000001 ID_RANGE
Description 
Remove machine model elements from the model. Multiple machines can be removed simultaneously with this event. All children of the machines will also be removed from the model.

Arguments

ID_RANGE is a range of ID's to be removed in #Range Set Notation

REMOVE_NODE

Message Format

00F3:TID:00000001 ID_RANGE
Description 
Remove node model elements from the model. Multiple nodes can be removed simultaneously with this event. All children of the nodes will also be removed from the model.

Arguments

ID_RANGE is a range of ID's to be removed in #Range Set Notation

REMOVE_PROCESS

Message Format

00F4:TID:00000001 ID_RANGE
Description 
Remove process model elements from the model. Multiple processes can be removed simultaneously with this event.

Arguments

ID_RANGE is a range of ID's to be removed in #Range Set Notation

REMOVE_QUEUE

Message Format

00F5:TID:00000001 ID_RANGE
Description 
Remove queue model elements from the model. Multiple queues can be removed simultaneously with this event. All children of the queues will also be removed from the model.

Arguments

ID_RANGE is a range of ID's to be removed in #Range Set Notation
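
The event IDs defined in this section can be summarized in a lookup table, for example when logging or dispatching incoming events. In the sketch below the table and the function event_name are hypothetical; the numeric values are the ones given in the message formats above. FILTER_DEF is omitted because no ID is given for it in this document.

```python
EVENT_NAMES = {
    0x0000: "OK",
    0x0005: "ERROR",
    0x00DC: "NEW_JOB",
    0x00DD: "NEW_MACHINE",
    0x00DE: "NEW_NODE",
    0x00DF: "NEW_PROCESS",
    0x00E0: "NEW_QUEUE",
    0x00E6: "CHANGE_JOB",
    0x00E7: "CHANGE_MACHINE",
    0x00E8: "CHANGE_NODE",
    0x00E9: "CHANGE_PROCESS",
    0x00EA: "CHANGE_QUEUE",
    0x00F0: "REMOVE_ALL",
    0x00F1: "REMOVE_JOB",
    0x00F2: "REMOVE_MACHINE",
    0x00F3: "REMOVE_NODE",
    0x00F4: "REMOVE_PROCESS",
    0x00F5: "REMOVE_QUEUE",
    0x00FA: "MESSAGE",
    0x00FB: "ATTR_DEF",
}


def event_name(event_id):
    """Return the event name for a numeric EVENT_ID, or None if unknown."""
    return EVENT_NAMES.get(event_id)
```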

Attributes

Attributes are used so that data sent to the RMS is self-describing. Attributes are meta-data describing actual data. Attribute IDs must be unique and are generated by the proxy. The attribute name must persist across instances of the proxy. An attribute has the following fields:

   ATTR_ID:
   ATTR_NAME:
   ATTR_TYPE:
   ATTR_SNAME:
   ATTR_LNAME:
   ATTR_MIN_VALUE:
   ATTR_MAX_VALUE:
   ATTR_DEF_VALUE:
   ATTR_VALS: