Last commit for DeveloperManual.txt: 72c1330e44359f1fbf0b947d2964714bbc7523ac

more changes ..

gskc [2002-03-12 22:27:11]

psl.survivor.* Developer Manual
-------------------------------

Authors: Gaurav S. Kc
         Jean-Denis Greze


Overview
--------

psl.survivor.* (survivor) is a survivable framework for executing
workflows that is not tied to any specific workflow implementation.

The initial build contains a psl.survivor.proc.nrl package, providing a
survivable adaptation of the framework for the Naval Research Lab's
workflow system.

Survivability is achieved by using Replicator Nodes that are
responsible for monitoring the execution of the workflow. If a
workflow branch is unable to complete successfully, Replicator Nodes
find the latest known execution of that branch (or potentially of an
ancestor branch) and restart execution. The versioning mechanism is
the critical component that allows the workflow to restart even if
many of the latest calculations are lost. In theory, a workflow branch
can be restarted from any point in its execution as long as we can get
to an ancestor version of that branch.
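
The monitoring role described above can be sketched roughly as follows.
This is a minimal illustration, not the survivor implementation: the
class name ReplicatorSketch, its methods, and the use of a simple
timeout to decide that a branch has failed are all assumptions made
for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical replicator: tracks running branches and reports overdue ones. */
class ReplicatorSketch {
    private final long timeoutMillis;
    // branch name -> time at which its execution was announced
    private final Map<String, Long> startedAt = new LinkedHashMap<>();

    ReplicatorSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** A processor announces that it started executing a branch. */
    void taskStarted(String branch, long now) {
        startedAt.put(branch, now);
    }

    /** A processor announces that the branch completed successfully. */
    void taskFinished(String branch) {
        startedAt.remove(branch);
    }

    /** Branches that have run longer than the timeout and should be restarted. */
    List<String> overdue(long now) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : startedAt.entrySet()) {
            if (now - e.getValue() > timeoutMillis) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ReplicatorSketch replicator = new ReplicatorSketch(100);
        replicator.taskStarted("branch-a", 0);
        replicator.taskStarted("branch-b", 50);
        replicator.taskFinished("branch-a");
        System.out.println("to restart: " + replicator.overdue(200));
    }
}
```

Time is passed in explicitly so that the sketch is deterministic; a
real replicator would restart each overdue branch from its latest
stored version.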

Survivor also uses mobile agents (psl.worklets) for communication
between different processors. The idea is that the smart-agent
infrastructure can be used to move version objects and restart the
workflow.


Testing
-------

Survivor was tested on a completely connected workflow system. We
looked at both single-node and multiple-node failures to test the
system.

In the future, we would like to test the survivability of the system on
a non-connected graph representation of the workflow processors. See
the Survivor, Survivor-NG and Survivor-007 papers for more details.


Development
-----------

The main purpose of this document is to help in the development and
extension of survivor. A package overview follows.


*** psl.survivor:

o todo.txt - list of future changes (todo) and changes recently
implemented (done)

o - run an individual workflow processor and optionally
have it join a group of processors ready to execute a workflow


***
This package uses worklets (psl.worklets) to implement communication
between different Task Processors.

o - encapsulates the various task-processing
capabilities of a Processor

o - code actually responsible for using Worklet Virtual
Machines (WVM) for communication. This object represents the local
Processor object(s) in the overlay network created for data-replication
purposes. It uses the asynchronous message-passing mechanism among peer WVMs
to assist in the transfer of data and control during workflow execution.

o -

o - used by a processor to communicate with other
Processors. This class is responsible for "packing" and "unpacking"
messages sent via worklets. It invokes the proper callbacks on
processors when receiving messages.

o - generic Serializable container used to
transport data from one Processor to another.
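
The packing, unpacking and callback dispatch described above might
look roughly like the sketch below. All names in it (Envelope,
Callbacks, MessageSketch and the message type strings) are
hypothetical and are not taken from the survivor source. Plain Java
serialization to a byte array stands in for worklet transport, which
is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical Serializable container carried from one Processor to another. */
class Envelope implements Serializable {
    private static final long serialVersionUID = 1L;
    final String type;          // selects the callback to invoke on receipt
    final Serializable payload; // the data being transported

    Envelope(String type, Serializable payload) {
        this.type = type;
        this.payload = payload;
    }
}

/** Hypothetical callbacks exposed by a receiving processor. */
interface Callbacks {
    void onExecuteTask(Serializable data);
    void onTaskFinished(Serializable data);
}

class MessageSketch {
    /** "Pack": turn an envelope into bytes that a transport could carry. */
    static byte[] pack(Envelope e) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(e);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** "Unpack": rebuild the envelope on the receiving side. */
    static Envelope unpack(byte[] wire) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            return (Envelope) in.readObject();
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException("could not unpack message", ex);
        }
    }

    /** Invoke the callback that matches the message type. */
    static void dispatch(Envelope e, Callbacks target) {
        switch (e.type) {
            case "EXECUTE_TASK":
                target.onExecuteTask(e.payload);
                break;
            case "TASK_FINISHED":
                target.onTaskFinished(e.payload);
                break;
            default:
                throw new IllegalArgumentException("unknown message type: " + e.type);
        }
    }

    public static void main(String[] args) {
        Callbacks printer = new Callbacks() {
            @Override public void onExecuteTask(Serializable data) {
                System.out.println("execute " + data);
            }
            @Override public void onTaskFinished(Serializable data) {
                System.out.println("finished " + data);
            }
        };
        byte[] wire = pack(new Envelope("EXECUTE_TASK", "task-1"));
        dispatch(unpack(wire), printer);
    }
}
```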


*** psl.survivor.proc:
This package contains most of the classes responsible for
implementing survivable processing (i.e. executing workflows) and
replication (keeping track of the workflow and making sure its
execution reaches termination).

o - used by a processor to store its knowledge of other
Processors and Replicators. A PoolData object does not necessarily
contain information about every Processor in existence, just the ones
that the local processor knows about. Taken together, the Processors'
PoolData objects give a representation of the complete
execution environment.

o - Major component responsible for execution of tasks
as well as ensuring that there are replicators around to ensure
survivability. The Processor class must be extended to work with a
specific workflow implementation (such as NRL's). Processors are
distributed.

o - Component responsible for tracking the execution
of the workflow. In case of failure or problem, Replicators take the
steps necessary to keep executing the workflow. Replicators are
distributed throughout the network. In general, every Processor has a
Replicator, though it is possible to have Replicators with dummy
Processors (that are unable to execute anything).

o - A handle to a replicator (which may be
remote). By using handles, we can use worklets to communicate with
replicators. Processors use ReplicatorHandles to announce how local task
executions are going.

o - Internal representation of a task definition
XML document. Task definitions give additional requirements that a
processor needs to meet in order to be able to execute a certain task.

o - A handle to a processor (which may be
remote). By using handles, we can use worklets to communicate with
other processors. This is used to ask a remote processor to execute a
certain task.

o - Important information about the workflow as a
whole. This includes the definitions of all of that workflow's tasks.
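
To illustrate how a processor's partial knowledge of its peers and a
task's requirements fit together, here is a small sketch. The class
PoolSketch and its methods are hypothetical, and modelling
capabilities and requirements as plain strings is an assumption made
for the example; the real representation may be richer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical pool: what the local processor knows about its peers. */
class PoolSketch {
    // processor name -> capabilities it advertised (insertion order kept)
    private final Map<String, Set<String>> known = new LinkedHashMap<>();

    /** Record (or update) what we know about one peer. */
    void addProcessor(String name, String... capabilities) {
        known.put(name, new HashSet<>(Arrays.asList(capabilities)));
    }

    /** Known processors whose capabilities cover every requirement of a task. */
    List<String> findCapable(Set<String> requirements) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : known.entrySet()) {
            if (e.getValue().containsAll(requirements)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PoolSketch pool = new PoolSketch();
        pool.addProcessor("p1", "java", "nrl");
        pool.addProcessor("p2", "java");
        Set<String> needsNrl = new HashSet<>(Arrays.asList("nrl"));
        System.out.println("can run the task: " + pool.findCapable(needsNrl));
    }
}
```

A processor that is not in the local pool is simply never returned,
which matches the point that each pool holds only partial knowledge.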


*** psl.survivor.proc.nrl:
This package contains the specializations required to
integrate the Naval Research Lab's MLS workflow system into Survivor.

o - this is the major component in this package. It
contains hooks that enable the proper setup of ServiceHost_Serv and
Scheduler_Serv entities in a task processor. It is an extension of the
Processor class from proc, and it invokes the appropriate methods in
the NRL code so that the task is executed and the return value is
properly packaged for delivery to the next stage of the workflow
execution.

o - this class encapsulates the different data
that are passed between consecutive tasks as specified in the workflow
definition.
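
The extension pattern described above, where a generic processor is
specialized for one workflow system, can be sketched as follows. The
names ProcessorSketch, EchoProcessorSketch, run and executeTask are
hypothetical, and the "packaging" shown is a placeholder; the actual
Processor API and the NRL calls are not reproduced here.

```java
import java.util.Locale;

/** Hypothetical base class standing in for the generic processor. */
abstract class ProcessorSketch {
    /** Run the task via the subclass hook, then package the result for the next stage. */
    final String run(String taskName, String input) {
        String raw = executeTask(taskName, input);
        return taskName + ":" + raw; // placeholder for real result packaging
    }

    /** Hook that a workflow-specific subclass must implement. */
    protected abstract String executeTask(String taskName, String input);
}

/** Hypothetical specialization; a real one would call into the workflow system. */
class EchoProcessorSketch extends ProcessorSketch {
    @Override
    protected String executeTask(String taskName, String input) {
        return input.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(new EchoProcessorSketch().run("t1", "payload"));
    }
}
```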


*** psl.survivor.util:
This package contains utility classes that are used by survivor. The
most important utility classes are the Version and VersionCache
classes. Combined, these provide the principal mechanism for keeping
track of a workflow's partial executions for survivability purposes.

o - Used for internal representation of
Capabilities and Task Definitions.

o - Represents a partial execution of a workflow, along
with the data needed to restart the workflow at that point. It also contains
information about where and how the workflow was executed until then.

o - Data structure used to keep track of different
versions. It detects version clashes (this is important in detecting
execution of a workflow branch that is no longer valid because of a
survivability scenario). It also allows us to find the "latest"
relevant version stored given an "ideal" version that we would like to
find. This is helpful in restarting workflows where we have lost
execution data and have to start a few steps back.
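
The two behaviours described for the version data structure, clash
detection and falling back from an "ideal" version to the latest
stored one, can be sketched as below. This is an illustration under
stated assumptions, not the actual Version or VersionCache code: a
version is modelled as the list of task names executed so far, and a
hypothetical integer "epoch" that grows with each restart is used to
recognize stale branches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical cache of versions, keyed by the path of tasks executed so far. */
class VersionCacheSketch {
    // path of executed tasks -> epoch (restart generation) that produced it
    private final Map<List<String>, Integer> epochByPath = new HashMap<>();

    /**
     * Store a version. Returns false on a clash, i.e. when the same path
     * is already stored under a newer epoch and this one is stale.
     */
    boolean store(List<String> path, int epoch) {
        Integer existing = epochByPath.get(path);
        if (existing != null && existing > epoch) {
            return false;
        }
        epochByPath.put(new ArrayList<>(path), epoch);
        return true;
    }

    /** Longest stored prefix of the ideal path, or null if nothing usable is stored. */
    List<String> latest(List<String> ideal) {
        for (int len = ideal.size(); len > 0; len--) {
            List<String> candidate = ideal.subList(0, len);
            if (epochByPath.containsKey(candidate)) {
                return new ArrayList<>(candidate);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        VersionCacheSketch cache = new VersionCacheSketch();
        cache.store(Arrays.asList("A"), 0);
        cache.store(Arrays.asList("A", "B"), 0);
        // The version after task C was lost, so we fall back a step.
        System.out.println("restart from: " + cache.latest(Arrays.asList("A", "B", "C")));
    }
}
```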


*** psl.survivor.xml:
This package provides utility classes for parsing XML documents and
creating Java objects.

o - From a Processor Capabilities XML document, creates a
Processor listening on the correct ports, and using the correct
extension of the Processor class. Used by

o - From a workflow description XML document,
creates a task definition object for each task that might be executed.
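
As an illustration of turning a workflow description into per-task
objects, the sketch below uses the standard Java DOM parser. The XML
layout (workflow, task and requirement elements with a name
attribute) is invented for the example and is not the survivor
schema; the class name WorkflowXmlSketch is likewise hypothetical,
and each task definition is reduced to a list of requirement strings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class WorkflowXmlSketch {
    /** Parse the (assumed) workflow XML into task name -> list of requirements. */
    static Map<String, List<String>> parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> tasks = new LinkedHashMap<>();
            NodeList taskNodes = doc.getElementsByTagName("task");
            for (int i = 0; i < taskNodes.getLength(); i++) {
                Element task = (Element) taskNodes.item(i);
                List<String> requirements = new ArrayList<>();
                NodeList reqNodes = task.getElementsByTagName("requirement");
                for (int j = 0; j < reqNodes.getLength(); j++) {
                    requirements.add(reqNodes.item(j).getTextContent().trim());
                }
                tasks.put(task.getAttribute("name"), requirements);
            }
            return tasks;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse workflow xml", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<workflow>"
                + "<task name=\"fetch\"><requirement>network</requirement></task>"
                + "<task name=\"report\"/>"
                + "</workflow>";
        System.out.println(parse(xml));
    }
}
```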


*** psl.survivor.test:
This package contains data for workflows that
were used for testing purposes.


Changes to the original system
------------------------------

in the original system is the network-aware part
of each task processor, and it uses Java RMI for communication with
other processors in the workflow. Due to the nature of the system
design, we required the ServiceHost_Serv implementation in the
psl.survivor system to differ slightly: if
ServiceHost_Serv detects that it is running as part of a psl.survivor
runtime, it extracts and uses a data item hidden in one of the
parameters to one of its methods. This particular method is
invoked when setting up the RMI registration for the task processor.