


SMILA/Documentation/Bundle org.eclipse.smila.processing.pipelets

This page describes the SMILA pipelets provided by bundle org.eclipse.smila.processing.pipelets.

org.eclipse.smila.processing.pipelets.CommitRecordsPipelet

Description

Commits each record in the input variable on the blackboard to the storages. This can be used to save records immediately during the workflow instead of only when the workflow has finished.

Configuration

None.
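The page gives no invocation example for this pipelet. The following is a minimal sketch, assuming it is invoked through the same <extensionActivity>/<proc:invokePipelet> construct as the other pipelets on this page, and assuming that an empty <proc:PipeletConfiguration> is accepted because the pipelet has no properties. The activity name "commitRecords" is hypothetical:

```xml
<!-- hypothetical activity name; commits every record in "request" to the storages -->
<extensionActivity name="commitRecords">
  <proc:invokePipelet>
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.CommitRecordsPipelet" />
    <proc:variables input="request" output="request" />
    <!-- assumption: an empty configuration is acceptable since no properties exist -->
    <proc:PipeletConfiguration />
  </proc:invokePipelet>
</extensionActivity>
```

Placed between two other activities of a pipeline, such an activity would persist the intermediate state of the records before the workflow continues.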

org.eclipse.smila.processing.pipelets.SetAnnotationPipelet

Description

Sets a configurable annotation on each record in the input variable. This can be used to control the operation of services and pipelets that look at special annotations to distinguish between different operation modes.

Since annotations on the root metadata object of records can now be set inline in the <invokeService> or <invokePipelet> activity (see SMILA/Documentation/BPEL Workflow Processor), this pipelet is only needed to set annotations on attributes. This probably means that it will not be used very much (-;

Configuration

Property Type Description
Name String Name of the annotation to set
AnonValue String An anonymous value of the annotation. Can occur multiple times.
NamedValue:<name> String Named value of the annotation for name <name>
Path String : attribute path Path to the attribute to attach the annotation to. If not set, the annotation is set on the root metadata object of the record. The index in the final step of the path is irrelevant: the annotation is always attached to the attribute, not to contained literals or objects.

Example

The following example was used in the AddPipeline of the SMILA example application to set an annotation that tells the LuceneService, invoked in the next activity, to add the records to the index. It creates an annotation named org.eclipse.smila.lucene.LuceneService with the named value executionMode=ADD (see the LuceneService documentation for details):

<extensionActivity name="setAnnotations">
  <proc:invokePipelet>
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.SetAnnotationPipelet" />
    <proc:variables input="request" />
    <proc:PipeletConfiguration>
      <proc:Property name="Name">
        <proc:Value>org.eclipse.smila.lucene.LuceneService</proc:Value>
      </proc:Property>
      <proc:Property name="NamedValue:executionMode">
        <proc:Value>ADD</proc:Value>
      </proc:Property>
    </proc:PipeletConfiguration>
  </proc:invokePipelet>
</extensionActivity>
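Because the remaining use case of this pipelet is annotating attributes, a sketch using the Path property may be helpful. The activity name "setAttributeAnnotation", the annotation name "example.Annotation", the anonymous value "marked" and the attribute path "Title" are all hypothetical placeholders; check the attribute path syntax against the documentation of your SMILA version:

```xml
<extensionActivity name="setAttributeAnnotation">
  <proc:invokePipelet>
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.SetAnnotationPipelet" />
    <proc:variables input="request" />
    <proc:PipeletConfiguration>
      <!-- hypothetical annotation name -->
      <proc:Property name="Name">
        <proc:Value>example.Annotation</proc:Value>
      </proc:Property>
      <!-- hypothetical anonymous value of the annotation -->
      <proc:Property name="AnonValue">
        <proc:Value>marked</proc:Value>
      </proc:Property>
      <!-- attach the annotation to the (hypothetical) attribute "Title"
           instead of the root metadata object of the record -->
      <proc:Property name="Path">
        <proc:Value>Title</proc:Value>
      </proc:Property>
    </proc:PipeletConfiguration>
  </proc:invokePipelet>
</extensionActivity>
```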

org.eclipse.smila.processing.pipelets.HtmlToTextPipelet

Description

Extracts plain text and metadata from an HTML document in an attribute or attachment of each record and writes them to configurable attributes or attachments.

The pipelet uses the CyberNeko HTML parser NekoHTML to parse HTML documents.

Configuration

Property Type Description
inputType String : ATTACHMENT, ATTRIBUTE selects whether the HTML input is found in an attachment or an attribute of the record
outputType String : ATTACHMENT, ATTRIBUTE selects whether the plain text should be stored in an attachment or an attribute of the record
inputName String name of the input attachment or path to the input attribute (the literals of the attribute are processed)
outputName String name of the output attachment or path to the output attribute for the plain text (the result is stored as literals of the attribute)
removeContentTags String comma-separated list of HTML tags (case insensitive) whose complete content should be removed from the resulting plain text. If not set, it defaults to "applet,frame,object,script,style". If the value is set, you must add the default tags explicitly to have their contents removed, too.
meta:<name> String : attribute path stores the content of the <META> tag with name="<name>" (case insensitive) in the attribute named by the value of the property. E.g. a property named "meta:author" with value "authors" causes the content attributes of <META name="author" content="..."> tags to be stored in the attribute "authors" of the respective record.

Example

This configuration extracts plain text from the HTML document in attachment "html" and stores it in the attribute "text". It removes the complete content of the heading tags <h1> to <h4>. Additionally, it looks for <meta> tags with the names "author" and "keywords" and stores their contents in the attributes "authors" and "keywords", respectively:

<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
    <Property name="inputType">
        <Value>ATTACHMENT</Value>
    </Property>
    <Property name="outputType">
        <Value>ATTRIBUTE</Value>
    </Property>
    <Property name="inputName">
        <Value>html</Value>
    </Property>
    <Property name="outputName">
        <Value>text</Value>
    </Property>
    <Property name="meta:author">
        <Value>authors</Value>
    </Property>
    <Property name="meta:keywords">
        <Value>keywords</Value>
    </Property>
    <Property name="removeContentTags">
        <Value>h1,h2,h3,h4</Value>
    </Property>
</PipeletConfiguration>
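Since setting removeContentTags replaces the default list, the configuration above no longer removes the contents of <script>, <style> and the other default tags. A sketch of a replacement for its removeContentTags property that keeps the documented defaults and additionally removes the headings:

```xml
<Property name="removeContentTags">
    <!-- documented defaults first, then the additional heading tags -->
    <Value>applet,frame,object,script,style,h1,h2,h3,h4</Value>
</Property>
```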


org.eclipse.smila.processing.pipelets.CopyPipelet

Description

This pipelet can be used to copy a String value between attributes and/or attachments. It supports two execution modes:

  • COPY: copy the value from the input attribute/attachment to the output attribute/attachment
  • MOVE: same as COPY, but afterwards delete the value from the input attribute/attachment

Configuration

Property Type Description
inputType String : ATTACHMENT, ATTRIBUTE selects whether the input is found in an attachment or an attribute of the record
outputType String : ATTACHMENT, ATTRIBUTE selects whether the output should be stored in an attachment or an attribute of the record
inputName String name of the input attachment or path to the input attribute (a String literal of the attribute is processed)
outputName String name of the output attachment or path to the output attribute (the result is stored as a String literal of the attribute)
mode String : COPY, MOVE execution mode. Copy the value or move (copy and delete) the value. Default is COPY.

Example

This configuration shows how to copy the value of attachment 'Content' into the attribute 'TextContent':

<!-- copy txt from attachment to attribute -->
<extensionActivity name="invokeCopyContent">
    <proc:invokePipelet>
        <proc:pipelet class="org.eclipse.smila.processing.pipelets.CopyPipelet" />
        <proc:variables input="request" output="request" />
        <proc:PipeletConfiguration>
            <proc:Property name="inputType">
                <proc:Value>ATTACHMENT</proc:Value>
            </proc:Property>
            <proc:Property name="outputType">
                <proc:Value>ATTRIBUTE</proc:Value>
            </proc:Property>
            <proc:Property name="inputName">
                <proc:Value>Content</proc:Value>
            </proc:Property>
            <proc:Property name="outputName">
                <proc:Value>TextContent</proc:Value>
            </proc:Property>
            <proc:Property name="mode">
                <proc:Value>COPY</proc:Value>
            </proc:Property>
        </proc:PipeletConfiguration>
    </proc:invokePipelet>
</extensionActivity>
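The same pipelet in MOVE mode, here copying a value between two attributes and then deleting it from the source attribute, could be configured as sketched below. The activity name and the attribute names "TmpTitle" and "Title" are hypothetical:

```xml
<!-- move a String value from attribute "TmpTitle" to attribute "Title" (hypothetical names) -->
<extensionActivity name="invokeMoveTitle">
    <proc:invokePipelet>
        <proc:pipelet class="org.eclipse.smila.processing.pipelets.CopyPipelet" />
        <proc:variables input="request" output="request" />
        <proc:PipeletConfiguration>
            <proc:Property name="inputType">
                <proc:Value>ATTRIBUTE</proc:Value>
            </proc:Property>
            <proc:Property name="outputType">
                <proc:Value>ATTRIBUTE</proc:Value>
            </proc:Property>
            <proc:Property name="inputName">
                <proc:Value>TmpTitle</proc:Value>
            </proc:Property>
            <proc:Property name="outputName">
                <proc:Value>Title</proc:Value>
            </proc:Property>
            <!-- MOVE: copy the value, then delete it from the input attribute -->
            <proc:Property name="mode">
                <proc:Value>MOVE</proc:Value>
            </proc:Property>
        </proc:PipeletConfiguration>
    </proc:invokePipelet>
</extensionActivity>
```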


org.eclipse.smila.processing.pipelets.SubAttributeExtractorPipelet

Description

Extracts Literal values from an attribute that contains a nested MObject. The attributes of the nested MObject can themselves contain nested MObjects. To address an attribute in the nested structure, a path must be specified. The pipelet supports the following execution modes:

  • FIRST: selects only the first Literal of the specified attribute
  • LAST: selects only the last Literal of the specified attribute
  • ALL_AS_LIST: selects all Literal values of the specified attribute and returns a list
  • ALL_AS_ONE: selects all Literal values of the specified attribute and concatenates them into a single string, using a separator (default is a blank)

This pipelet works only on attributes, not on attachments!

Configuration

Property Type Description
pathDelimiter String specifies the delimiter used to separate attribute names in inputPath
inputPath String the path along nested MObjects to the attribute with Literals
outputName String name of the attribute to store the extracted value(s) in
mode String : FIRST, LAST, ALL_AS_LIST, ALL_AS_ONE execution mode. See above for details.
separator String the separation string used for mode ALL_AS_ONE. Default is a blank

Example

This configuration can be applied to records provided by the FeedAgent. It shows how to access the subattribute 'Value' of the attribute 'Contents', concatenating all values into a single string:

<extensionActivity name="invokeContentExtraction">
    <proc:invokePipelet>
        <proc:pipelet class="org.eclipse.smila.processing.pipelets.SubAttributeExtractorPipelet" />
        <proc:variables input="request" output="request" />
        <proc:PipeletConfiguration>
            <proc:Property name="pathDelimiter">
                <proc:Value>/</proc:Value>
            </proc:Property>
            <proc:Property name="inputPath">
                <proc:Value>Contents/Value</proc:Value>
            </proc:Property>
            <proc:Property name="outputName">
                <proc:Value>Content</proc:Value>
            </proc:Property>
            <proc:Property name="mode">
                <proc:Value>ALL_AS_ONE</proc:Value>
            </proc:Property>
        </proc:PipeletConfiguration>
    </proc:invokePipelet>
</extensionActivity>
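For comparison, a sketch that also sets the separator property, joining the values with "|" instead of the default blank. The activity name and the output attribute name "ContentJoined" are hypothetical; a separator without surrounding whitespace is used because it is not documented here whether whitespace inside <proc:Value> is preserved:

```xml
<extensionActivity name="invokeContentJoin">
    <proc:invokePipelet>
        <proc:pipelet class="org.eclipse.smila.processing.pipelets.SubAttributeExtractorPipelet" />
        <proc:variables input="request" output="request" />
        <proc:PipeletConfiguration>
            <proc:Property name="pathDelimiter">
                <proc:Value>/</proc:Value>
            </proc:Property>
            <proc:Property name="inputPath">
                <proc:Value>Contents/Value</proc:Value>
            </proc:Property>
            <!-- hypothetical output attribute -->
            <proc:Property name="outputName">
                <proc:Value>ContentJoined</proc:Value>
            </proc:Property>
            <proc:Property name="mode">
                <proc:Value>ALL_AS_ONE</proc:Value>
            </proc:Property>
            <!-- only evaluated in mode ALL_AS_ONE -->
            <proc:Property name="separator">
                <proc:Value>|</proc:Value>
            </proc:Property>
        </proc:PipeletConfiguration>
    </proc:invokePipelet>
</extensionActivity>
```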
