Difference between revisions of "SMILA/Documentation/Bundle org.eclipse.smila.processing.pipelets"
Revision as of 10:43, 16 October 2008
This page describes the SMILA pipelets provided by bundle org.eclipse.smila.processing.pipelets.
org.eclipse.smila.processing.pipelets.CommitRecordsPipelet
Description
Commits each record in the input variable on the blackboard to the storages. Can be used to save the records immediately during the workflow instead of only after the workflow has finished.
Configuration
none.
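Since the pipelet takes no configuration, an invocation only needs the pipelet class and the input variable. The following is a minimal sketch, assuming the same <proc:invokePipelet> syntax as in the SetAnnotationPipelet example on this page; the activity name "commitRecords" is made up for illustration, and whether an empty <proc:PipeletConfiguration> element is required is not stated here:

```xml
<!-- Sketch only: the activity name is hypothetical, the syntax is copied from the SetAnnotationPipelet example -->
<extensionActivity name="commitRecords">
  <proc:invokePipelet>
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.CommitRecordsPipelet" />
    <!-- records referenced by the "request" variable are committed to the storages -->
    <proc:variables input="request" />
  </proc:invokePipelet>
</extensionActivity>
```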
org.eclipse.smila.processing.pipelets.SetAnnotationPipelet
Description
Sets a configurable annotation on each record in the input variable. This can be used to control the operation of services and pipelets that look at special annotations to distinguish between different operation modes.
Since annotations on the root metadata object of records can now be set inline in the <invokeService> or <invokePipelet> activity (see SMILA/Documentation/BPEL Workflow Processor), this pipelet is only needed to set annotations on attributes. This probably means that it will not be used very much (-;
Configuration
Property | Type | Description |
---|---|---|
Name | String | Name of the annotation to set |
AnonValue | String | An anonymous value of the annotation. Can occur multiple times. |
NamedValue:<name> | | Named value of the annotation for name <name> |
Path | String : attribute path | Path to the attribute to attach the annotation to. If not set, the annotation is set on the root metadata object of the record. The index in the final step of the path is irrelevant: the annotation is always attached to the attribute, not to contained literals or objects. |
Example
The following example was used in the AddPipeline of the SMILA example application to set an annotation that advises LuceneService in the following invocation to add the records to the index: It creates an annotation named org.eclipse.smila.lucene.LuceneService with a named value executionMode=ADD (see documentation of LuceneService for details):
<extensionActivity name="setAnnotations">
  <proc:invokePipelet>
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.SetAnnotationPipelet" />
    <proc:variables input="request" />
    <proc:PipeletConfiguration>
      <proc:Property name="Name">
        <proc:Value>org.eclipse.smila.lucene.LuceneService</proc:Value>
      </proc:Property>
      <proc:Property name="NamedValue:executionMode">
        <proc:Value>ADD</proc:Value>
      </proc:Property>
    </proc:PipeletConfiguration>
  </proc:invokePipelet>
</extensionActivity>
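The example above sets the annotation on the root metadata object. As the note in the description points out, the remaining use case for this pipelet is annotating an attribute, which needs the Path property. The following is a rough sketch of such a configuration; the annotation name "example.annotation", the value "exampleValue", and the attribute "Title" are hypothetical, and using a plain attribute name as the path is an assumption, since the path syntax is not described on this page:

```xml
<!-- Sketch only: annotation name, value and attribute are hypothetical -->
<extensionActivity name="setAttributeAnnotation">
  <proc:invokePipelet>
    <proc:pipelet class="org.eclipse.smila.processing.pipelets.SetAnnotationPipelet" />
    <proc:variables input="request" />
    <proc:PipeletConfiguration>
      <proc:Property name="Name">
        <proc:Value>example.annotation</proc:Value>
      </proc:Property>
      <!-- a single anonymous value; the table says this property can occur multiple times -->
      <proc:Property name="AnonValue">
        <proc:Value>exampleValue</proc:Value>
      </proc:Property>
      <!-- assumed path form: the annotation is attached to the attribute itself, not to its literals -->
      <proc:Property name="Path">
        <proc:Value>Title</proc:Value>
      </proc:Property>
    </proc:PipeletConfiguration>
  </proc:invokePipelet>
</extensionActivity>
```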
org.eclipse.smila.processing.pipelets.HtmlToTextPipelet
Description
Extracts plain text and metadata from an HTML document in an attribute or attachment of each record and writes them to configurable attributes or attachments.
The pipelet uses the CyberNeko HTML parser NekoHTML to parse HTML documents.
Configuration
Property | Type | Description |
---|---|---|
inputType | String : ATTACHMENT, ATTRIBUTE | selects if the HTML input is found in an attachment or attribute of the record |
outputType | String : ATTACHMENT, ATTRIBUTE | selects if the plain text should be stored in an attachment or attribute of the record |
inputName | String | name of input attachment or path to input attribute (process literals of attribute) |
outputName | String | name of output attachment or path to output attribute for plain text (store result as literals of attribute) |
removeContentTags | String | comma separated list of HTML tags (case insensitive) for which the complete content should be removed from the resulting plain text. If not set, it defaults to "applet,frame,object,script,style". If the value is set, you must add the default tags explicitly to have their contents removed, too. |
meta:<name> | String: attribute path | store the content of the <META> tag with name="<name>" (case insensitive) to the attribute named as the value of the property. E.g. a property named "meta:author" with value "authors" causes the content attributes of <META name="author" content="..."> tags to be stored in the attribute authors of the respective record. |
Example
This configuration extracts plain text from the HTML document in attachment "html" and stores it in the attribute "text". It removes the complete content of heading tags <h1>, ..., <h4>. Additionally it looks for <meta> tags with names "author" and "keywords" and stores their contents in attributes "authors" and "keywords", respectively:
<PipeletConfiguration xmlns="http://www.eclipse.org/smila/processor">
  <Property name="inputType">
    <Value>ATTACHMENT</Value>
  </Property>
  <Property name="outputType">
    <Value>ATTRIBUTE</Value>
  </Property>
  <Property name="inputName">
    <Value>html</Value>
  </Property>
  <Property name="outputName">
    <Value>text</Value>
  </Property>
  <Property name="meta:author">
    <Value>authors</Value>
  </Property>
  <Property name="meta:keywords">
    <Value>keywords</Value>
  </Property>
  <Property name="removeContentTags">
    <Value>h1,h2,h3,h4</Value>
  </Property>
</PipeletConfiguration>
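Note that, according to the table above, setting removeContentTags replaces the default list, so with the value "h1,h2,h3,h4" the content of script and style elements would presumably no longer be removed. A variant of the property that keeps the documented defaults and adds the heading tags might look like this (a sketch that only combines the default list from the table with the tags from the example; the rest of the configuration is unchanged):

```xml
<!-- Sketch: only the removeContentTags property is shown -->
<Property name="removeContentTags">
  <!-- documented default tags listed explicitly, followed by the heading tags -->
  <Value>applet,frame,object,script,style,h1,h2,h3,h4</Value>
</Property>
```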