Difference between revisions of "SMILA/Specifications/Search Processing"

(New page: == The Issue == While writing SMILA/Specifications/Search API I have recognized that the way of processing defined for import processes is not sufficient for processing searches. The...)
 
 
(6 intermediate revisions by the same user not shown)
Line 1: Line 1:
== The Issue ==
+
{{Note|Obsolete:
 
+
Because processing services have been removed.}}
While writing [[SMILA/Specifications/Search API]] I realized that the processing model defined for import processes is not sufficient for processing searches. The main reason is that a search involves two separate kinds of objects: the query object and the current result set. Some services need to see both kinds of objects, e.g. highlighting or adaptation rule services. On the other hand, the processing service/pipelet API should remain simple for implementors who do not care about a query object and just want to process the current set of objects. These pipelets should also remain usable in search pipelines, either to process the query object (before the actual retrieval) or the result objects (afterwards).
+
 
+
== The Proposal ==
+
 
+
''This proposal is not yet completely finished. There are some open issues with it, so feedback is welcome.''
+
 
+
The basic idea is to introduce a second Pipelet/Service interface that receives the ID of the effective query object:
+
 
+
<source lang="java">
+
interface SearchPipelet {
+
    Id[] process(Blackboard bb, Id[] records, Id query) throws ProcessingException;
+
}
+
 
+
interface SearchProcessingService {
+
    Id[] process(Blackboard bb, Id[] records, Id query) throws ProcessingException;
+
}
+
</source>
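
As an illustration of why the query ID is passed separately, here is a minimal sketch of a pipelet that needs both the query and the result records. The <tt>Id</tt>, <tt>Blackboard</tt> and <tt>ProcessingException</tt> types below are simplified stand-ins written for this example only (the real SMILA classes differ), and <tt>QueryTermFilterPipelet</tt> is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the SMILA types; the real Id and Blackboard APIs differ.
final class Id {
    final String key;
    Id(String key) { this.key = key; }
}

class ProcessingException extends Exception {
    ProcessingException(String message) { super(message); }
}

// Hypothetical minimal blackboard: maps a record ID to a single text value.
class Blackboard {
    private final Map<String, String> texts = new HashMap<>();
    void putText(Id id, String text) { texts.put(id.key, text); }
    String getText(Id id) { return texts.get(id.key); }
}

interface SearchPipelet {
    Id[] process(Blackboard bb, Id[] records, Id query) throws ProcessingException;
}

// Hypothetical example pipelet: keeps only the result records whose text
// contains the text of the query record (case-insensitive).
class QueryTermFilterPipelet implements SearchPipelet {
    @Override
    public Id[] process(Blackboard bb, Id[] records, Id query) throws ProcessingException {
        String term = bb.getText(query);
        if (term == null) {
            throw new ProcessingException("query record has no text: " + query.key);
        }
        List<Id> kept = new ArrayList<>();
        for (Id record : records) {
            String text = bb.getText(record);
            if (text != null && text.toLowerCase().contains(term.toLowerCase())) {
                kept.add(record);
            }
        }
        return kept.toArray(new Id[0]);
    }
}
```

A pipelet like this cannot be written against the old interface, because the old interface gives it no way to tell which record is the query.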
+
 
+
WSDL extensions: Besides the standard ProcessorMessage and ProcessorPortType used in import pipelines, search pipelines need:
+
 
+
<source lang="xml">
+
<message name="SearchProcessorMessage">
+
    <part name="query" element="rec:Record" />
+
    <part name="records" element="rec:RecordList" />
+
</message>
+
 
+
<portType name="SearchProcessorPortType">
+
    <operation name="search">
+
        <input message="proc:SearchProcessorMessage" name="in" />
+
        <output message="proc:SearchProcessorMessage" name="out" />
+
        <fault message="proc:ProcessorException" name="ex" />
+
    </operation>
+
</portType>
+
</source>
+
 
+
It is then the job of the ODE integration layer to route the elements of the messages to the correct arguments of the service invocation:
+
 
+
* The initial query is written to <tt>$request.query</tt>
+
* Invocation of SimplePipelets/ProcessingServices (the old ones):
+
** if the <tt>records</tt> part of the input message is empty, invoke the pipelet/service with the ID of the (single) record in the <tt>query</tt> part.
+
*** If the result is still a single record, write it to the <tt>query</tt> part of the output message. (Open issue: a search with resultSize=1 also returns a single record, so an option in the extension activity that forces the result to be written to <tt>records</tt> may be needed. This still has to be thought through.)
+
*** Else create a <tt>records</tt> part in the output message from the result records.
+
** Else, the pipelet/service is invoked with the IDs of the records in the message's <tt>records</tt> part
+
*** Result records are written to the <tt>records</tt> part of the output message.
+
* Invocation of SearchPipelets/SearchProcessingServices:
+
** the <tt>records</tt> part of the input message is used to construct the records ID list, the <tt>query</tt> part goes into the query argument.
+
** the result records become the <tt>records</tt> part of the output message; if the <tt>query</tt> part of the output message is empty, the <tt>query</tt> part of the input message is copied into it.
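
The routing rules above can be sketched in plain Java. This is only an illustration under stated assumptions, not the ODE integration code: <tt>SearchMessage</tt> and <tt>SearchMessageRouter</tt> are hypothetical names, the types are simplified stand-ins, and the old-style <tt>SimplePipelet</tt> signature is assumed by analogy with <tt>SearchPipelet</tt>. Where the rules do not say what happens to the <tt>query</tt> part, the sketch keeps the input query.

```java
// Simplified stand-ins for the SMILA types; the real Id and Blackboard APIs differ.
final class Id {
    final String key;
    Id(String key) { this.key = key; }
}

class Blackboard { }

class ProcessingException extends Exception {
    ProcessingException(String message) { super(message); }
}

// Old-style pipelet; signature assumed by analogy with SearchPipelet.
interface SimplePipelet {
    Id[] process(Blackboard bb, Id[] records) throws ProcessingException;
}

interface SearchPipelet {
    Id[] process(Blackboard bb, Id[] records, Id query) throws ProcessingException;
}

// Hypothetical in-memory form of a SearchProcessorMessage.
final class SearchMessage {
    final Id query;      // the "query" part
    final Id[] records;  // the "records" part, zero-length if empty
    SearchMessage(Id query, Id[] records) {
        this.query = query;
        this.records = records;
    }
}

final class SearchMessageRouter {
    private static final Id[] NO_RECORDS = new Id[0];

    // Invocation of an old-style pipelet/service.
    static SearchMessage invokeSimple(SimplePipelet pipelet, Blackboard bb, SearchMessage in)
            throws ProcessingException {
        if (in.records.length == 0) {
            // No result records yet: process the single query record.
            Id[] result = pipelet.process(bb, new Id[] { in.query });
            if (result.length == 1) {
                // Still a single record: it becomes the new query part.
                return new SearchMessage(result[0], NO_RECORDS);
            }
            // Assumption: the input query is kept next to the new records part.
            return new SearchMessage(in.query, result);
        }
        // Result records exist: process them and leave the query part untouched.
        Id[] result = pipelet.process(bb, in.records);
        return new SearchMessage(in.query, result);
    }

    // Invocation of a search pipelet/service.
    static SearchMessage invokeSearch(SearchPipelet pipelet, Blackboard bb, SearchMessage in)
            throws ProcessingException {
        Id[] result = pipelet.process(bb, in.records, in.query);
        // The pipelet returns only records, so the input query part is copied.
        return new SearchMessage(in.query, result);
    }
}
```

The sketch also makes the open issue visible: in <tt>invokeSimple</tt>, a search that returns exactly one hit is indistinguishable from a query rewrite and would overwrite the <tt>query</tt> part.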
+
 
+
== Some Example BPEL pipelines ==
+
 
+
All used variables are assumed to be SearchProcessorMessages.
+
 
+
=== Simple Search Pipeline ===
+
 
+
<source lang="xml">
+
<sequence>
+
    <extensionActivity name="invokeTextminer">
+
        <proc:invokeService>
+
            <proc:service name="TextminerService" />
+
            <proc:variables input="request" output="request" />    <!-- uses request.query -->
+
        </proc:invokeService>
+
    </extensionActivity>
+
    <extensionActivity name="invokeCompletionRules">
+
        <proc:invokeService>
+
            <proc:service name="CompletionRulesService" />
+
            <proc:variables input="request" output="request" />    <!-- uses request.query -->
+
        </proc:invokeService>
+
    </extensionActivity>
+
    <extensionActivity name="invokeSearchIndex">
+
        <proc:invokeService>
+
            <proc:service name="SearchIndexService" />
+
            <proc:variables input="request" output="request" />   
+
        <!-- use request.query to produce request.records, but doesn't need to know both,
+
            hence being a standard ProcessingService is sufficient -->
+
        </proc:invokeService>
+
    </extensionActivity>
+
    <extensionActivity name="invokeCompletionRules">
+
        <proc:invokeService>
+
            <proc:service name="CompletionRulesService" />
+
            <proc:variables input="request" output="request" />    <!-- use request.records -->
+
        </proc:invokeService>
+
    </extensionActivity>
+
    <extensionActivity name="invokeHighlighter">
+
        <proc:invokeService>
+
            <proc:service name="HighlighterService" />
+
            <proc:variables input="request" output="request" />   
+
        <!-- use request.query AND request.records,
+
            hence this really needs to be a SearchProcessingService -->
+
        </proc:invokeService>
+
    </extensionActivity>
+
</sequence>
+
</source>
+
 
+
So far, so good.
+
 
+
=== Use search result as query ===
+
 
+
<source lang="xml">
+
<sequence>
+
    ...
+
    <extensionActivity name="invokeIndexSearch">
+
        <proc:invokeService>
+
            <proc:service name="SearchIndexService" />
+
            <proc:variables input="request" output="request" />
+
        </proc:invokeService>
+
    </extensionActivity>
+
    <extensionActivity name="invokeIndexSearchAgain">
+
        <proc:invokeService>
+
            <proc:service name="SearchIndexService" />
+
            <proc:variables input="request" output="request" />   
+
        </proc:invokeService>
+
    </extensionActivity>
+
    ...
+
</sequence>
+
</source>
+
 
+
This would actually work if SearchIndex is a SimpleService:
+
* first invocation uses request.query, because request.records is empty;
+
* search result fills request.records;
+
* second invocation uses first record of request.records, if something has been found in first search
+
 
+
 
+
=== A problematic use case: "Federated search" ===
+
 
+
<source lang="xml">
+
<sequence>
+
    ...
+
    <flow> <!-- search in parallel -->
+
        <extensionActivity name="invokeSearchIndex1">
+
            <proc:invokeService>
+
                <proc:service name="SearchIndexService1" />
+
                <proc:variables input="request" output="result1" />
+
            </proc:invokeService>
+
        </extensionActivity>
+
        <extensionActivity name="invokeSearchIndex2">
+
            <proc:invokeService>
+
                <proc:service name="SearchIndexService2" />
+
                <proc:variables input="request" output="result2" />   
+
            </proc:invokeService>
+
        </extensionActivity>
+
        <extensionActivity name="invokeSearchIndex3">
+
            <proc:invokeService>
+
                <proc:service name="SearchIndexService3" />
+
                <proc:variables input="request" output="result3" />   
+
            </proc:invokeService>
+
        </extensionActivity>
+
    </flow>
+
   
+
    <!-- how to merge the record parts of the resultX messages? -->
+
    ...
+
   
+
</sequence>
+
</source>
+
 
+
The merging could probably be done using XSLT. But if we wanted to introduce a MergePipelet, we would probably need a third kind of pipelet interface that can receive an arbitrary number of record lists:
+
 
+
<source lang="java">
+
interface AdvancedPipelet {
+
    Id[] process(Id[][] recordLists) throws ProcessingException;
+
}
+
</source>
+
 
+
and calling it via an extended <invokePipelet> activity:
+
 
+
<source lang="xml">
+
        ...
+
        <extensionActivity name="invokeResultMerge">
+
            <proc:invokePipelet>
+
                <proc:pipelet class="org.eclipse.smila.pipelets.SearchResultMergePipelet" />
+
                <proc:variables input="result1" output="mergeResult">
+
                  <proc:variable input="result2"/>
+
                  <proc:variable input="result3"/>
+
                </proc:variables>
+
            </proc:invokePipelet>
+
        </extensionActivity>
+
        ...
+
</source>
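
To make the merge step more concrete, here is one possible sketch of such a pipelet against the <tt>AdvancedPipelet</tt> interface. The merge strategy (round-robin interleaving by rank, with duplicate IDs dropped) is purely an assumption for illustration, as the proposal does not define one. <tt>Id</tt> and <tt>ProcessingException</tt> are again simplified stand-ins, and the stand-in <tt>Id</tt> compares by key so that the same record found in two indexes counts as a duplicate.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified stand-in for the SMILA Id type; equality by key is an assumption.
final class Id {
    final String key;
    Id(String key) { this.key = key; }
    @Override public boolean equals(Object o) {
        return o instanceof Id && ((Id) o).key.equals(key);
    }
    @Override public int hashCode() { return key.hashCode(); }
}

class ProcessingException extends Exception {
    ProcessingException(String message) { super(message); }
}

interface AdvancedPipelet {
    Id[] process(Id[][] recordLists) throws ProcessingException;
}

// Hypothetical merge strategy: take the first hit of every list, then the
// second hit of every list, and so on, skipping IDs that were already seen.
class SearchResultMergePipelet implements AdvancedPipelet {
    @Override
    public Id[] process(Id[][] recordLists) {
        int longest = 0;
        for (Id[] list : recordLists) {
            longest = Math.max(longest, list.length);
        }
        // LinkedHashSet keeps insertion order and ignores repeated IDs.
        Set<Id> merged = new LinkedHashSet<>();
        for (int rank = 0; rank < longest; rank++) {
            for (Id[] list : recordLists) {
                if (rank < list.length) {
                    merged.add(list[rank]);
                }
            }
        }
        return merged.toArray(new Id[0]);
    }
}
```

A real merge would more likely order by relevance scores, which requires access to the records themselves and therefore a blackboard argument that the interface above does not have.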
+
 
+
[[Category:SMILA]]
+

Latest revision as of 07:51, 19 January 2012
