
SMILA/Specifications/Search Processing

Revision as of 10:30, 16 January 2009 by Unnamed Poltroon


The Issue

While writing SMILA/Specifications/Search API, I realized that the processing model defined for import processes is not sufficient for processing searches. The main reason is that search involves two separate kinds of objects: the query object and the current result set, and some services need to see both, e.g. highlighting or adaptation rule services. On the other hand, the processing service/pipelet API should remain simple for implementors who do not care about a query object and just want to process the current set of records. Such pipelets should also remain usable in search pipelines, to process either the query object (before the actual retrieval) or the result records (afterwards).

The Proposal

This proposal is not yet complete. It still has some open issues, so feedback is welcome.

The basic idea is to introduce a second Pipelet/Service interface that receives, in addition to the record IDs, the ID of the effective query object:

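interface SearchPipelet {
    // records: IDs of the current record set; query: ID of the effective query object
    Id[] process(Blackboard bb, Id[] records, Id query) throws ProcessingException;
}
 
interface SearchProcessingService {
    // same contract as SearchPipelet, for processing services
    Id[] process(Blackboard bb, Id[] records, Id query) throws ProcessingException;
}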
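To illustrate why a pipelet may need both arguments, here is a minimal Java sketch of a hypothetical highlighting SearchPipelet: it reads a search term from the query record and marks it in the content of each result record. `Id`, `Blackboard` and `ProcessingException` are simplified stand-ins written for this example, not SMILA's actual classes, and the attribute names `term`, `content` and `highlight` are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for a record ID (hypothetical, much simpler than SMILA's Id). */
final class Id {
    final String key;
    Id(String key) { this.key = key; }
}

/** Stand-in blackboard (hypothetical): record key -> attribute name -> value. */
final class Blackboard {
    private final Map<String, Map<String, String>> records = new HashMap<>();

    String get(Id id, String attribute) {
        Map<String, String> attributes = records.get(id.key);
        return attributes == null ? null : attributes.get(attribute);
    }

    void set(Id id, String attribute, String value) {
        records.computeIfAbsent(id.key, k -> new HashMap<>()).put(attribute, value);
    }
}

/** Stand-in exception type (hypothetical). */
class ProcessingException extends Exception {
    ProcessingException(String message) { super(message); }
}

/** The proposed interface, expressed against the stand-in types. */
interface SearchPipelet {
    Id[] process(Blackboard bb, Id[] records, Id query) throws ProcessingException;
}

/** Hypothetical highlighter: needs the query (for the term) AND the records (for the text). */
final class HighlighterPipelet implements SearchPipelet {
    @Override
    public Id[] process(Blackboard bb, Id[] records, Id query) {
        String term = bb.get(query, "term");
        if (term == null || term.isEmpty()) {
            return records; // nothing to highlight
        }
        for (Id record : records) {
            String content = bb.get(record, "content");
            if (content != null) {
                // Naive, case-sensitive marking of every occurrence of the term.
                bb.set(record, "highlight", content.replace(term, "<b>" + term + "</b>"));
            }
        }
        return records; // the record set itself is passed on unchanged
    }
}
```

A plain SimplePipelet could not do this, because it never sees which record holds the query.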

WSDL extensions: besides the standard ProcessorMessage and ProcessorPortType used in import pipelines, search pipelines need:

<message name="SearchProcessorMessage">
    <part name="query" element="rec:Record" />
    <part name="records" element="rec:RecordList" />
</message>
 
<portType name="SearchProcessorPortType">
    <operation name="search">
        <input message="proc:SearchProcessorMessage" name="in" />
        <output message="proc:SearchProcessorMessage" name="out" />
        <fault message="proc:ProcessorException" name="ex" />
    </operation>
</portType>

It is then the job of the ODE integration layer to route the parts of the messages to the correct arguments of the service invocation:

  • The initial query is written to $request.query.
  • Invocation of SimplePipelets/ProcessingServices (the old interfaces):
    • If the records part of the input message is empty, the pipelet/service is invoked with the ID of the (single) record in the query part.
      • If the result is still a single record, it is written to the query part of the output message. (What about searches with resultSize=1 here? Maybe an option in the extension activity could force the result to be written to the records part. This still needs some thought.)
      • Otherwise, a records part is created in the output message from the result records.
    • Otherwise, the pipelet/service is invoked with the IDs of the records in the message's records part.
      • The result records are written to the records part of the output message.
  • Invocation of SearchPipelets/SearchProcessingServices:
    • The records part of the input message is used to construct the record ID list; the query part is passed as the query argument.
    • The result records become the records part of the output message. If the query part of the output message is empty, the query part of the input message is copied.
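The routing rules above can be sketched in Java as follows. This is an illustrative model only: record IDs are plain strings, the blackboard is left out, and the names `SearchMessage` and `InvocationRouter` are hypothetical. Where the rules do not say what happens to the query part (a simple invocation that produces a records part), the sketch assumes the input query is carried over, so that a later service such as the highlighter can still see it.

```java
/** Hypothetical model of a SearchProcessorMessage; record IDs are plain strings here. */
final class SearchMessage {
    final String query;      // ID in the query part, or null if that part is empty
    final String[] records;  // IDs in the records part, empty if that part is empty

    SearchMessage(String query, String[] records) {
        this.query = query;
        this.records = records;
    }
}

/** The "old" interface, simplified: sees only a list of record IDs (no blackboard). */
interface SimplePipelet {
    String[] process(String[] records);
}

/** The proposed interface, simplified: also receives the ID of the effective query. */
interface SearchPipelet {
    String[] process(String[] records, String query);
}

/** Hypothetical sketch of the routing done by the integration layer. */
final class InvocationRouter {
    static SearchMessage invokeSimple(SimplePipelet pipelet, SearchMessage in) {
        if (in.records.length == 0) {
            // Empty records part: process the single query record.
            String[] result = pipelet.process(new String[] { in.query });
            if (result.length == 1) {
                // Still a single record: it becomes the query part of the output.
                return new SearchMessage(result[0], new String[0]);
            }
            // Otherwise the result becomes the records part (keeping the query is an assumption).
            return new SearchMessage(in.query, result);
        }
        // Non-empty records part: process those records; the query is carried over (assumption).
        return new SearchMessage(in.query, pipelet.process(in.records));
    }

    static SearchMessage invokeSearch(SearchPipelet pipelet, SearchMessage in) {
        String[] result = pipelet.process(in.records, in.query);
        // In this simplified model a SearchPipelet cannot set a new query, so the output
        // query part is always empty and the input query is copied, as the rules require.
        return new SearchMessage(in.query, result);
    }
}
```

Note that the sketch reproduces the open resultSize=1 question: a simple search service that returns exactly one hit writes it to the query part, not to the records part.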

Some Example BPEL pipelines

All variables used are assumed to be of type SearchProcessorMessage.

Simple Search Pipeline

<sequence>
    <extensionActivity name="invokeTextminer"> 
        <proc:invokeService>
            <proc:service name="TextminerService" />
            <proc:variables input="request" output="request" />     <!-- uses request.query -->
        </proc:invokeService>
    </extensionActivity>
    <extensionActivity name="invokeCompletionRules">
        <proc:invokeService>
            <proc:service name="CompletionRulesService" />
            <proc:variables input="request" output="request" />     <!-- uses request.query -->
        </proc:invokeService>
    </extensionActivity>
    <extensionActivity name="invokeSearchIndex">
        <proc:invokeService>
            <proc:service name="SearchIndexService" />
            <proc:variables input="request" output="request" />     
        <!-- use request.query to produce request.records, but doesn't need to know both,
             hence being a standard ProcessingService is sufficient -->
        </proc:invokeService>
    </extensionActivity>
    <extensionActivity name="invokeCompletionRules">
        <proc:invokeService>
            <proc:service name="CompletionRulesService" />
            <proc:variables input="request" output="request" />     <!-- use request.records -->
        </proc:invokeService>
    </extensionActivity>
    <extensionActivity name="invokeHighlighter">
        <proc:invokeService>
            <proc:service name="HighlighterService" />
            <proc:variables input="request" output="request" />     
        <!-- use request.query AND request.records,
             hence this really needs to be a SearchProcessingService -->
        </proc:invokeService>
    </extensionActivity>
</sequence>

So far, so good.

Use search result as query

<sequence>
    ...
    <extensionActivity name="invokeIndexSearch">
        <proc:invokeService>
            <proc:service name="SearchIndexService" />
            <proc:variables input="request" output="request" />
        </proc:invokeService>
    </extensionActivity>
    <extensionActivity name="invokeIndexSearchAgain">
        <proc:invokeService>
            <proc:service name="SearchIndexService" />
            <proc:variables input="request" output="request" />     
        </proc:invokeService>
    </extensionActivity>
    ...
</sequence>

This would actually work if the SearchIndexService is a simple ProcessingService:

  • the first invocation uses request.query, because request.records is empty;
  • the search result fills request.records;
  • the second invocation uses the first record of request.records, provided the first search found something.


A problematic use case: "Federated search"

<sequence>
    ...
    <flow> <!-- search in parallel -->
        <extensionActivity name="invokeSearchIndex1">
            <proc:invokeService>
                <proc:service name="SearchIndexService1" />
                <proc:variables input="request" output="result1" />
            </proc:invokeService>
        </extensionActivity>
        <extensionActivity name="invokeSearchIndex2">
            <proc:invokeService>
                <proc:service name="SearchIndexService2" />
                <proc:variables input="request" output="result2" />     
            </proc:invokeService>
        </extensionActivity>
        <extensionActivity name="invokeSearchIndex3">
            <proc:invokeService>
                <proc:service name="SearchIndexService3" />
                <proc:variables input="request" output="result3" />     
            </proc:invokeService>
        </extensionActivity>
    </flow>
 
    <!-- how to merge the records parts of the resultX messages? -->
    ...
 
</sequence>

The merging could probably be done using XSLT. But if we wanted to introduce a MergePipelet, we would probably need a third kind of pipelet interface that can receive an arbitrary number of record lists:

interface AdvancedPipelet {
    // one array of record IDs per input variable
    Id[] process(Id[][] recordLists) throws ProcessingException;
}

Such a pipelet would be called via an extended <invokePipelet> activity that accepts several input variables:

        ...
        <extensionActivity name="invokeResultMerge">
            <proc:invokePipelet>
                <proc:pipelet class="org.eclipse.smila.pipelets.SearchResultMergePipelet" />
                <proc:variables input="result1" output="mergeResult">
                  <proc:variable input="result2"/>
                  <proc:variable input="result3"/>
                </proc:variables>
            </proc:invokePipelet>
        </extensionActivity>
        ...
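As a sketch of what such a merge might look like, assuming the AdvancedPipelet signature above with record IDs modelled as strings: a hypothetical `RoundRobinMergePipelet` interleaves the lists rank by rank and drops duplicate IDs, and a hypothetical `MergeInvocation.invoke` shows how the integration layer could collect the records parts of the `input` variable and the nested `<proc:variable>` inputs, in document order, into the array-of-arrays argument. Round-robin interleaving is only one possible strategy, chosen for illustration; merging by relevance score would need values that this interface, which only passes IDs, does not carry.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified AdvancedPipelet: record IDs are modelled as strings. */
interface AdvancedPipelet {
    String[] process(String[][] recordLists);
}

/** Hypothetical merge strategy: round-robin over the lists, keeping the first occurrence of each ID. */
final class RoundRobinMergePipelet implements AdvancedPipelet {
    @Override
    public String[] process(String[][] recordLists) {
        Set<String> merged = new LinkedHashSet<>(); // insertion-ordered, ignores duplicate IDs
        int longest = 0;
        for (String[] list : recordLists) {
            longest = Math.max(longest, list.length);
        }
        // Take rank 0 from every list, then rank 1, and so on.
        for (int rank = 0; rank < longest; rank++) {
            for (String[] list : recordLists) {
                if (rank < list.length) {
                    merged.add(list[rank]);
                }
            }
        }
        return merged.toArray(new String[0]);
    }
}

/** Hypothetical integration-layer helper for the extended invokePipelet activity. */
final class MergeInvocation {
    /**
     * @param recordsByVariable records part of each BPEL variable, by variable name
     * @param inputVariables    the input attribute first, then each nested proc:variable input
     */
    static String[] invoke(AdvancedPipelet pipelet, Map<String, String[]> recordsByVariable,
                           List<String> inputVariables) {
        String[][] lists = new String[inputVariables.size()][];
        for (int i = 0; i < lists.length; i++) {
            // A missing variable is treated as an empty records part.
            lists[i] = recordsByVariable.getOrDefault(inputVariables.get(i), new String[0]);
        }
        return pipelet.process(lists);
    }
}
```

For example, with result1 = [a, b], result2 = [c, a] and result3 = [d], the merged records part of mergeResult would be [a, c, d, b].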
