The Tukwila Execution Interface

Zack Ives, May 29, 1999

This document is a work in progress.  Please see the latest version at: http://www.cs.washington.edu/education/courses/544/CurrentQtr/tukwila_query.html

Introduction

The Tukwila query processing components are designed to be self-contained modules that can be swapped out as needed.  Each of the main components (reformulator, optimizer, execution engine, and wrappers) is a separate code base, each optionally in a different language and on a different platform.  A sockets-based communication interface with a standardized request model allows us to interchange parts.

Executing a Plan

The query execution engine constantly monitors port 7777 for query execution requests.  A request is simply a physical query plan in the format described later in this document.  The data transfer format is a 32-bit big-endian integer specifying the length of the plan, followed by the ASCII text of the plan.
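For concreteness, the following is a minimal sketch, in Python, of how a client such as the optimizer might frame and submit a plan.  (Python, the host name, and the file name are illustrative choices, not part of Tukwila.)

import socket
import struct

def send_plan(sock, plan_text):
    # Wire format: a 32-bit big-endian length, then the ASCII text of the plan.
    data = plan_text.encode("ascii")
    sock.sendall(struct.pack(">I", len(data)) + data)

# Connect to the execution engine and submit a plan.  The socket is held
# open so that the status result can be read back later.
sock = socket.create_connection(("execution-host.example", 7777))
send_plan(sock, open("plan.xml").read())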

Once a query has been submitted to the execution system, it is expected that the optimizer will hold the socket open for the duration of query execution.  The optimizer should monitor the socket for a status result string sent back by the execution engine.  (Again, this is sent as a 32-bit big-endian length plus a string of bytes.)

The query result string, which is described in a later section, will begin with an execution status.  This status should be used as the basis for the optimizer's decision to terminate successfully, terminate with an error, or re-optimize the query.  If re-optimization is performed, the optimizer simply sends a new query plan through the same socket to the execution system.
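Continuing the sketch above, the optimizer's side of the exchange might look roughly like this.  The REPLAN status token and the reoptimize callback are placeholders, since the actual status strings are only described informally later in this document.

def recv_string(sock):
    # Read one length-prefixed message: 32-bit big-endian length, then bytes.
    def read_exactly(n):
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("socket closed by execution engine")
            buf += chunk
        return buf
    (length,) = struct.unpack(">I", read_exactly(4))
    return read_exactly(length).decode("ascii")

def run_query(sock, plan_text, reoptimize):
    # Submit a plan, then loop: if the returned status calls for
    # re-optimization, send a revised plan through the same socket.
    send_plan(sock, plan_text)
    while True:
        result = recv_string(sock)
        status = result.splitlines()[0]      # the response begins with a status
        if status != "REPLAN":               # illustrative status value
            return result
        send_plan(sock, reoptimize(result))  # hypothetical re-optimizer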

Query Plans in Tukwila

The basic layout for a query plan is as follows:

  1. Header and user information
  2. Query requests (generally only one per plan), each containing:
     a. Fragments (there may be multiple per plan), each containing:
        i.  Query operator graph
        ii. Fragment-local execution rules
     b. Global execution rules (i.e. rules across fragments)

More Detailed Sketch of a Plan

The header is fairly simple, and most of it can be repeated verbatim.  Much of the header information is for the benefit of XML parsers and for future use.  Note that "auser" is the default user ID for many of our wrappers, so use it unless you have a reason to do otherwise.

<?XML VERSION="1.00"?>
<!DOCTYPE "PLAN" SYSTEM "plan.dtd">

<PLAN ID=PlanName VERSION=1.00 ENCODING="PUBLICKEY">

<REQUESTOR>IP Address of source machine</REQUESTOR>
<REQUEST ID=RequestID RESULT=QueryRootNodeID USER="auser">

<FRAGMENT NUM=ExecutionStep>
<GRAPH>
    <OUTPUT NODE=FragmentRootNodeID />
    <N ID=NodeID OP=Operator attributes CARD=EstimatedSize>
    </N>
<!-- More nodes here -->
</GRAPH>
<RULES>
    <R ID=RuleID>
        <WHEN>Event</WHEN>
        <IF>ConditionTest</IF>
        <THEN>Action</THEN>
        <ELSE>Action</ELSE>
    </R>
<!-- More rules here -->
</RULES>
</FRAGMENT>
<!-- More fragments here -->

<!-- Global rules begin here -->
<RULES>
    <R ID=RuleID ACTIVE=state>
        <WHEN>Event</WHEN>
        <IF>ConditionTest</IF>
        <THEN>Action</THEN>
        <ELSE>Action</ELSE>
    </R>
<!-- More rules here -->
</RULES>
</REQUEST>
</PLAN>

Creating Plans for Tukwila

For the most part, constructing a plan is simply a matter of "filling in the template" above.  Each plan and request gets a unique ID.  The <REQUEST> element also has attributes specifying the user ID to run under and the root node of the entire query plan.

Each fragment within a request is given a number representing an execution step.  Fragments are executed in order of increasing step number.  (A future extension to Tukwila will allow multiple fragments to execute in the same step, but this is not currently supported.)

Within the fragment are an operator graph and a set of rules specific to the fragment.  The operator graph includes an <OUTPUT> element that specifies which node is the root of the fragment (and whose output should thus be written to disk); the NODE attribute of this element must reference one of the nodes in the fragment's graph.  Nodes are specified with the <N> element and include the operator type, the expected cardinality of the result, any children, and various other parameters.
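As a purely illustrative sketch, a plan generator might assemble <N> elements by filling in the template.  The helper below is hypothetical and performs no validation.

def node(node_id, op, children=(), card=None, extra_attrs="", **elements):
    # Assemble one <N> element in the plan format sketched above.
    attrs = [f"ID={node_id}", f"OP={op}"]
    attrs += [f"CHILD={c}" for c in children]
    if extra_attrs:
        attrs.append(extra_attrs)      # e.g. "TYPE=HASH SPACE=100000"
    if card is not None:
        attrs.append(f"CARD={card}")
    body = "".join(f"\n    <{tag}>{text}</{tag}>"
                   for tag, text in elements.items())
    return f"<N {' '.join(attrs)}>{body}\n</N>"

# For example, the SELECT node used in the sample plan later in this document:
# node("selectli", "SELECT", children=["lineitem"], card=600000,
#      WHERE="lineitem.L_SUPPKEY > 1000")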

The Operators

    <N ID=NodeID OP=JOIN TYPE=HASH CHILD=LeftChildNodeID
      CHILD=RightChildNodeID SPACE=KBytesToUse CARD=ExpectedCard>
        <WHERE>LeftChildNodeID.LeftChildAttrib =
        RightChildNodeID.RightChildAttrib</WHERE>
    </N>

    <N ID=NodeID OP=SELECT CHILD=ChildID CARD=ExpectedCard>
        <WHERE>ChildID.Attrib {relop} {expression}</WHERE>
    </N>

    <N ID=NodeID OP=PREFETCH TIMEOUT=TimeIn_ms CARD=ExpectedCard>
        <HOST>HostName:Port</HOST>
        <SERVICE>WrapperService</SERVICE>
        <QUERY>SQLQuery</QUERY>
    </N>

    <N ID=NodeID OP=WRAPPER TIMEOUT=TimeIn_ms CARD=ExpectedCard>
        <HOST>HostName:Port</HOST>
        <SERVICE>WrapperService</SERVICE>
        <QUERY>SQLQuery</QUERY>
    </N>

    <N ID=NodeID OP=PROJECT CHILD=ChildNodeID CARD=ExpectedCard>
        <ATTR>ChildNodeID.attribute</ATTR>
        <ATTR>ChildNodeID.attribute</ATTR>
        ...
    </N>

    <N ID=NodeID OP=COLLECTOR CHILD=ChildNodeID1 CHILD=ChildNodeID2
     CHILD=ChildNodeID3 SPACE=KBytesToUse CARD=ExpectedCard>
    </N>

Rules

A rule R is formally a quintuple <event, condition, action, active_flag, owner>, where the owner is a node, fragment, or plan; and the active_flag must be TRUE for the rule to be in the active set.

When an event E occurs, it is sent to the Tukwila event handler.  The event handler checks a hash table to see whether any rules in the active set match E; if so, it places the event into the event queue.  Events are dequeued individually, in the order they were detected, and all matching rules are triggered serially before the next event is processed.  The condition of each matching rule is tested; if it is met, the rule's actions are executed sequentially, and if it is not met and the rule has an optional ELSE clause, the alternate series of actions is executed instead.  Formally, the rules' conditions should all be tested in parallel, and all actions should be executed in parallel; our implementation does both of these serially.

There are no synchronization or locking semantics, so rules are not allowed to have conflicting actions; the rule generator must prevent this from happening.  Rules are not executed for operators that have been terminated, in order to prevent deadlocks and race conditions.  Once a rule has been executed, it becomes inactive, unless one of its actions is to reactivate the same rule.
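To make these semantics concrete, the following is a hypothetical Python sketch of the dispatch logic; the Rule class and the state argument are assumptions for illustration, not Tukwila's actual data structures.

from collections import deque

class Rule:
    def __init__(self, rule_id, event, condition, then_actions, else_actions=()):
        self.rule_id = rule_id
        self.event = event                  # event name this rule matches
        self.condition = condition          # callable: state -> bool
        self.then_actions = then_actions    # actions run when the condition holds
        self.else_actions = else_actions    # actions run otherwise (optional ELSE)
        self.active = True

class EventHandler:
    def __init__(self, rules):
        # Hash table from event name to the rules that can match it.
        self.rules_by_event = {}
        for r in rules:
            self.rules_by_event.setdefault(r.event, []).append(r)
        self.queue = deque()

    def post(self, event, state):
        # Queue an event only if some active rule matches it.
        if any(r.active for r in self.rules_by_event.get(event, [])):
            self.queue.append((event, state))

    def run(self):
        # Events are dequeued in arrival order; all rules matching an event
        # fire serially before the next event is processed.
        while self.queue:
            event, state = self.queue.popleft()
            for rule in self.rules_by_event.get(event, []):
                if not rule.active:
                    continue
                rule.active = False          # one-shot unless an action re-activates it
                actions = (rule.then_actions if rule.condition(state)
                           else rule.else_actions)
                for action in actions:       # actions are executed sequentially
                    action(state)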

Events

Conditions

Actions

Format

<RULES>
<R ID=ruleID [STATE=activity]>
<WHEN>event</WHEN>
<IF>condition</IF>
<THEN>action{;action}</THEN>
[<ELSE>action{;action}</ELSE>]
</R>
...
</RULES>

Execution Results

The query optimizer needs to monitor the socket continuously to see whether the execution system has returned a status message.  The execution response has the following format (a short parsing sketch follows the list):

  1. Status (followed by newline):
  2. One status line (with terminating newline) per node in the query plan, with the following items, each separated by a space:
  3. A series of messages generated during parsing and execution, preceded by the header "EXECUTION_MESSAGES" and a newline.  These should be ignored by the optimizer, but are provided for debugging purposes.
  4. A termination line reading "DONE", followed by a newline.  This marks the end of the results.
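As an illustration only (the exact status tokens and per-node fields are not enumerated here), a client might split a response into these four parts as follows:

def parse_response(text):
    # Split a response into: overall status, per-node status lines,
    # debugging messages, and the DONE terminator.
    lines = text.splitlines()
    status = lines[0]
    msg_start = lines.index("EXECUTION_MESSAGES") if "EXECUTION_MESSAGES" in lines else len(lines)
    done = lines.index("DONE") if "DONE" in lines else len(lines)
    node_lines = [line.split(" ") for line in lines[1:msg_start] if line]
    messages = lines[msg_start + 1:done]
    return status, node_lines, messages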

Sample Plan

<!--Sample Query execution plan for Tukwila System, zives 5/13/99 -->
<?XML VERSION="1.00"?>
<!DOCTYPE "PLAN" SYSTEM "plan.dtd">

<!-- Header: user, encryption info, etc. -->
<PLAN ID=Example VERSION=1.00 ENCODING="PUBLICKEY">

<!-- The requestor/request info isn't currently used, but is
     here for future expansion. The user is actually used by
     some wrappers -->
<REQUESTOR>127.128.129.130</REQUESTOR>
<REQUEST ID=One RESULT=test2 USER="auser">

<!-- Top-level fragment in this plan -->
<!-- Simply reads results of previous fragment -->
<FRAGMENT NUM=1>
<GRAPH>
    <OUTPUT NODE=test2/>
    <N ID=test2 OP=SCAN FILE=test CARD=600000>
    </N>
</GRAPH>
</FRAGMENT>

<!-- Lower-level fragment -->
<FRAGMENT NUM=0>
<GRAPH>
    <OUTPUT NODE=test/>
    <N ID=test OP=JOIN TYPE=HASH CHILD=projection CHILD=subresult
      SPACE=100000 CARD=600000>
        <WHERE>subresult.L_ORDERKEY = projection.O_ORDERKEY</WHERE>
    </N>
    <N ID=subresult OP=JOIN TYPE=HASH CHILD=selectli CHILD=supplier
      SPACE=30000 CARD=150000>
        <WHERE>selectli.L_SUPPKEY = supplier.S_SUPPKEY</WHERE>
    </N>
    <N ID=selectli OP=SELECT CHILD=lineitem CARD=600000>
        <WHERE>lineitem.L_SUPPKEY > 1000</WHERE>
    </N>
    <N ID=lineitem OP=WRAPPER TIMEOUT=10000 CARD=600000>
        <HOST>data.cs.washington.edu:7779</HOST>
        <SERVICE>TPCT.TPC</SERVICE>
        <QUERY>SELECT * FROM TPC.LINEITEM</QUERY>
    </N>
    <N ID=supplier OP=WRAPPER TIMEOUT=10000 CARD=15000>
        <HOST>data.cs.washington.edu:7779</HOST>
        <SERVICE>TPCT.TPC</SERVICE>
        <QUERY>SELECT * FROM TPC.SUPPLIER</QUERY>
    </N>
    <N ID=projection OP=PROJECT CHILD=order CARD=15000>
        <ATTR>order.O_ORDERKEY</ATTR>
        <ATTR>order.O_CUSTKEY</ATTR>
    </N>
    <N ID=order OP=PREFETCH TIMEOUT=10000 CARD=15000>
        <HOST>data.cs.washington.edu:7779</HOST>
        <SERVICE>TPCT.TPC</SERVICE>
        <QUERY>SELECT * FROM TPC.ORDER</QUERY>
    </N>

<!-- A collector looks like (assuming wrapper1-wrapper3 are node IDs):
    <N ID = misc OP=COLLECTOR CHILD=wrapper1 CHILD=wrapper2
     CHILD=wrapper3 SPACE=10000 CARD=15000>
    </N>
-->
</GRAPH>

<RULES>
<!-- Simple rule; not very useful, but demonstrates syntax -->
    <R ID=myRule>
        <WHEN>END_OF_FRAGMENT</WHEN>
        <IF>CARD(test) > 1000</IF>
        <THEN>REPLAN</THEN>
        <ELSE>ADD_MEMORY(test, 1)</ELSE>
    </R>
</RULES>
</FRAGMENT>

<!-- Global Rules; none here -->
<RULES>
</RULES>
</REQUEST>
</PLAN>