From: Michael Gubanov (mgubanov@cs.washington.edu)
Date: Mon Apr 19 2004 - 11:17:03 PDT
Paper: Relational Databases for Querying XML Documents: Limitations and
Opportunities, J. Shanmugasundaram, K. Tufte, G. He, C. Zhang,
D. DeWitt, J. Naughton
Summary: The paper describes and evaluates new algorithms that
automatically map XML DTDs to relational schemas. They leverage the
power of existing relational engines by translating queries over the
original XML data into SQL queries against the generated relational
schema and exporting the results back to XML.
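
The round trip described in the summary can be sketched as follows. This
is only a minimal illustration, not the paper's implementation: the
book/author tables, the sample document, and the hand-written SQL are
all assumptions made up for the example, and SQLite stands in for the
relational engine.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical relational schema for a tiny DTD: book(title, author*),
# author(name). The single-valued title is inlined into the book table;
# the repeated author element gets its own table with a parent key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (id INTEGER PRIMARY KEY,
                         parent_id INTEGER, name TEXT);
""")

# Hypothetical input document.
doc = ET.fromstring(
    "<book><title>XML and RDBs</title>"
    "<author><name>Ann</name></author>"
    "<author><name>Bob</name></author></book>"
)

# Step 1: shred the XML document into rows.
cur = conn.execute("INSERT INTO book (title) VALUES (?)",
                   (doc.findtext("title"),))
book_id = cur.lastrowid
for author in doc.findall("author"):
    conn.execute("INSERT INTO author (parent_id, name) VALUES (?, ?)",
                 (book_id, author.findtext("name")))

# Step 2: the path query book/author/name, restricted by title,
# written by hand as a SQL join over the generated tables.
rows = conn.execute(
    "SELECT a.name FROM book b JOIN author a ON a.parent_id = b.id "
    "WHERE b.title = ? ORDER BY a.id",
    ("XML and RDBs",),
).fetchall()

# Step 3: export the relational result back to XML.
result = ET.Element("result")
for (name,) in rows:
    ET.SubElement(result, "name").text = name
xml_out = ET.tostring(result, encoding="unicode")
print(xml_out)
```

Every step in the path expression that crosses a table boundary becomes
a join, which is why the choice of mapping matters so much for the cost
of the translated query.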
Main Ideas:
The main challenge in the "conservative" approach the authors selected
(round-tripping through an RDB) is building an effective mapping from
an XML DTD to an RDB schema. The mapping strongly affects query
performance and, ultimately, the ability to leverage existing RDB
strengths (indexes, query optimizer, etc).
- A technique for simplifying the DTD before conversion was proposed
- The notion of a DTD graph was proposed and used as the basis for
three DTD translation algorithms
Three new algorithms operating on the DTD graph were proposed and
evaluated:
- Basic inlining. Drawback: Creates too many target relations, which
results in a large number of SQL queries being generated when
translating an XML query
- Shared inlining. Drawback: Creates far fewer relations than basic,
but this results in too many joins in the translated SQL query
- Hybrid inlining: Shared plus inlining of additional elements, thus
alleviating the drawbacks of shared and basic
- Semi-structured to SQL query conversion techniques for simple path
queries, simple recursive path queries, and arbitrary path expressions
- Conversion of SQL query output to XML
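
To make the difference between the Shared and Hybrid strategies
concrete, here is a minimal sketch. It is not the paper's actual
algorithm but a simplified reading of it, under these assumptions: the
DTD graph is a plain dictionary, an element gets its own relation if it
has in-degree zero or is reached through a "*" operator, Shared
additionally gives a relation to every element with in-degree greater
than one, and recursion handling is left out entirely. The graph and the
function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy DTD graph: parent -> list of (child, reached_via_star).
DTD_GRAPH = {
    "book": [("title", False), ("author", True)],
    "article": [("title", False), ("author", True)],
    "author": [("name", False)],
    "title": [],
    "name": [],
}


def separate_relations(graph, inline_shared=False):
    """Return the elements that get their own relation.

    inline_shared=False approximates Shared inlining; inline_shared=True
    approximates Hybrid inlining, which also inlines elements that have
    several parents. Recursive DTDs are not handled in this sketch.
    """
    in_degree = defaultdict(int)
    starred = set()
    for parent, edges in graph.items():
        for child, via_star in edges:
            in_degree[child] += 1
            if via_star:
                starred.add(child)

    relations = set()
    for element in graph:
        if in_degree[element] == 0 or element in starred:
            # Roots and set-valued elements always need their own table.
            relations.add(element)
        elif in_degree[element] > 1 and not inline_shared:
            # Shared keeps multi-parent elements in one common table.
            relations.add(element)
    return relations


shared = separate_relations(DTD_GRAPH)
hybrid = separate_relations(DTD_GRAPH, inline_shared=True)
print(sorted(shared))
print(sorted(hybrid))
```

In this toy graph the only difference is title: the Shared variant
stores it in its own table, so fetching a book's title costs a join,
while the Hybrid variant stores it as a column of both book and article.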
Flaws: It would probably be worth taking a specific Internet/Intranet
XML application and letting the Hybrid algorithm shine even brighter
on that concrete example.
Relevance: The problem is definitely urgent given today's number of
XML-enabled Internet/Intranet applications and the rapidly growing
number of Web Services.