CSEP590D: Project
In this class you will implement a project on some query optimization
or query execution topic. The goal of the project is to:
- Expose you to some advanced tools, e.g. some modern query
engines, or some rule-based query optimization techniques, or to some
tensor algebra systems, and
- Encourage you to try yourself some of the techniques discussed
in class.
Here are some suggested topics. These are not specifications, but
rather suggestions of things for you to explore.
- Add some optimization rule to an existing query optimizer. The
most annoying missing rule is the aggregate push-down: one
possibility is for you to add this rule to some optimizer.
- Re-design a query optimizer using the EGG equality saturation
system. There will be a tutorial on EGG on April 14. If you choose
this topic then most likely you will not integrate your
rule-based optimizer with the rest of the system, but may be able to
compare the query plan generated by the system with that generated
by EGG and your rules.
- EGG explores the search space using (roughly) a breadth-first
search strategy. Try extending EGG with other strategies
(e.g. Beam search).
- Design a rule-based query optimizer for a sparse tensor-algebra
system. Sparse tensors are the same as relations, and tensor
algebra expressions are best optimized using a cost-based,
rule-based query optimizer. This project can be a stand-along
system that takes as input a tensor algebra expression and produces
an optimized expression. It does not need to be integrated with an
exiting tensor-algebra system. Your system will take as input some
tensor-algebra expression and will return a rewritten, hopefully
more efficient, tensor-algebra expression. For example, the input
might be:
v = sum i,j: ((X(i,j) - U(i)*V(j))**2)
and your optimize code will be:
v = sum i,j: (X(i,j))**2 - 2 * sum i,j: X(i,j)*U(i)*V(j) + sum i: U(i)**2 + sum j: V(j)**2
- Most advanced SQL systems support recursive queries through the
Common Tale Expression (CTE) construct, i.e. the WITH construct.
But most limit recursion to a single recursive relation, and a
linear query (i.e. no self-joins). Extend one such system (e.g.
postgres) to support non-linear recursion, and/or mutual
recursion.
- Perform a comparison between different cloud-based database
management system. For example, you could compare Redshift,
BigQuery, and Snowflake. (This requires access to cloud resources:
we can help you with credits for Snowflake, but not for the others.)
Your comparison can be both quantitative, e.g. measure runtime on
1-2 benchmarks by varying the size of the data, the number of joins,
the complexity of the query (e.g. with or without subqueries), and
also qualitative, e.g. what features are supported or not by the
different systems.
The project is not meant to be limited in scope, and not involve major
engineering. You may start by attempting to extend an existing
system, only to discover that the extension requires more engineering
than you can do in one month: in that case it's OK for you to use some
workaround, by which you implement that functionality without
extending the system. The important thing is for you to gain
experience with the systems and/or the techniques rather than to
deliver some new functionality or some improved performance.
Suggestions for systems to try out:
If you are considering a benchmarking
project,
here
is an example of a benchmark. Your project may be a bit more limited
in scope, e.g. compare only 3-4 systems instead of 6.
Milestones
1. Project Proposal (suggested length: 1 page): April 29
Your proposal must include your name(s) and a project
title (it's OK to change it later)
Date is flexible if you really need another day or two
Describe very briefly what you want to do:
- What question are you planning to address?
- What system are you using?
- Do you plan to use any data in experiments? Describe it
briefly.
- What do you hope to report in your project?
- If you work in a team, tell us about how you are planning
to split the work.
2. Project Milestone (2-3 pages): May 13
Date is flexible if you really need another day or two
This is a preliminary draft of your final report. Summarize
what you did already and what you are still planning to do. If
you have any preliminary results (graphs or something else) show
them! You can change them later.
I am planning to meet with each of you individually shortly
after you submit the milestone.
In Person Meeting: Tuesday, May 17
I plan to meet with each project team individually and discuss your progress.
3. In-class presentation (5 or 10 minutes): June 2
Date is hard
On June 2, we will have a mini-conference. Everyone gets to
present their project. This may take more than 3 hours, please
plan ahead.
- Presentation
order: here
- Guidelines: here
4. Project Final Report (4-5 pages): June 6
Date is hard
- The final project report should be a complete report on your project.
- The report should be structured like a short workshop or
conference paper, e.g. problem definition, background, proposed
method, results, bibliography. The format can vary significantly
based on what project you have chosen.
- You do not need to submit any code
- Stay tuned for more details.