CSEP 544: Database Management Systems -- Mini-project
Overview
The goal of the mini-project is to allow
you to experiment on your own with some techniques or
systems discussed in class. The workload should be
comparable to one homework assignment.
Organization
Teams of 1, 2, or 3 students.
Deliverables
- Project proposal (suggested length: 1 page)
- Major milestone (suggested length: 3-4 pages)
- Project presentations: 10 minutes (+ 2 minutes for questions)
- Final report (suggested length: 5-8 pages)
Submit by placing the PDF file in the project
directory of your GitLab repository.
Topic
This is a mini-project, hence
limited in scope. It is also open-ended: you are encouraged
to deviate from our suggestions and come up with your own
ideas.
Suggestions for choosing a topic
- Reproduce some of the experiments in a research
paper. Examples:
- Reproduce figures 7 and 8 from here. Variations: use a
system other than Postgres, or a dataset other than
the JOB benchmark.
- Reproduce figures 2 and 3
from here. HyPer is not available: you need to
choose some other DBMS instead.
- Reproduce figure 16
from here.
Note that Umbra is available only as a binary.
- Compare different storage formats
as here
or here
- Pick your favorite paper and reproduce its
experiments. [We may add more suggestions here]
- Compare the performance of 2 or more DBMS on some
standard workloads.
- Suggestions for workloads: JOB, STATS, Subgraph Matching (SM),
or the tried-and-tested TPC benchmarks.
- Suggestions for systems: Postgres, DuckDB, SQL Server, MySQL, SQLite, Oracle, Snowflake (basically anything that you have access to and are interested in using).
- Implement an algorithm and measure its performance.
Suggestions:
- Implement Wander
Join (an efficient algorithm for sampling from joins).
- Implement Worst-case
Optimal Join (an efficient multi-join algorithm
for cyclic queries, which we plan to discuss in
class). To keep the project short, you may choose to
use an eager index instead of the lazy hash-trie.
- Add some new functionality to a system. E.g.
- Add a progress indicator to a query
engine (e.g. to DuckDB).
- Add a simple graphical output to a query engine;
e.g. instead of representing the output of
select A, count(*) ...
as a table, represent it as a bar graph.
- Preferred: implement some exploratory project
that is useful to your work. E.g. try out a new system,
or try out a new algorithm.
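For the Worst-case Optimal Join suggestion in the list above, the following is a minimal sketch of what an eager-index version might look like for a single cyclic query, the triangle query Q(a, b, c) = R(a, b), S(b, c), T(a, c). It is only an illustration under assumptions: the relations are in-memory lists of pairs, the names build_index, intersect, and triangles are hypothetical, and the variable order a, b, c is fixed by hand. A project would need to generalize this and measure it against a system's binary join plans.

```python
from collections import defaultdict


def build_index(pairs):
    """Eager index (assumed design): first column -> set of second-column values."""
    index = defaultdict(set)
    for x, y in pairs:
        index[x].add(y)
    return dict(index)


def intersect(xs, ys):
    """Intersect two sets (or dict key sets) by scanning the smaller one."""
    if len(xs) > len(ys):
        xs, ys = ys, xs
    return [x for x in xs if x in ys]


def triangles(R, S, T):
    """Enumerate Q(a, b, c) = R(a, b), S(b, c), T(a, c) one variable at a time."""
    r_idx = build_index(R)  # a -> {b}
    s_idx = build_index(S)  # b -> {c}
    t_idx = build_index(T)  # a -> {c}
    out = []
    # Variable a: must occur as a first column in both R and T.
    for a in intersect(r_idx, t_idx):
        # Variable b: must extend a in R and occur as a first column in S.
        for b in intersect(r_idx[a], s_idx):
            # Variable c: must extend b in S and extend a in T.
            for c in intersect(s_idx[b], t_idx[a]):
                out.append((a, b, c))
    return sorted(out)
```

For example, with R = [(1, 2), (2, 3), (1, 3)], S = [(2, 3), (3, 1)], and T = [(1, 3), (2, 1)], calling triangles(R, S, T) returns [(1, 2, 3), (2, 3, 1)]. Scanning the smaller side in every intersection is the step that distinguishes this style of join from a sequence of binary joins.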
Project Proposal: Details
Please submit a pdf file. Suggested length: 1 page.
- Your name or names (please use the name you used in
Canvas)
- A title of your project
- Short description of what you want to do (could be as
short as 2-3 sentences)
- What system are you planning to use? And do you have
access to it? E.g. if you are planning to use Redshift,
do you have AWS credits? Or will you use an open-source
system?
- What data are you planning to use? Same here: tell us
if you have access to the data, have looked at it, and
whether you have checked that you can get it and use it
for your mini-project.
- What are you planning to report? E.g. you plan to
show two graphs: the first shows the runtime as a function of
the data size, the second shows the runtime
as a function of the record size. If you are less sure,
you can be more vague, but do make an effort to think
about what you would like to report.
- References: cite any papers that you are planning to
use in your mini-project.
You may change the plan in your proposal: we do not
require the final report to follow the proposal strictly.
The goal of the proposal is to get you to start thinking
about the project, and it's OK if you change your mind
later.
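To make the "runtime as a function of the data size" example above concrete, here is a minimal sketch of how such a measurement could be collected. It rests on assumptions that are not part of the assignment: it uses SQLite through Python's built-in sqlite3 module, a synthetic table t(a, b), a single aggregate query, and hypothetical helper names time_query and sweep. Your own system, data, and queries will differ.

```python
import sqlite3
import time


def time_query(n_rows, repeats=3):
    """Return the best-of-`repeats` wall-clock time (seconds) of one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    # Synthetic data (an assumption): 100 distinct groups in column a.
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        ((i % 100, i) for i in range(n_rows)),
    )
    conn.commit()
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute("SELECT a, COUNT(*) FROM t GROUP BY a").fetchall()
        best = min(best, time.perf_counter() - start)
    conn.close()
    return best


def sweep(sizes):
    """Measure the query once per data size; returns (size, seconds) pairs."""
    return [(n, time_query(n)) for n in sizes]


if __name__ == "__main__":
    # Print one tab-separated line per data size, ready to plot.
    for n, secs in sweep([1_000, 10_000, 100_000]):
        print(f"{n}\t{secs:.6f}")
```

Taking the best of several repetitions is one common way to reduce noise; reporting the median or all runs is equally reasonable, as long as the report says which was used.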
Major Milestone: Details
Please submit a pdf file. Suggested length: 3-4 pages.
Aim for the structure of the final report, although parts
of it may still be tentative. Structure your milestone as
follows:
- Your name or names (please use the name you used in
Canvas)
- The title of your project
- Describe the problem you are solving.
- Describe the system(s) that you are experimenting with.
- Describe the data you are using.
- Results: 2-3 graphs.
- Discussion of the results: this is the most
interesting part of your mini-project.
- For team projects: describe what each team
member did in the project.
- References: cite any papers that you used in your
mini-project.
- Some parts may be incomplete: in that case
describe the plan for finalizing that part.
Project Presentations: Details
We will organize a mini-workshop, where every project team
can present their work.
- Date: March 14, 2025, 2pm -
9:30pm. Date and time are
tentative.
- You will probably have 10 minutes, possibly 12, for your presentation.
- Plan about 1 slide per minute. Suggested structure:
- Title slide with your name(s) and affiliations
- Problem description. 1-3 slides. Examples:
- Are you reproducing some experiments? Describe
what that paper is trying to solve.
- Are you benchmarking several systems?
Describe what question you are interested in. ("Which
of the systems X, Y, Z has the highest throughput for
applications of type BLAH?")
- Are you implementing a specific algorithm?
Describe the motivation behind it.
- Background: 1-2 slides. Describe any techniques
that the audience should know about in order to follow
your talk.
- Methodology: 1-2 slides. Describe your approach.
- Findings: 1-3 slides. Show us your experimental results.
- Discussion. This is important: summarize
the main takeaways for the audience to
remember.
- Please try to attend the entire workshop, in addition
to presenting your project. Feel free to ask questions
and engage in discussions!
Final Report: Details
Submit a pdf file. Suggested length: 5-8 pages. Same
structure as the Major Milestone, with all parts completed now.