CSE 544: Database Management Systems -- Mini-project
Overview
The goal of the mini-project is to allow
you to experiment on your own with some techniques or
systems discussed in class. The workload should be
comparable to 1 or 2 homework assignments.
Organization
Teams of 1, 2, or 3 students.
Deliverables
- Project proposal (suggested length: 1 page)
- Major milestone (suggested length: 3-4 pages)
- Project presentations: 10' (+ 2' questions)
- Final report (suggested length: 5-8 pages)
Submit by placing the pdf file in the project
directory of your Gitlab repository.
Topic
This is a mini-project, hence
limited in scope. It is also open-ended: you are encouraged
to deviate from our suggestions and come up with your own
ideas.
Suggestions for choosing a topic
- Preferred: choose a data management problem related to your
own research.
- Reproduce some of the experiments in a research
paper. Examples:
- Reproduce figures 7 and 8 from here. Variations: use a
system other than Postgres, or a
dataset other than the JOB benchmark.
- Reproduce figures 2 and 3
from here. HyPer is not available: you need to
choose some other DBMS instead.
- Reproduce figure 16
from here.
Note that Umbra is available only as a binary.
- Compare different storage formats
as here
or here
- Pick your favorite paper and reproduce its
experiments. [We may add more suggestions here]
- Compare the performance of 2 or more DBMS on some
standard workloads.
- Suggestions for workloads: JOB, STATS, Subgraph Matching (SM), or the tried and tested TPC benchmarks.
- Suggestions for systems: Postgres, DuckDB, SQL Server, MySQL, SQLite, Oracle, Snowflake (basically anything that you have access to and are interested in using).
- Implement an algorithm and measure its performance.
Suggestions:
- Implement Wander
Join (an efficient algorithm for sampling from joins).
- Implement Worst-case Optimal Join (an efficient multi-join algorithm for cyclic queries, which we plan to discuss in class). To keep the project short, you may choose to use an eager index instead of the lazy hash-trie.
- Add some new functionality to a system. E.g.
- Add a progress indicator to a query
engine (e.g. to DuckDB).
- Add a simple graphical output to a query engine;
e.g. instead of representing the output of
select
A,count(*) ... as a table, represent it as a
bar graph.
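As a rough illustration of the last suggestion, here is a minimal sketch that renders the result of a select A, count(*) query as a text bar graph instead of a table. It uses Python's built-in sqlite3 module purely for convenience; the table t, its contents, and the helper bar_chart are hypothetical names made up for this example, and a real project would hook into the output layer of whichever engine you choose.

```python
import sqlite3


def bar_chart(rows, width=40):
    """Render (label, count) rows as a text bar chart, one line per row."""
    if not rows:
        return ""
    max_count = max(count for _, count in rows)
    label_width = max(len(str(label)) for label, _ in rows)
    lines = []
    for label, count in rows:
        # Scale each bar relative to the largest count in the result.
        bar_len = round(width * count / max_count) if max_count else 0
        lines.append(f"{str(label):<{label_width}} | {'#' * bar_len} {count}")
    return "\n".join(lines)


# Hypothetical toy table, used only to have something to plot.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("x",)] * 4 + [("y",)] * 2 + [("z",)],
)

rows = conn.execute(
    "SELECT A, COUNT(*) FROM t GROUP BY A ORDER BY A"
).fetchall()
chart = bar_chart(rows, width=8)
print(chart)
```

Scaling every bar against the largest count keeps the output within a fixed width regardless of the data; a graphical front end would replace only the rendering function.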
P0: Project Teams October 17, 2025 (0 Points)
Decide if you want to work alone, or team up with some
colleague(s) in class; teams can have up to 3 students. Write
your team here.
P1: Project Proposal October 31, 2025 (10 Points): Details
Please submit a pdf file. Suggested length: 1 page.
- Your name or names (please use the name you used in
Canvas)
- A title of your project
- Short description of what you want to do (could be as
short as 2-3 sentences)
- What system are you planning to use? And do you have
access to it? E.g. if you are planning to use Redshift,
do you have AWS credits? Or will you use an open-source
system?
- What data are you planning to use? Same here: tell us
if you have access to the data, have looked at it, and
whether you have checked that you can get it and use it
for your mini-project.
- What are you planning to report? E.g. you plan to
show two graphs, the first is the runtime as a function of
the data size, the second is a graph showing the runtime
as a function of the record size. If you are less sure,
you can be more vague, but do make an effort to think
about what you would like to report.
- References: cite any papers that you are planning to
use in your mini-project.
You may change the plan in your proposal: we do not
require the final report to follow the proposal strictly.
The goal of the proposal is to force you to start thinking
about the project, and it's OK if you change your mind
later.
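To make the "runtime as a function of the data size" example from the checklist above concrete, here is a minimal sketch of such a measurement. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in system; the table r, the query, the data sizes, and the helper time_query are all hypothetical choices for illustration, not a prescribed methodology.

```python
import sqlite3
import time


def time_query(n_rows, repeats=3):
    """Load n_rows synthetic rows, then return the best-of-`repeats`
    wall-clock time (in seconds) of one aggregation query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE r (k INTEGER, v INTEGER)")
    conn.executemany(
        "INSERT INTO r VALUES (?, ?)",
        ((i % 100, i) for i in range(n_rows)),
    )
    conn.commit()

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # fetchall() forces the full result to be computed and materialized.
        conn.execute("SELECT k, COUNT(*), SUM(v) FROM r GROUP BY k").fetchall()
        best = min(best, time.perf_counter() - start)
    conn.close()
    return best


sizes = [1_000, 10_000, 100_000]
results = [(n, time_query(n)) for n in sizes]
for n, seconds in results:
    print(f"{n:>8} rows  {seconds * 1000:.2f} ms")
```

Taking the minimum over a few repetitions is one common way to reduce noise from caching and background activity; whichever convention you pick (minimum, median, mean), state it in your report. The (size, runtime) pairs in results are what you would plot.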
P2: Major Milestone November 21, 2025 (40 points): Details
Please submit a pdf file. Suggested length: 3-4 pages.
Aim for the structure of the final report, though parts
of it may still be tentative. Structure your milestone as
follows:
- Your name or names (please use the name you used in
Canvas)
- The title of your project
- Describe the problem you are solving.
- Describe the system(s) that you are experimenting with.
- Describe the data you are using.
- Results: 2-3 graphs.
- Discussion of the results: this is the most
interesting part of your mini-project.
- For team projects: describe what each team
member did in the project.
- References: cite any papers that you used in your
mini-project.
- Some parts may be incomplete: in that case
describe the plan for finalizing that part.
One-on-One Project Meetings December 1, 2025
I plan to meet one-on-one with each team. More details TBD.
P3: Project Presentations December 3, 2025 (40 points): Details
We will organize a mini-workshop, where every project team
can present their work. More details TBD.
- Date: December 3rd
- You will probably have 10', possibly 12' for your presentation
- Plan about 1 slide per minute. Suggested structure:
- Title slide with your name(s) and affiliations
- Problem description. 1-3 slides. Examples:
- Are you reproducing some experiments? Describe
what that paper is trying to solve.
- Are you benchmarking several systems?
Describe what question you are interested in. ("Which
of the systems X, Y, Z has the highest throughput for
applications of type BLAH?")
- Are you implementing a specific algorithm?
Describe the motivation behind it.
- Background: 1-2 slides. Describe any techniques
that the audience should know about in order to follow
your talk.
- Methodology: 1-2 slides. Describe your approach.
- Findings: 1-3 slides. Show us your experimental results.
- Discussion. This is important: summarize
the main takeaways for the audience to
remember.
- We will have 2 sessions: 2:00-5:30 and 6:00-9:20. Please
try to stay for your entire session (about 3.5 hours).
The goal is for you to ask questions, and engage in
discussions. We will have a poll where you will vote
for the best presentation in your session (and there
will be small prizes for the winners).
P4: Final Report December 8, 2025 (10 points): Details
Submit a pdf file. Suggested length: 5-8 pages. Same
structure as the Major Milestone, with all parts completed now.