CSE 373, Winter 2019: Project Grading

Note: this page is still under construction, so new content or clarifications to existing content may be added; however, the current content will not change significantly without notice.

Grading

When it comes time to grade your projects, we use automated JUnit tests to help us test your code, just like you do throughout this course. Although we provide some JUnit tests to help guide your progress, you will also be graded on additional secret tests. This means that passing all the provided tests does not guarantee 100% correctness; instead, you should write extra test cases of your own to secure as many points as possible. As the quarter progresses, we will provide fewer tests, so to maintain the same quality of testing later on, you will want to practice testing concepts early.
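
For example, an extra test might look something like the sketch below, written with JUnit 4 (the IntList class and its add/remove/isEmpty methods are hypothetical stand-ins for whatever structure a given assignment has you implement):

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    import org.junit.Test;

    public class TestIntListExtra {
        // An edge case the provided tests might not cover: removing the only
        // element, then reusing the (now empty) structure afterwards.
        @Test
        public void testRemoveToEmptyThenReuse() {
            IntList list = new IntList();
            list.add(7);
            assertEquals(7, list.remove());
            assertTrue(list.isEmpty());

            // A correct implementation should still behave normally after
            // having been emptied.
            list.add(42);
            assertEquals(42, list.remove());
        }
    }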

What will be graded?

We only grade the files that we explicitly say are to be modified (this information is located on the assignment page). The same rule applies to the JUnit tests we grade. So, do not write any code you need (such as helper methods) in files other than the ones we intend you to modify. Once again, if you change another file in a way that prevents your code from compiling when we're grading, we may not grade your code, and you'll get a zero on the assignment.

Implementations

We grade the code you implement based on the following categories:

  • Style (we run Checkstyle)
  • Correctness (we run the JUnit tests mentioned above)
  • Design (we run JUnit tests for these as well, so note that the results may not always be consistent. If you lose points for efficiency and cannot determine where the efficiency error comes from after talking to a course staff member, you can submit a grade adjustment request and we will take a closer look.)

Tests

We grade your test files by running them against some of our own implementations of the tested code: one fully correct solution and several incorrect solutions.

We check that each incorrect solution fails at least one of your tests; if your tests detect no problems with an incorrect solution, points are deducted for that specific implementation. The fully correct solution (which matches exactly what the specification requires) should pass all of your tests. If it fails any test, you will receive a 0 for this testing section. The reasoning behind this is that a test suite which reports the correct solution as faulty will almost certainly report the buggy solutions as faulty too, and would therefore earn all the rest of the points in this section for free (even if the test in question is completely wrong).
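
As a concrete illustration, suppose one of our buggy implementations silently returns a default value instead of throwing when an element is removed from an empty container. A hypothetical test like the sketch below would fail against that buggy version (earning you its points) while still passing against the correct solution; the IntQueue class is a made-up stand-in, and the EmptyContainerException import simply mirrors the one shown in the sample feedback later on this page:

    import org.junit.Test;

    import misc.exceptions.EmptyContainerException;

    public class TestIntQueueCatchesBugs {
        // A correct implementation must throw EmptyContainerException when
        // removing from an empty queue; a buggy one that silently returns
        // some default value will fail this test instead.
        @Test(expected = EmptyContainerException.class)
        public void testRemoveFromEmptyThrows() {
            IntQueue queue = new IntQueue();
            queue.remove();
        }
    }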

Note: if your implementation is wrong, your tests will likely end up wrong as well. You should review the specification (comments + assignment page) thoroughly to make sure your tests match what is required. You should also write tests covering as many cases as possible so that you can convince yourself your code is completely correct.

Note: If you submit a test without any timeout defined at all, we will add a timeout of 15 seconds. Additionally, if your test file takes longer than 1 minute to run all its tests, we may skip or delay grading your tests in order to give everyone timely feedback. (Since we need to run your tests once for each of our implementations, if every group uses 1 minute for each run, grading everyone's tests will take approximately 24 hours.)
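
In JUnit 4, a per-test timeout is declared as a parameter of the @Test annotation, in milliseconds. A minimal sketch, with a hypothetical test name and body:

    import org.junit.Test;

    public class TestWithTimeout {
        // This test fails automatically if it runs for longer than 10 seconds,
        // so an accidental infinite loop cannot stall the whole test run.
        @Test(timeout = 10 * 1000)
        public void testLargeInput() {
            // ... exercise the implementation on a large input here ...
        }
    }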

Interpreting feedback

After we're done grading, we'll push a file back to student repos containing feedback and scores for the assignment. The file will follow this format:

# Feedback

Group dragon: netid1, netid2

Commit hash: 89315958911ba3eed9a7e982447b83dd304a579c

Raw score: 18 / 40

## Checkstyle

Score: 0 / 5

-   FAIL: (weight=1.0) ClassName1.java:5:8 [UnusedImportsCheck]

        Unused import - misc.exceptions.EmptyContainerException.

## ClassName1

Score: 1 / 5

-   PASS: (weight=1.0) testName1
-   FAIL: (weight=2.0) testName2

        java.lang.AssertionError: expected:<0> but was:<1>
            [stack trace truncated in this example...]
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.lang.Thread.run(Thread.java:748)

## ClassName2

Score: 10 / 10

-   PASS: (weight=1.0) testName3
-   PASS: (weight=1.0) testName4
-   PASS: (weight=0.5) testName5
-   PASS: (weight=1.0) testName6

## TestClassName1

Sometimes, we may also include some comments here for a section.
These are usually important, and you should read them.

Score: 0 / 10

-   FAIL: (weight=1.0) AllOk

        Test incorrectly reports correct implementation of ClassName1 as broken

-   SKIP: (weight=1.0) BuggyCaseName1
-   SKIP: (weight=1.0) BuggyCaseName2

## TestClassName2

Score: 7 / 10

-   PASS: (weight=1.0) AllOk
-   PASS: (weight=1.0) BuggyCaseName3
-   FAIL: (weight=1.0) BuggyCaseName4

        Unable to find bug with ClassName2 with [bug description]

-   PASS: (weight=1.0) BuggyCaseName5

At the top of the file are the group name and members, followed by the full commit hash of the graded commit; after that come the scoring details. Note that the total score listed at the top of the file is only the raw score; in multi-part assignments, you will have the chance to earn points back by submitting a fixed version of your code along with the next part.

Style grading is completely binary, based on the output of Checkstyle: if there were any Checkstyle errors in the files you needed to change for the assignment, you will receive 0 points; otherwise, you'll receive full credit. (You may ignore the weights listed in this section, as they aren't used for anything.)

Regular tests

The rest of the points are distributed between the different files in the assignment. Each section has some maximum score; if you pass all of our provided tests and all of our secret tests, you will get full credit; otherwise, you will lose points based on which tests you failed. The final score for each section is calculated as the weighted average of the tests for that section, scaled up to the maximum section score. In other words, the formula is as follows:

max_score * sum(weights_of_passing_tests) / sum(weights_of_all_tests)

This value is rounded down to the nearest integer.
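
For example, in the ClassName1 section of the sample feedback above, the passing test has weight 1.0 and the failing test has weight 2.0, so the section score is 5 * 1.0 / 3.0 ≈ 1.67, which rounds down to the 1 / 5 shown.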

For any tests you failed, the stack trace of the failure will also be included. Assertion errors mean that your code failed some assertion in the test, whereas other errors and exceptions show up if your code throws one when it shouldn't. If your test failed due to a timeout, however, the stack trace will probably not be very useful.

For secret tests, the test name and stack trace will be all you have to go off of as you debug. We try to make sure our test names adequately describe the case being tested, but feel free to ask at office hours if you need more direction.

Test tests

If the assignment included a testing portion, it will show up below the regular class tests. The testing portion is slightly different in that the "tests" are really testing your tests by running different versions of our code through them. The name of each test is a brief description of the bug in our code, or AllOk if there is no bug.

Note that, as mentioned earlier, all other cases in these sections will be skipped if the AllOk case does not pass. Also note that this section does not include stack traces for failures.

Occasionally, we may need to make manual raw grade adjustments for groups; this won't happen very often, but when it does, we'll note the adjustment in a section at the top of the feedback file, including the score adjustment amount and reason.

Regrades

In multi-part projects (assignments that have two parts due in successive weeks), you will have the opportunity to regain up to half of the points lost in the first part by submitting a fixed version of the part 1 files along with the second part. More precisely, your final part 1 grade will be calculated as follows:

max(raw_part_1_initial, mean(raw_part_1_initial, raw_part_1_regraded))

where raw_part_1_initial refers to the raw score for the initial version of the part 1 files, as written in the feedback file for part 1, and raw_part_1_regraded refers to the score for the version of those same files submitted when part 2 is turned in, as written in a separate regraded part 1 feedback file that will be pushed to your repo along with the part 2 feedback. This value is rounded down to the nearest integer.
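
For example, if your raw part 1 score was 18 / 40 (as in the sample feedback above) and your fixed files earned a 30 on the regrade, your final part 1 grade would be max(18, (18 + 30) / 2) = 24.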