CSE 373, Summer 2019: Project Grading

Table of Contents

  1. Grading

  2. Interpreting feedback

  3. Regrades

Grading

We'll pull code from all assignment repositories at 12:00am on the day assignments are due (and again on each of the following days for late submissions).

Afterwards, grading is done automatically using JUnit tests—the same stuff you'll be using throughout the course! For this first assignment, we'll only grade according to our provided tests, but future assignments will also be graded on additional secret tests. (The content of these tests will not be released, even after the due date.) This means that passing all the provided tests does not guarantee 100% correctness. Instead, you should write your own test cases to help guarantee the correctness of your code. As the quarter progresses, we will provide fewer and fewer tests.

What will be graded?

We only grade the files that we explicitly say are to be modified (this information is located on the assignment page); the same rule applies to the JUnit tests we grade. This also means that it is never safe to change the public interface of any code we provide: this includes (but is not limited to) adding new constructors, adding public helper methods, and changing the expected behavior of public methods. Once again, if you change any file in a way that prevents your code from compiling when we're grading, we may not grade your code, and you'll get a zero on the assignment.
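
To make the distinction concrete, here is a minimal sketch using a hypothetical provided class (this is not actual assignment code):

public class ArrayDictionary<K, V> {
    private K[] keys;
    private V[] values;
    private int size;

    @SuppressWarnings("unchecked")
    public ArrayDictionary() {
        this.keys = (K[]) new Object[10];
        this.values = (V[]) new Object[10];
        this.size = 0;
    }

    // Safe: a private helper is invisible outside this class, so it
    // cannot change the public interface our grading code compiles against.
    private int indexOf(K key) {
        for (int i = 0; i < this.size; i++) {
            if (this.keys[i] == key
                    || (this.keys[i] != null && this.keys[i].equals(key))) {
                return i;
            }
        }
        return -1;
    }

    // NOT safe: uncommenting this would add a new public method,
    // which changes the class's public interface.
    // public void debugDump() { ... }
}

In short: anything private is yours to add; anything public must stay exactly as we provided it.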

Implementations

We grade the code you implement based on the following categories:

  • Style (we run Checkstyle)
  • Correctness (we run the JUnit tests mentioned above)
  • Design (we run JUnit tests for these as well, so note that these results may not always be consistent between runs. If you lose points for efficiency and cannot determine where the efficiency error comes from after talking to a course staff member, you can submit a grade adjustment request and we can take a closer look.)

Tests

We grade your test files by running them against some of our implementations of the tested code: one fully correct solution and several incorrect solutions.

We check that each incorrect solution fails at least one of your tests. If your tests detect no problems with an incorrect solution, you lose the points allotted to that specific implementation. The fully correct solution (which matches exactly what the specification requires) should pass all of your tests. If it fails any test, you will receive a deduction, and your tests will be rerun without the failing tests. The reasoning is that a test which reports the correct solution as faulty would almost certainly report the buggy solutions as faulty too, so leaving it in would earn you the rest of the points in this section for free (even if that test is completely wrong).

Note: If your implementation is wrong, your tests will likely end up wrong as well. You should review the specification (comments + assignment page) thoroughly to make sure your tests match what is required. You should also write tests to cover as many cases as possible so that you can convince yourself your code is completely correct.
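
As an example, here is a sketch of one such edge-case test. It uses java.util.LinkedList as a stand-in for whatever class the assignment actually has you test, but the pattern carries over directly:

import static org.junit.Assert.fail;

import java.util.LinkedList;
import java.util.NoSuchElementException;
import org.junit.Test;

public class TestEdgeCases {
    @Test(timeout = 1000)
    public void testRemoveFirstOnEmptyListThrows() {
        LinkedList<Integer> list = new LinkedList<>();
        try {
            list.removeFirst();
            fail("Expected a NoSuchElementException on an empty list");
        } catch (NoSuchElementException ex) {
            // Expected: the documented behavior is to throw here.
        }
    }
}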

Note: If you submit a test without any timeout defined, we will add a timeout of 15 seconds. Additionally, if your test file takes longer than 1 minute to run all of its tests, we may skip or delay grading your tests in order to give everyone timely feedback. (Since we need to run your tests once for each of our implementations, if every group used a full minute per run, grading everyone's tests would take approximately 24 hours.)
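
For reference, here is a sketch of how an explicit timeout is declared with the JUnit 4-style annotation (the test name and body are made up for illustration):

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class TestTimeoutExample {
    // The timeout is in milliseconds: this test fails automatically if it
    // runs longer than 2 seconds, instead of inheriting the 15-second
    // default described above.
    @Test(timeout = 2000)
    public void testSumOfFirstThousandIntegers() {
        int sum = 0;
        for (int i = 1; i <= 1000; i++) {
            sum += i;
        }
        assertEquals(500500, sum);
    }
}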

Interpreting feedback

After we're done grading (typically sometime over the weekend), we'll push back to student repos a file containing feedback and scores for the assignment. Here's an example feedback file:

# Feedback

Group dragon: netid1, netid2

Commit hash: 89315958911ba3eed9a7e982447b83dd304a579c

Raw score: 18 / 40

## Checkstyle

Score: 0 / 5

-   FAIL: (weight=1.0) ClassName1.java:5:8 [UnusedImportsCheck]
        Unused import - misc.exceptions.EmptyContainerException.

## ClassName1

Score: 1 / 5

-   PASS: (weight=1.0) testName1
        Description: Sometimes, our tests will include descriptions of their contents.
-   FAIL: (weight=2.0) testName2
        Description: We try to include descriptions for tests with complex behavior or
            ambiguous names.
        java.lang.AssertionError: expected:<0> but was:<1>
            [stack trace truncated in this example...]
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.lang.Thread.run(Thread.java:748)

## ClassName2

Score: 10 / 10

-   PASS: (weight=1.0) testName3
-   PASS: (weight=1.0) testName4
-   PASS: (weight=0.5) testName5
-   PASS: (weight=1.0) testName6

## TestClassName1

Sometimes, we may also include some comments here for a section.
These are usually important, and you should read them.

Score: 0 / 10

-   FAIL: (weight=1.0) AllOk
        Test incorrectly reports correct implementation of ClassName1 as broken
        -   PASS: yourTestName1
        -   FAIL: yourTestName2
        -   PASS: yourTestName3

-   SKIP: (weight=1.0) BuggyCaseName1
-   SKIP: (weight=1.0) BuggyCaseName2

## TestClassName2

Score: 7 / 10

-   PASS: (weight=1.0) AllOk
-   PASS: (weight=1.0) BuggyCaseName3
-   FAIL: (weight=1.0) BuggyCaseName4
        Unable to find bug with ClassName2 with [bug description]

-   PASS: (weight=1.0) BuggyCaseName5

At the top of the file will be the group name and members, followed by the full commit hash of the graded commit. Afterwards are the scoring details. Note that the total score listed at the top of the file is only the raw score; in multi-part assignments, you will have the chance to earn points back by submitting a fixed version of your code along with the next part.

Style grading is completely binary, based on the output of Checkstyle. If there were any Checkstyle errors in the files you needed to change for the assignment, you will receive 0 points; otherwise, you'll receive full credit. (You may ignore the weights listed in this section, as they aren't used for anything.)
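
For instance, the UnusedImportsCheck failure in the sample feedback above is triggered by an import statement that nothing in the file uses; a hypothetical offending file might look like the sketch below, and the fix is simply to delete the flagged line:

import java.util.List; // FLAGGED by UnusedImportsCheck: List is never used below

public class ClassName1 {
    public int size() {
        return 0;
    }
}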

Regular tests

The rest of the points will be distributed between the different files in the assignment. Each section has some max score; if you pass all our provided tests and all our secret tests, you will get full credit; otherwise, you will lose points based on the tests you failed. The final score for each section is calculated as the weighted average of the tests for that section, scaled up to the maximum section score. In other words, the formula is as follows:

max_score * sum(weights_of_passing_tests) / sum(weights_of_all_tests)

This value is rounded down to the nearest integer.
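
For example, in the ClassName1 section of the sample feedback above, the passing test has weight 1.0 and the failing test has weight 2.0, so the section score is 5 * 1.0 / 3.0 ≈ 1.67, which rounds down to the 1 / 5 shown.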

For any tests you failed, the stack trace of the failure will also be included. Assertion errors mean that your code failed some assertion in the test, whereas other errors and exceptions will show up if your code throws one when it shouldn't. If your test failed due to a timeout, however, this stack trace will probably not be very useful.

For secret tests, the test name and stack trace will be all you have to go off of as you debug. We try to make sure our test names adequately describe the case being tested, but feel free to ask at office hours if you need some more direction.

Test tests

If the assignment included a testing portion, it will show up below the regular class tests. The testing portion is slightly different in that the "tests" are really testing your tests by running different versions of our code against them. The name of each test will be a brief description of the bug in our code, or AllOk if there is no bug.

Note that, as mentioned earlier, if the AllOk case does not pass, all other cases in the section will be skipped; instead, the results of running your submitted tests against our working code will be displayed. Also note that this section does not include stack traces for failures.

Other notes

Occasionally, we may need to make manual raw grade adjustments for groups; this won't happen very often, but when it does, we'll note the adjustment in a section at the top of the feedback file, including the score adjustment amount and reason.

Regrades

In multi-part projects (assignments that have two parts due on successive weeks), you will have the opportunity to regain up to half of the points lost on the first part by submitting a fixed version of the part 1 files along with the second part. More precisely, your final part 1 grade will be calculated as follows:

max(raw_part_1_initial, mean(raw_part_1_initial, raw_part_1_regraded))

where raw_part_1_initial refers to the raw score for the initial version of the part 1 files, as written in the feedback file for part 1, and raw_part_1_regraded refers to the score for the version of those same files submitted when part 2 is turned in, as written in a separate regraded part 1 feedback file that will be pushed to your repo along with the part 2 feedback. This value is rounded up to the nearest integer.
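
For example, a group whose part 1 raw score was 18 / 40 (as in the sample feedback above) and whose fixed part 1 files earn a perfect 40 / 40 on the regrade would receive max(18, mean(18, 40)) = max(18, 29) = 29, regaining exactly half of the 22 points originally lost.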