Marvel Comics #1

Contents:

Introduction

In this assignment, you will put the graph you designed in the graph homework to use by modeling the Marvel Comics universe. By trying out your Graph ADT in a client application, you will be able to test the usability of your design as well as the correctness, efficiency, and scalability of your implementation.

This application builds a graph containing thousands of nodes and edges. The size of the graph might expose performance issues that weren't revealed by your unit tests on smaller graphs. With a well-designed implementation, your program will run in a matter of seconds. Bugs or suboptimal data structures can increase the runtime to anywhere from several minutes to 30 minutes or more. If this is the case, you may you will need to go back and revisit your graph implementation from earlier. Remember that different graph representations have widely varying time complexities for various operations and this, not a coding bug, may explain the slowness.

This Marvel dataset was also used by researchers at Cornell University, who published a research paper showing that the graph is strikingly similar to “real-life” social networks.

On top of this, you will start building a TypeScript version of the popular puzzle Connect the Dots using React. This part isn't related to any other parts of this assignment, but will help you in future assignments.

The MarvelPaths Application

In this application, your graph models a social network among characters in Marvel comic books. Each node in the graph represents a character, and an edge ⟨Char1,Char2⟩ indicates that Char1 appeared in a comic book that Char2 also appeared in. There should be a separate edge for every comic book any two characters appear in, labeled with the name of the book. For example, if Zeus and Hercules appeared in five issues of a given series, then Zeus would have five edges to Hercules, and Hercules would have five edges to Zeus.

You will write a class MarvelPaths (create the file yourself: MarvelPaths.java) that reads the Marvel data from a file (marvel.csv), builds a graph, and finds paths between characters in the graph.

As you complete this assignment, you may you will need to modify the implementation and/or the specification of your Graph ADT. You are welcome to do this - you're not "locked in" to any of your previous decisions. We encourage you to think carefully about any changes you might make, and use the same thought process you used when originally creating your design to evaluate any potential changes.

Leave your graph in the graph package where it was originally written, even if you modify it for this assignment. There is no need to copy files nor duplicate code! You can just import and use it in this assignment. If you do modify your graph code, be sure to commit those changes your repository.

For the summer 2021 offering of the course, we will also provide you with an api that computes the shortest path between two superheros using the Breadth-first search (BFS) algorithm for part 3. Additionally, there is no part 5 for this offering of the course.

Part 0: Understanding the Marvel Universe Data

Before you get started, look at the file with the Marvel data: hw-marvel/src/main/resources/data/marvel.csv. A CSV (“comma-separated values”) file consists of human-readable data delineated by commas, and can be opened in your favorite text editor, spreadsheet, or IntelliJ. Each line in marvel.csv is of the form

character,book

where character is the name of a character, book is the title of a comic book that the character appeared in, and the two fields are separated by a comma (,).

Part 1: Parsing the Data

The first step in your program is to parse the provided csv files. We have provided a starter file MarvelParser.java and you must complete the implementation.

We've provided a helper method readLines that returns a List<String> where each element in the list is one line of the file that has been opened. You can use this method to read in the raw data in the file and then process it according to the format of the file. You'll need to complete the implementation of parseData to parse the data from the file into some useful data structure(s) and return them to your main program (in MarvelPaths.java) to be assembled into your graph. We have provided a CSV parser demo here to help you get started.

Note: When the readLines method is called (such as in the provided code), we can just pass it the simple filename of the file and it'll take care of looking in the correct directory (src/resources/data/). For example: readLines("marvel.csv"). If you write any parsing code elsewhere in this course (this is unlikely for most students), you should write code similar to the implementation of readLines here to ensure that it works across projects/assignments and in testing. See the comments in readLines for more info.

Part 2: Building the Graph

The next step in your program is to construct your graph of the Marvel universe from a data file. Once you have completed MarvelParser.parseData, you need to convert the data from it into a Graph. This will be specific to your implementation of a Graph, but most likely will involve looping through all the data returned by the parser and calling methods on your Graph class(es) to construct nodes/edges and link them together appropriately. Remember that, for each comic book relationship between Character A and Character B, there should be two edges: one from Character A to Character B, and one the other way around, from Character B to Character A.

Note: You should not create edges from a character to themself as part of this process.

At this point, it's a good idea to test the parsing and graph-building operation in isolation. Verify that your program builds the graph correctly before continuing by completing the relevant parts of Part 4.

Part 3: Finding Paths

The real meat of MarvelPaths is the ability to find paths between two characters in the graph. Given the name of two characters, MarvelPaths should search for and return a path through the graph connecting them. How the path is subsequently used, or the format in which it is printed out, depends on the requirements of the particular application using MarvelPaths, such as your test driver (see below).

Your program should return the shortest path found via breadth-first search (BFS). A BFS from node u to node v visits all of u's neighbors first, then all of u's neighbors' neighbors, then all of u's neighbors' neighbors' neighbors, and so on until v is found or all nodes with a path from u have been visited. For this offering of CSE331, we are providing you with a library containing a method that computes the shortest path using BFS for you. The BFS method provided to you handles the sorting feature for you. You just need to make sure not to re-sort the results in a different order. However, you should still be able to check whether the results returned to you is correct by writing appropriate and sufficient tests. You may reference section 6 material for the BFS algorithm.

Many character pairs will be connected by multiple shortest paths with the same length. For grading purposes, your program should return the lexicographically (alphabetically) least path. More precisely, it should pick the lexicographically first character at each next step in the path, and if those characters appear in several comic books together, it should print the lexicographically least title of a comic book that they both appear in. The library provided to you handles the sorting feature for you. You just need to make sure not to re-sort the results in a different order.

Because this search is specific to this particular application, you should implement your BFS wrapper method in MarvelPaths rather than directly in your graph, as other hypothetical applications that might need BFS probably would not need this special ordering. Furthermore, other applications using the graph ADT might need to use a different search algorithm, so we don't want to hard-code a particular search algorithm in the graph ADT itself.

Using the full Marvel dataset, your program must be able to construct the graph and find a path in less than 30 seconds on attu. ScriptFileTests, the provided class used to run your script tests, is set to put a 30000 ms (30 second) timeout for each test. (For reference, the staff solution took around 5 seconds to run on attu).

If your Graph ADT has a particularly thorough or expensive checkRep function, it is possible that it could take too long to load the Marvel dataset when all checks are performed. A good checkRep function should include some way to disable particularly expensive checks by setting a compile-time or run-time variable. You must disable the expensive checkRep tests in the version you submit for grading so it will pass all tests in the allotted time. It's recommended to simply disable the entire checkRep.

Here are the steps you need to take to implement our library:

Inside your Graph ADT

  1. Change your graph implementation to implement the StringGraphNeighbors interface provided by us, by adding implements StringGraphNeighbors after public class [your graph class name] in your graph implementation. This will initially create an error as IntelliJ will notice that you need to implement the outgoingEdges method which is missing from your class. You will address this in the next step.
  2. Implement outgoingEdges from the StringGraphNeighbors interface using its documentation. You may find the documentation under docs/javadoc/index.html in your repository. Note that you will have to translate the edges returned from your implementation to be of type StringGraphNeighbors.Edge

Inside your MarvelPaths class

  1. Implement a path finding wrapper method that calls the static method Paths.shortestStringPath(). Note that you will have to translate the edges returned from our implementation back to edges in your implementation.

Note that the above interfaces/classes/methods in our library have a sufficient documentation. As a part of this assignment you need to read and understand the documentation in order to figure out how you need to use the provided code in your implementation.

Part 4: Testing Your Solution

Because the Marvel graph contains thousands of nodes and hundreds of thousands of edges, you should not use it for correctness testing (such as ensuring that your code computes the correct path between two characters), because it will be too hard to determine the correct path in most cases. By contrast, the full data set is useful for scalability testing or to detect corner cases, after you have performed correctness testing using much smaller graphs that you create. Additionally, it is important to be able to test your parsing/graph-building and BFS operations in isolation and separately from each other. For these reasons, you should use a test driver to verify the correctness of both your parser and BFS algorithm on much smaller graphs.

Note: all of your hw-graph tests still exist, so there's no need to re-create any of those tests for hw-marvel. Your hw-marvel tests should be focused on the behavior your MarvelPaths (and related) code has.

Script Tests

You should write script tests using test scripts similar to those you wrote in the graph homework. The format is defined in the test file format section below. In addition to writing *.test and *.expected files as before, you will write *.csv files in the format defined for marvel.csv to test your parser. All the *.csv files used for testing should be placed in the hw-marvel/src/main/resources/data directory. Place your test scripts in hw-marvel/src/test/resources/testScripts/.

You should create a class named MarvelTestDriver that runs tests in the same way that GraphTestDriver did. We have provided a starter file in hw-marvel/src/test/java/marvel/scriptTestRunner with method stubs and a small amount of starter code, but you can replace it. You may copy your graph/src/test/java/graph/scriptTestRunner/GraphTestDriver.java source file to hw-marvel/src/test/java/marvel/scriptTestRunner/MarvelTestDriver.java, change it to package marvel.scriptTestRunner, and revise it to match this assignment's file format specification. This will cause duplicated code between the two assignments, but avoids some tricky issues with subclassing (especially when trying to subclass test classes).

JUnit Tests

Your script tests will most likely cover the bulk of your testing. You may need to test additional behaviors specific to your implementation, such as handling of edge cases. If so, write these tests in a regular JUnit test class or classes in the marvel.junitTests package. If your data fails to load in your JUnit tests, see the File Paths section for information about resource files. Be sure to run validation on attu with ./gradlew hw-marvel:validate to verify that the data files are still found successfully under the configuration in which we will run your tests. (And to make sure that your test run in a reasonable amount of time.)

Running Tests

To run your tests, we recommend using the IntelliJ JUnit testing integration. Make sure you've set up your IntelliJ to properly run tests using gradle - see the Editing/Compiling/Running/Testing handout for more setup instructions. Once you've done so, you can right-click on any of your JUnit tests and choose "Run <filename>" to run the tests in that file. To run all the tests in a folder, you can right click a folder and choose "Run Tests in <folder>". The script tests (.test and .expected files) are run by ScriptFileTests, so right-click and run that file to run all of your script tests.

You can also always use the gradle hw-marvel:test target in the gradle window to run all tests. (Under cse331 > hw-marvel > Tasks > verification > test in the IntelliJ gradle window).

Please be extra sure that your IntelliJ is properly configured to use the gradle test runner before trying to run tests using any of these methods. (See the editing/compiling handout if you haven't already done this.) Running tests with the IntelliJ runner instead of the Gradle runner will likely result in "no tests found" when trying to run your script tests. This requires extra work to undo beyond just fixing the settings after the first test run, as IntelliJ will remember the original settings you had when you ran the tests the first time, even if you change them later. If you find yourself in this situation, please contact the staff for instructions on fixing IntelliJ so it will always use the Gradle runner.

Part 5: Running Your Solution From the Command Line

This part is omitted for the summer 2021 offering of the class.

Do not implement a main method as it would prevent our validation scripts from performing correctly.

A surprising number of students forget to do this part of the assignment. Please don't let that be you!

Add a main method to MarvelPaths that allows a user to interact with your program from the command line (i.e., your code should read user input from System.in). The program should read pairs of character names and then print to the console the path between those two characters if any, or else print a suitable message (your code should write to System.out). There is no rigid specification here for input or output formats, but especially creative ones may receive a small amount of extra credit.

At a minimum, your program should make it clear to the user how they are supposed to input the character names, and there should be some reasonable way to quit the program when they are done (that doesn't involve just crashing).

To run your program, a user would run the hw-marvel:runMarvel gradle task. For this task in particular, it is highly recommended that you run the gradle task using gradlew from the command line, as we've experienced bugs with IntelliJ's built-in grade task runner when reading input from System.in. Mac/Linux users should run the following from the IntelliJ Terminal window:

./gradlew hw-marvel:runMarvel

Windows users should run the following:

gradlew.bat hw-marvel:runMarvel

Part 6: HW Dots Part 1

You will start writing a User Interface(UI) to let users run a version of a Connect the Dots drawing puzzle.

You will start writing the UI in *.tsx files under hw-dots/src. You may create additional files as needed and may edit provided files. Each file may contain no more than a single React class/component. Be sure to add any new files to your repository as you commit your changes. You may also add .ts files if you choose to write any other code (i.e. helper functions) that aren't a part of a specific component. (You aren't required to have separate .ts files if you don't want to.)

The interface provided by the staff has the following elements:

Important Note

Much of the functionality in this starter code is only half-written, often with hard-coded values instead of actual interactivity. The code is full of unused variables and half-implemented functions for you to complete. For example, the “edges” text box currently can't be edited at all. It's your job to read and modify the provided code to complete the functionality of the application.

The starter code is half-written. This means you will need to add, change, or remove props, state, and functions in the components. You are not required to keep our existing code - please change it if you can think of a better design. Here is a demo you can reference when you are finishing this part of the assignment.

Tasks for Part 6

  1. Open your repository with IntelliJ. From the project root directory (which you can get to by opening the terminal window in IntelliJ), run the following two commands:
    cd hw-dots
    npm install
                

    It's not uncommon for this installation to take 15+ minutes, depending on the speed of your network connection and your computer, so don't worry if it seems to be taking a while. You may get a few warnings and messages, but nothing particularly tragic should be printed to the screen. You'll only need to do this installation once for each place you have your repository cloned. To test your installation, run npm start. This will run your React application, which currently only has the starter code that we provided. When it's done starting up, a browser window will automatically open the website.

    If the browser didn't automatically open, or the wrong browser opened, you can manually navigate to http://localhost:3000 to view your site. Now that the React server is up and running, you can edit the code and the website will auto-reload without you having to stop and re-start the React server. When you're done with development, you can stop the running react server by clicking in the terminal window where it's running and pressing Control-C.

  2. Read ALL of the code. Twice. Make sure you understand it. We can't stress that enough. If you make sure you fully understand how the data is flowing (or not flowing when it should be), it will be much easier to complete this assignment. We recommend you experiment by making small changes to the code to test your understanding and make sure things are behaving the way you think they should be. If it helps, draw a picture of how the components are related to each other and how the data travels between them. Once you are confident you understand how the basic functionality of this application works, you can move on to the next step. If you're not sure what's going on, re-read the Advanced User Interface section in the React handout, look at lecture or section materials, or ask a question on the discussion board or in office hours.

  3. Currently, the text field inside GridSizePicker is able to be edited. The actual number that was entered into the text box is stored in the state of App, in state.gridSize. A string version of it is passed down to be displayed by the GridSizePicker through its value prop, and a callback function that updates the state is passed into the GridSizePicker through the onChange prop. Whenever the text field has a change, the GridSizePicker's onInputChange method is called, which parses the text inside the text field into an integer, then calls the onChange function it got as a prop from App to tell it about the new data. (This is similar to the standard “fully-controlled component” model described in the Advanced User Interface section of the React handout.) The current value of the grid size is also passed as a prop to Grid, so it can be used when rendering the grid onto the canvas. It's your job to make use of that information so the Grid component automatically changes the way it's displayed based on the current grid size.

    1. Update the render method in Grid to make sure the “Current Grid Size: X” text properly displays the current value of the gridSize prop.

    2. Update the getCoordinates method in Grid so that it dynamically calculates the coordinates of all the points that need to be drawn by redraw, and returns those points in the same format as getCoordinates already does. If the current gridSize is 5, it should return an array of 25 2-element arrays, each with an x and y coordinate representing the location of one of the 25 points in a 5x5 grid. The exact spacing and arrangement of the grid is your choice, but it should be centered, and fill a reasonable amount of space on the image (i.e. not be tiny, but also fit completely within the canvas). Note that the drawCircle function already takes care of making the circles smaller or bigger depending on how many there are, though you're welcome to modify this behavior if you have a better idea. :)

    3. Your Grid component should react (get it? React? :P) immediately to the changing text field, no button presses or other user action should be required. (Understanding the React lifecycle methods will be helpful here.) Note: Most students will not need to modify their code after completing 2b to meet this requirement - this is just listed here to make sure everyone understands what the requirements are for their final product.

    4. You should provide additional input validation beyond what is already provided. The text area's max and min attributes prevent using the arrow-buttons inside the text field from going beyond the limit, but there's nothing stopping the user from typing whatever number they want into the text field unless you add code to protect your application from bad input. You should notify the user of their error and give them a chance to correct whatever incorrect input they caused. Your application should support at least a grid of 100x100, but be careful to create a sensible maximum — your app can crash if it tries to calculate and draw a 1000x1000 grid of 10^6 dots, for example.

    5. It should be possible to completely delete the text in the grid size picker without causing an error. This is for usability reasons, it's frustrating for a user to be unable to delete everything in a text box so they can enter a new value. You will likely need to test if the string is the empty string in your parsing/validation logic and have a special case for this. It's ok for the grid to have "size zero" (be blank) for this time, or just keep using whatever size the grid used to be before the user cleared the text box. In order to do this, you might need to separate the string being displayed in the text box from the number that's tracking the current grid size in App. (Right now, the string being displayed in the text box is just the result of calling toString on App's gridSize state.) For example, you could have a piece of state (a string) in GridSizePicker that keeps track of the current string in the text box, and then use the App state to track the current grid size, as it's already doing. That way, you could have App's current grid size be zero, for example, while the GridSizePicker's displayed string is the empty string.

    6. You now have the basic functionality of the Grid component complete. We will finish the rest of the UI in Part 2.

Using TypeScript

It can be tempting, when the TypeScript compiler is complaining about something, to simply add the any annotation to whatever is causing your problem to get the compiler to be quiet. There are plenty of reasonable uses for any, but you should always carefully evaluate whether you are simply masking a bug before you use it.

For example, it's generally fine (in this course) to replace complex names that come from React or HTML with any to keep the code simple. There's a trade-off here - the code may be somewhat simpler, but (a) you get less help from the TypeScript compiler, since it can't catch bugs that it can't see, and (b) you get less help from IntelliJ, since it doesn't know what kind of object it's dealing with. It's up to you to make a choice about what types you want to write out, and what types you'll leave as any. TypeScript is a tool - the more you use it, the more it will help you.

At a minimum, you should attempt to use explicit types for the following:

One note - some TypeScript compiler errors can be long or confusing. When you encounter errors like this, resist the urge to randomly change the code in the hopes that it will go away. By not understanding the error behind what you wrote, you are making yourself more likely to make the same mistake in the future, and you're robbing yourself the opportunity to understand why the compiler said what it said, and why the code that was there had a problem. When you're faced with an error you don't understand, here are some tips for figuring out what's going on:

Test script file format

The format of the test file is similar to the format described in the graph assignment. As before, the test driver manages a collection of named graphs and accepts commands for creating and operating on those graphs.

Each input file has one command per line, and each command consists of whitespace-separated words, the first of which is the command name and the remainder of which are arguments. Lines starting with # are considered comment lines and should be echoed (copied) to the output when running the test script. Lines that are blank should cause a blank line to be printed to the output.

The behavior of the testing driver on malformed input command files is not defined; you may assume the input files are well-formed.

In addition to all the same commands (and the same output) as in the graph assignment, the test driver for this homework accepts the following new commands:

LoadGraph graphName file.csv

Creates a new graph named graphName from file.csv, where file.csv is a data file of the format defined for marvel.csv and is located in the src/main/resources/data/ directory of your project. The command's output is

loaded graph graphName

You may assume file.csv is well-formed; the behavior for malformed input files is not defined.

Filenames supplied to the LoadGraph command should be simple, meaning they do not include the directory in which they are located. For example, marvel.csv is a simple filename; src/main/resources/data/marvel.csv is not.

FindPath graphName Char_1 Char_N

Find the shortest path from Char_1 to Char_2 in the graph using your breadth-first search algorithm.

Paths should be chosen using the lexicographic ordering described in Part 3. If a path is found, the command prints the path in the format:

path from CHAR_1 to CHAR_N:
CHAR_1 to CHAR_2 via BOOK 1
CHAR_2 to CHAR_3 via BOOK 2
...
CHAR_N-1 to CHAR_N via BOOK N-1

where CHAR_1 is the first node listed in the arguments to FindPath, CHAR_N is the second node listed in the arguments, and BOOK K is the title of a book that CHAR_K and CHAR_K+1 appeared in.

It is possible that given two character names there is no path in the data between them. If the user gives two valid node arguments CHAR_1 and CHAR_2 that have no path in the specified graph, print:

path from CHAR_1 to CHAR_2:
no path found

If a character name CHAR was not in the original dataset, print:

unknown: CHAR

If neither character is in the original dataset, print the line twice: first for CHAR 1, then for CHAR N (even if they are the same character). These should be the only lines your program prints in this case — i.e., do not print the regular "path not found" output, and do not print the "path from CHAR 1 to CHAR N" output.

What if the user asks for the path from a character in the dataset to itself? Print:

path from C to C:

but nothing else. (Hint: a well-written solution requires no special handling of this case.)

This applies only to characters in the dataset: a request for a path from a character that is not in the dataset to itself should print the the usual "unknown: C" output, including printing it twice, since "both" characters could not be found.

Sample testing files

Several sample test files demonstrating the new commands are provided in hw-marvel/src/test/resources/testScripts/. Your .test and .expected files should also go in this directory alongside the examples.

Paths to Files

Since your parsing code will likely always be using the CSV reader in MarvelParser, and that class is declared in src/main, it will always look for data files in src/main/resources/data/ instead of src/test/resources/data/. This might seem a little odd, since some of your testing files (the data files) will be in 'main' instead of 'test.' We've done this to greatly simplify the infrastructure required for getting these files to work across projects (as you start using your parsing code for future assignments.)

In an industry setting, you'd have a separate parsing system capable of finding files only in 'src/test', but for 331 you should have all your data files in src/main/resources/data.

Note that your .test and .expected files should be in 'src/test/...' as usual, it's just any .csv files you create that should go in 'src/main/...'.

Hints

Performance

If your program takes an excessively long time to construct the graph for the full Marvel dataset, first make sure that it parses the data and builds the graph correctly for a very small dataset. If it does, here are a few questions to consider:

Be aware that machine specs affect not only how fast your program runs but also how much memory it is allowed to use (more precisely, the maximum heap size).

Javascript Hints

Miscellaneous

As always, remember to:

How to Turn In Your Homework

Refer to the Assignment Submission Handout and closely follow the steps listed to submit your assignment. Do not forget to double check your submission as described in that handout - you are responsible for any issues if your code does not run when we try to grade it.

Use the tag name hw6-final for this assignment. To verify your assignment on attu, use the gradle task: hw-marvel:validate.

For Part 6, ./gradlew validate will not validate your homework assignment. The staff strongly recommends you use the latest, standard version of Google Chrome to test your code. Your UI must load when we use Google Chrome to visit http://localhost:3000 after running npm start from hw-dots/. For this part, there are no new files that you must submit as part of the assignment, unless you add any new files yourself. Make sure to commit any changes and additional files you create. Don't forget to note any additional functionality in extras.txt.

Your TA should be able to find the following in your repository: