CSE 558 (Winter 2011) Beyond Programmable Shading

Homework 2 (due Wednesday February 9, 2011)

Note 1: The homework must be coded individually, but you may discuss parallelization strategies with each other and on the discussion forum.

Note 2: You may read/copy text and code from any source *except* other students in this class; however, you must cite all of your sources.

The goal of this assignment is to write two parallel implementations of the Reinhard tone mapping algorithm: one using Cilk (CPU) and one using DirectCompute (GPU).

With this assignment, I want you to share parallelization strategies and to hold an ongoing performance competition as you work. As such, please post your CPU and GPU tone mapping performance results (as reported by the timer displayed in the UI) to the course discussion forum as you make progress, and feel free to help each other with tips/tricks/strategies for getting the best performance.

Starting / Skeleton Code

You will find starting code in /cse/courses/cse558/hw2_skeleton_cse558.zip. Everyone must use this code as a starting point. This zip file contains a complete Microsoft Visual Studio 2010 solution and project that should build-and-run on your course computer with no changes.

The application is a deferred renderer that supports a single, very bright, directional light. All lighting is stored in a high-dynamic-range 32-bit floating-point (RGBAfp32) framebuffer that must be tone-mapped down to RGBA8 (8 bits per channel) to display the image.

The UI supports toggling between CPU and GPU tone mapping and shows, in the upper left of the screen, the time per frame taken by the currently selected tone mapping implementation. Note that the CPU tone mapping is implemented (albeit slowly) but the GPU version currently does nothing. As such, switching to GPU tone mapping will show you the "blown-out" clamped version of the image. The UI also supports an "exposure" slider that controls the key value in the tone mapping algorithm.
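For orientation, here is a minimal serial sketch of the simple global form of Reinhard's operator: compute the log-average luminance of the frame, scale each pixel's luminance by key / log-average, then compress with L / (1 + L). The pixel structs, function names, Rec. 709 luminance weights, and the delta constant are assumptions made for illustration; the skeleton's ToneMapperCPU may differ in layout and in which variant of the operator it uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel layouts; the skeleton's actual types may differ.
struct PixelF32 { float r, g, b, a; };
struct PixelU8  { std::uint8_t r, g, b, a; };

// Rec. 709 luminance weights (an assumption; other weights are also common).
inline float Luminance(const PixelF32& p) {
    return 0.2126f * p.r + 0.7152f * p.g + 0.0722f * p.b;
}

// Pass 1: log-average luminance, exp(mean(log(delta + L))).
float LogAverageLuminance(const std::vector<PixelF32>& hdr) {
    assert(!hdr.empty());
    const float delta = 1e-4f;  // keeps log() finite on black pixels
    double sum = 0.0;
    for (const PixelF32& p : hdr) sum += std::log(delta + Luminance(p));
    return static_cast<float>(std::exp(sum / static_cast<double>(hdr.size())));
}

// Clamp to [0, 1] and quantize to 8 bits with rounding.
inline std::uint8_t ToU8(float v) {
    v = std::min(1.0f, std::max(0.0f, v));
    return static_cast<std::uint8_t>(v * 255.0f + 0.5f);
}

// Pass 2: scale luminance by key / logAvg, then compress with L / (1 + L).
std::vector<PixelU8> ReinhardToneMap(const std::vector<PixelF32>& hdr, float key) {
    std::vector<PixelU8> ldr(hdr.size());
    if (hdr.empty()) return ldr;
    const float scale = key / LogAverageLuminance(hdr);
    for (std::size_t i = 0; i < hdr.size(); ++i) {
        const float lw = Luminance(hdr[i]);              // world luminance
        const float l = scale * lw;                      // scaled luminance
        const float ld = l / (1.0f + l);                 // display luminance
        const float ratio = lw > 0.0f ? ld / lw : 0.0f;  // per-channel gain
        ldr[i] = { ToU8(hdr[i].r * ratio), ToU8(hdr[i].g * ratio),
                   ToU8(hdr[i].b * ratio), 255 };
    }
    return ldr;
}
```

Note the two-pass shape: the first pass is a reduction over the whole image and the second is an independent per-pixel map, which is why the two passes usually call for different parallelization strategies.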

You will modify two of the provided classes to complete the homework, ToneMapperCPU and ToneMapperGPU. The method ToneMapperCPU::ComputeReinhardToneMap(...) is a non-parallelized CPU implementation of the algorithm. You can find the points that you will need to extend / optimize by searching all files in the MSVS solution for the string "HW2 TODO".

Note that you must use the Intel Compiler to use Cilk. The project is set up by default to use the Intel Compiler. You can switch back to the MSVS C++ compiler by right-clicking on the "hw2_skeleton" project in the MSVS 2010 solution and using the "Intel C++ Composer XE 2011" menu.
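Because the Cilk keywords are only available with a Cilk-enabled compiler, a sketch of a parallel loop can be guarded so that it still builds (serially) elsewhere. The example below sums log-luminance in independent chunks so that no two iterations write to the same location, then combines the partial sums serially. The HW2_PARALLEL_FOR macro, the function name, and the chunking scheme are hypothetical, and the use of the __cilk macro to detect Cilk support is an assumption you should check against the Cilk documentation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumption: Cilk-enabled compilers define __cilk. Other compilers take
// the serial fallback, so the sketch builds either way.
#if defined(__cilk)
  #include <cilk/cilk.h>
  #define HW2_PARALLEL_FOR cilk_for
#else
  #define HW2_PARALLEL_FOR for
#endif

// Sum of log(delta + luminance) over a luminance buffer. Each chunk keeps
// a private running sum and writes only its own slot of `partial`.
double SumLogLuminance(const std::vector<float>& lum, int numChunks) {
    assert(numChunks > 0);
    const float delta = 1e-4f;  // keeps log() finite on black pixels
    const std::size_t n = lum.size();
    const std::size_t chunks = static_cast<std::size_t>(numChunks);
    std::vector<double> partial(chunks, 0.0);

    HW2_PARALLEL_FOR (int c = 0; c < numChunks; ++c) {
        const std::size_t k = static_cast<std::size_t>(c);
        const std::size_t begin = n * k / chunks;
        const std::size_t end = n * (k + 1) / chunks;
        double sum = 0.0;
        for (std::size_t i = begin; i < end; ++i) {
            sum += std::log(delta + lum[i]);
        }
        partial[k] = sum;
    }

    // Serial combine; numChunks is small, so this cost is negligible.
    double total = 0.0;
    for (std::size_t k = 0; k < chunks; ++k) total += partial[k];
    return total;
}
```

Explicit chunking is only one option; Cilk also provides reducers for this pattern, and the best chunk count for your machine is something to measure rather than assume.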

Project Details

- Parallelize the CPU tone mapping implementation using Cilk for task parallelism. In addition to Cilk, you may re-write the code in any way you want and use any other tools/languages/libraries to help optimize the code. Full credit on this portion requires that the CPU tone mapping algorithm take less than 4 milliseconds per frame (as reported by the tone mapping timer provided in the UI). (40%)

- Implement a parallelized GPU version of the tone mapping algorithm using a ComputeShader. A "stubbed-out" ComputeShader already exists in ToneMapper.hlsl (called by ToneMapperGPU::ExecuteToneMappingComputeShader) that you can use to get started. Full credit for this portion requires a parallelized ComputeShader implementation (i.e., implementing the algorithm in a single workitem/CS-thread does not count). (60%)
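One common way to parallelize the average-luminance step on the GPU is a two-pass reduction: each thread group reduces its slice of the image in group-shared memory and writes one partial sum, and a second, much smaller pass combines the partials. The C++ sketch below emulates that pattern on the CPU purely to illustrate the indexing; it is not HLSL and not the skeleton's code. The group size of 256 and all function names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const unsigned kGroupSize = 256;  // hypothetical threads per group (power of two)

// Number of thread groups needed to cover `count` items, rounded up. This is
// the kind of value the host would pass as a Dispatch dimension.
unsigned GroupCount(unsigned count) {
    return (count + kGroupSize - 1) / kGroupSize;
}

// Emulates one thread group. Each "thread" t loads one value into a shared
// array, then the number of active threads is halved at every step.
float ReduceGroup(const std::vector<float>& values, unsigned groupId) {
    assert((kGroupSize & (kGroupSize - 1)) == 0);  // tree needs a power of two
    float shared[kGroupSize];
    for (unsigned t = 0; t < kGroupSize; ++t) {
        const std::size_t i = static_cast<std::size_t>(groupId) * kGroupSize + t;
        shared[t] = i < values.size() ? values[i] : 0.0f;  // pad the last group
    }
    // In a ComputeShader, a group barrier (GroupMemoryBarrierWithGroupSync in
    // HLSL) would be needed after the load and after each stride step.
    for (unsigned stride = kGroupSize / 2; stride > 0; stride /= 2) {
        for (unsigned t = 0; t < stride; ++t) {
            shared[t] += shared[t + stride];
        }
    }
    return shared[0];
}

// Pass 1 produces one partial sum per group; pass 2 combines the partials.
float ReduceAll(const std::vector<float>& values) {
    const unsigned groups = GroupCount(static_cast<unsigned>(values.size()));
    std::vector<float> partial(groups);
    for (unsigned g = 0; g < groups; ++g) partial[g] = ReduceGroup(values, g);
    float total = 0.0f;
    for (unsigned g = 0; g < groups; ++g) total += partial[g];
    return total;
}
```

For a 1000-element input, GroupCount returns 4 and the last group sums only the 232 real values, with the remaining slots padded with zeros. Whether a tree reduction, atomics, or some other scheme is fastest on your GPU is exactly the kind of result worth posting to the forum.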

Extra credit

- The fastest (correct) CPU tone mapping implementation will receive 10 extra points. The 2nd and 3rd fastest will each receive 5 extra points.

- The fastest (correct) GPU tone mapping implementation will receive 10 extra points. The 2nd and 3rd fastest will each receive 5 extra points.

- Implement asynchronous GPU-CPU memory transfers so that the CPU and GPU are not stalled while the color buffer is read back to the CPU for the CPU tone mapping implementation. This should result in a measurable reduction in frame time for the CPU path. Note that you will need to asynchronously enqueue all CPU work that depends on the asynchronous readback in order to realize the speedup. Note also that you may (or may not) need to limit Cilk to 1-2 fewer threads than are available on your machine so that the GPU driver's CPU thread(s) can continue working while Cilk is running (see the Cilk documentation). (15 points)

- Implement a parallelized ray tracer that adds ray-traced shadows to the spheres scene (not the powerplant scene). Note that you only need to cast shadows from the spheres, not the ground plane. You will read back the depth buffer created in the "geometry" pass to the CPU and use these "eye ray intersection points" as the origins of your shadow rays. You'll pass the shadow results back to the GPU in a screen-space "visibility" buffer that is sampled by the AccumulateLighting rendering pass. You will have to obtain the geometry from the CDXUTSDKMesh's accessor functions (caution: the indices in the index buffer are 16-bit integers). To receive full credit, your entire application (tone mapping + ray tracer) must run at 30 fps or faster. I'll give partial credit for slower-but-correct ray tracers. (30 points)
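For the shadow ray tracer, the core operation is testing a ray from a surface point toward the directional light against the occluder triangles. Below is a brute-force sketch using the Moller-Trumbore ray/triangle test over an indexed triangle list with 16-bit indices. The Vec3 type, function names, and epsilon values are assumptions for illustration; how you actually fetch positions and indices from CDXUTSDKMesh, and how you reconstruct the surface point from the depth buffer, is left to the skeleton and the DX SDK documentation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };  // hypothetical minimal vector type

inline Vec3 Sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
inline float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 Cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Moller-Trumbore: does the ray origin + t * dir hit triangle (v0, v1, v2)
// at some t > tMin?
bool RayHitsTriangle(Vec3 origin, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, float tMin) {
    const Vec3 e1 = Sub(v1, v0);
    const Vec3 e2 = Sub(v2, v0);
    const Vec3 p = Cross(dir, e2);
    const float det = Dot(e1, p);
    if (std::fabs(det) < 1e-8f) return false;    // ray parallel to the triangle
    const float invDet = 1.0f / det;
    const Vec3 s = Sub(origin, v0);
    const float u = Dot(s, p) * invDet;          // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 q = Cross(s, e1);
    const float v = Dot(dir, q) * invDet;        // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;
    return Dot(e2, q) * invDet > tMin;           // hit must be in front of origin
}

// Returns true if any triangle blocks the ray from surfacePoint toward the
// light. Indices are 16-bit, matching the caution above.
bool InShadow(Vec3 surfacePoint, Vec3 dirToLight,
              const std::vector<Vec3>& positions,
              const std::vector<std::uint16_t>& indices) {
    assert(indices.size() % 3 == 0);
    const float tMin = 1e-3f;  // small offset to avoid self-intersection
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        if (RayHitsTriangle(surfacePoint, dirToLight, positions[indices[i]],
                            positions[indices[i + 1]], positions[indices[i + 2]], tMin)) {
            return true;  // any hit is enough for a shadow ray
        }
    }
    return false;
}
```

Since each pixel's shadow ray is independent, the outer loop over pixels is a natural candidate for parallelization. Testing every triangle per ray is unlikely to reach 30 fps on its own, so an acceleration structure or an analytic ray/sphere test may be worth considering.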

Submitting Your Homework

Please submit a zip file containing: a README.txt file that describes which features you've implemented and documents your sources (books, web pages, etc.); prebuilt binaries; and your buildable source code (MSVS solution/projects, including shaders, models, textures, etc.). Note that your binary will likely depend on external shaders, meshes, textures, etc., so make sure the directory structure in your ZIP package is correct and that double-clicking the binary runs it correctly. Upload your zip file to the Catalyst DropBox for this course at https://catalyst.uw.edu/collectit/dropbox/summary/alefohn/13901.


I will evaluate your project by running your submitted executable on my "official CSE558 computer" and evaluating each of the features. I will also read the shader and C++ code that you added/modified. The points for each portion of the assignment are listed above in the assignment description. Please test that your executable runs by double-clicking it before submitting.


Erik Reinhard's SIGGRAPH 2002 paper that introduced the algorithm: http://www.cs.utah.edu/~reinhard/cdrom/.

Dean Calver's chapter from the book ShaderX2 describes a DX9-era GPU implementation of Erik's algorithm.

The Cilk user's guide from Intel.

The DX SDK tutorials and samples are a good source of example ComputeShader code.


© 2011 Aaron Lefohn
Department of Computer Science and Engineering | University of Washington