Date | Leader | Paper |
October 8 | Carl | Intro and Organization; A Model for Programming Large-Scale Configurable Computing Applications local copy - external link |
October 15 | Nate | Achieving Programming Model Abstractions for Reconfigurable Computing Andrews, D.; Sass, R.; Anderson, E.; Agron, J.; Peck, W.; Stevens, J.; Baijot, F. & Komp, E. local copy - external link |
October 22 | Andrew | Generating hardware from OpenMP programs Leow, Y.; Ng, C. & Wong, W. local copy - external link |
October 29 | Robin | Understanding Sources of Inefficiency in General-Purpose Chips Hameed, R.; Qadeer, W.; Wachs, M.; Azizi, O.; Solomatnikov, A.; Lee, B. C.; Richardson, S.; Kozyrakis, C. & Horowitz, M. local copy - external link |
November 5 | Jimmy | Concurrency and Communication: Lessons from the SHIM Project Edwards, S. local copy, local copy - external link |
November 12 | Corey | ASC: A stream compiler for computing with FPGAs Mencer, O. local copy - external link |
November 19 | Brandon | MPI as an Abstraction for Software-Hardware Interaction for HPRCs M. Saldana, A. Patel, C. Madill, D. Nunes, Danyao Wang, H. Styles, A. Putnam, R. Wittig, P. Chow. local copy - external link |
November 26 | Thanksgiving Holiday | |
December 3 | Stephen | Memory - Sequoia: programming the memory hierarchy Fatahalian, K.; Horn, D. R.; Knight, T. J.; Leem, L.; Houston, M.; Park, J. Y.; Erez, M.; Ren, M.; Aiken, A.; Dally, W. J. & Hanrahan, P. local copy - external link |
December 10 | Maria | HPCS Programming Languages Lusk & Yelick local copy - external link |
Abstract: A stream compiler (ASC) for computing with field programmable gate arrays (FPGAs) emerges from the ambition to bridge the hardware-design productivity gap where the number of available transistors grows more rapidly than the productivity of very large scale integration (VLSI) and FPGA computer-aided-design (CAD) tools. ASC addresses this problem with a software-like programming interface to hardware design (FPGAs) while retaining the performance of hand-designed circuits. ASC improves productivity by letting the programmer optimize the implementation on the algorithm level, the architecture level, the arithmetic level, and the gate level, all within the same C++ program. The increased productivity of ASC is applied to the hardware acceleration of a wide range of applications. Traditionally, hardware accelerators are tediously handcrafted to achieve top performance. ASC simplifies design-space exploration of hardware accelerators by transforming the hardware-design task into a software-design process, using only "GNU compiler collection (GCC)" and "make" to obtain a hardware netlist. From experience, the hardware-design productivity and ease of use are close to pure software development. This paper presents results and case studies with optimizations that are: 1) on the gate level: Kasumi and International Data Encryption Algorithm (IDEA) encryptions; 2) on the arithmetic level: redundant addition and multiplication function evaluation for two-dimensional (2-D) rotation; and 3) on the architecture level: Wavelet and Lempel-Ziv (LZ)-like compression.
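To make the "software-like programming interface" concrete, the sketch below shows the general embedded-DSL style in which a C++ program can describe a datapath that a backend then lowers to a netlist. This is only an illustration under assumed names (HwInt, scale_add); it is not ASC's actual type or macro set.

```cpp
// Illustrative sketch only: ASC's real stream types and per-level
// optimization controls are not reproduced here.  The point is the style:
// ordinary C++ objects and overloaded operators describe a datapath.
#include <iostream>
#include <vector>

// Hypothetical hardware-integer wrapper; a real tool would record the
// expression graph for netlist generation instead of evaluating eagerly.
struct HwInt {
    int value;
    HwInt(int v = 0) : value(v) {}
};

inline HwInt operator+(HwInt a, HwInt b) { return HwInt(a.value + b.value); }
inline HwInt operator*(HwInt a, HwInt b) { return HwInt(a.value * b.value); }

// The kernel the programmer writes once; algorithm-level choices
// (e.g., scale-then-add) live here, in plain C++.
HwInt scale_add(HwInt x, HwInt coeff, HwInt bias) {
    return x * coeff + bias;
}

int main() {
    std::vector<int> input = {1, 2, 3, 4};
    for (int x : input)                      // stand-in for a hardware stream
        std::cout << scale_add(x, 3, 1).value << '\n';
}
```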
Abstract: In this paper we describe ASH, an architectural framework for implementing Application-Specific Hardware. ASH is based on automatic hardware synthesis from high-level languages. The generated circuits use only localized computation structures; in consequence, we expect these circuits to be fast, to use little power and to scale well with program complexity. We present in detail CASH, a scalable compiler framework for ASH, which generates hardware from programs written in C. Our compiler exploits instruction level parallelism by using aggressive speculation and dynamic scheduling. Based on this compilation scheme, we evaluate the computational resources necessary for implementing complex integer-based programs, and we suggest architectural features that would support the ASH framework.
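As a small illustration (our own, not taken from the paper) of the kind of C-level input such a compiler exploits: in a spatial, dataflow-style implementation the multiplication and both sides of the comparison can be evaluated speculatively in parallel, with the condition merely selecting which value is kept.

```cpp
// Hypothetical input fragment, not from the CASH paper: the scaling and the
// limit check are independent, so a speculative spatial implementation can
// compute both concurrently and select the result with the comparison.
#include <cstdio>

int saturate_scale(int x, int gain, int limit) {
    int scaled = x * gain;                 // independent of the limit check
    return (scaled > limit) ? limit : scaled;
}

int main() {
    printf("%d\n", saturate_scale(7, 3, 20));   // 21 exceeds 20, prints 20
}
```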
Abstract: Various high level hardware description languages have been invented for the purpose of improving the productivity in the generation of customized hardware. Most of these languages are variants, usually parallel versions, of popular software programming languages. In this paper, we describe our effort to generate hardware from OpenMP, a software parallel programming paradigm that is widely used and tested. We are able to generate FPGA hardware from OpenMP C programs via synthesizable VHDL and Handel-C. We believe that the addition of this medium-grain parallel programming paradigm will bring additional value to the repertoire of hardware description languages .
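For reference, the medium-grain parallelism the flow starts from is expressed with standard OpenMP work-sharing pragmas. The kernel below is our own minimal example (not one from the paper) of the kind of loop such a flow would map to concurrent hardware units.

```cpp
// Minimal OpenMP work-sharing example (illustration only).  Each iteration
// is independent, so the parallel-for is the natural unit an OpenMP-to-HDL
// flow can turn into parallel hardware.  Compile with -fopenmp.
#include <cstdio>

int main() {
    const int N = 8;
    int a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; }

    #pragma omp parallel for          // medium-grain, loop-level parallelism
    for (int i = 0; i < N; ++i)
        c[i] = a[i] + b[i];

    for (int i = 0; i < N; ++i)
        printf("%d ", c[i]);
    printf("\n");
}
```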
Abstract: Describing parallel hardware and software is difficult, especially in an embedded setting. Five years ago, we started the SHIM project to address this challenge by developing a programming language for hardware/software systems. The resulting language describes asynchronously running processes and has the useful property of scheduling independence: the I/O of a SHIM program is not affected by any scheduling choices. This paper presents a history of the SHIM project with a focus on the key things we have learned along the way.
Vasudevan, N. & Edwards, S. A. Celling SHIM: Compiling deterministic concurrency to a heterogeneous multicore SAC '09: Proceedings of the 2009 ACM symposium on Applied Computing, ACM, 2009, 1626-1631
Abstract: In the scope of initial discussions about a standard OMG IDL-to-VHDL language mapping, we present some requirements and propose a configurable mapping. We demonstrate the advantages of a common component-oriented approach to specifying HW and SW interfaces compared to previous object-oriented approaches. Our proposal is based on a family of hardware interfaces able to represent various interaction semantics and mapping configurations. Our approach is illustrated through the CORBA Component Model (CCM).
Abstract: The OSSS methodology defines a seamless design flow for embedded HW/SW systems. It enables the effective use of high-level SystemC and C++ features like classes (object-oriented design paradigm), templates and method-based communication for the description of SW and HW. Furthermore, it supports the OSCI SystemC Synthesis Subset for low-level HW description and HW IP integration. With Fossy we provide a tool for the automatic transformation of a system description in OSSS to an implementation. In this paper we present a top-down design flow using the OSSS methodology for the implementation of an adaptive video filter. The last step of the proposed design flow has been performed automatically by Fossy. We have targeted a Xilinx FPGA to prove the usability of a physical implementation for future SoC designs.
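For orientation, method-based communication in plain SystemC looks roughly like the sketch below. Only standard SystemC is used; OSSS's shared objects, synthesis annotations, and the Fossy flow itself are not shown, and the module names are our own.

```cpp
// Plain SystemC sketch of method-based communication between modules.
// Requires an installed SystemC library to build and run.
#include <systemc.h>

// Interface: callers see only the method signature.
struct adder_if : public sc_interface {
    virtual int add(int a, int b) = 0;
};

// Module providing the service by implementing the interface.
struct Adder : public sc_module, public adder_if {
    SC_CTOR(Adder) {}
    int add(int a, int b) override { return a + b; }
};

// Client module invoking the service through a port (a method call,
// not a signal-level protocol).
SC_MODULE(Client) {
    sc_port<adder_if> adder;
    SC_CTOR(Client) { SC_THREAD(run); }
    void run() { cout << "2 + 3 = " << adder->add(2, 3) << endl; }
};

int sc_main(int, char*[]) {
    Adder adder("adder");
    Client client("client");
    client.adder(adder);      // bind the port to the method provider
    sc_start();
    return 0;
}
```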
Abstract: This paper introduces hthreads, a unifying programming model for specifying application threads running within a hybrid central processing unit (CPU)/field-programmable gate-array (FPGA) system. Presently accepted hybrid CPU/FPGA computational models, and access to these computational models via high-level languages, focus on programming language extensions to increase accessibility and portability. However, this paper argues that new high-level programming models built on common software abstractions better address these goals. The hthreads system, in general, is unique within the reconfigurable computing community as it includes operating system and middleware layer abstractions that extend across the CPU/FPGA boundary. This enables all platform components to be abstracted into a unified multiprocessor architecture platform. Application programmers can then express their computations using threads specified from a single POSIX threads (pthreads) multithreaded application program, and compile the threads to either run on the CPU or synthesize them to run within an FPGA. To enable this seamless framework, we have created the hardware thread interface (HWTI) component to provide an abstract, platform-independent compilation target for hardware-resident computations. The HWTI enables the use of standard thread communication and synchronization operations across the software/hardware boundary. Key operating system primitives have been mapped into hardware to provide threads running in both hardware and software uniform access to a set of sub-microsecond, minimal-jitter services. Migrating the operating system into hardware removes the potential bottleneck of routing all system service requests through a central CPU.
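The programming model in question is ordinary POSIX threads. A worker like the one below (our own minimal example, not from the paper) uses only standard thread creation and mutex synchronization, i.e. the kind of thread that hthreads can schedule on the CPU or, in principle, synthesize behind the HWTI.

```cpp
// Minimal pthreads example (illustration only).  Compile with -pthread.
#include <pthread.h>
#include <cstdio>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void* worker(void*) {
    for (int i = 0; i < 1000; ++i) {
        pthread_mutex_lock(&lock);     // same sync primitive in SW or HW
        ++counter;
        pthread_mutex_unlock(&lock);
    }
    return nullptr;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, nullptr, worker, nullptr);
    pthread_create(&t2, nullptr, worker, nullptr);
    pthread_join(t1, nullptr);
    pthread_join(t2, nullptr);
    printf("counter = %ld\n", counter);   // prints 2000
}
```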
Abstract: An overview of the features of the different parallel programming languages for HPC applications.
Abstract: Sequoia is a programming language that allows the programmer to explicitly manage the memory hierarchy.
Abstract: The approach taken in the Triton project is to let a high-level machine-independent parallel programming language drive the design of parallel hardware. This approach permits machine-independent parallel programs to be compiled into efficient machine code. The main results are as follows: (1) The parallel programming language Modula-2* extends Modula-2 with constructs for expressing a wide range of parallel algorithms in a high-level, portable, and readable way. (2) Techniques are used for efficiently translating Modula-2* programs to several modern parallel architectures and deriving recommendations for future parallel machine architectures. (3) Triton/1 is a scalable, mixed-mode SIMD/MIMD parallel computer with a highly efficient communications network. It overcomes several deficiencies of current parallel hardware and adequately supports high-level parallel languages.