Date | Leader | Paper |
October 8 | Carl | Intro and Organization; A Model for Programming Large-Scale Configurable Computing Applications local copy - external link |
October 15 | Nate | Achieving Programming Model Abstractions for Reconfigurable Computing Andrews, D.; Sass, R.; Anderson, E.; Agron, J.; Peck, W.; Stevens, J.; Baijot, F. & Komp, E. local copy - external link |
October 22 | Andrew | Generating hardware from OpenMP programs Leow, Y.; Ng, C. & Wong, W. local copy - external link |
October 29 | Robin | Understanding Sources of Inefficiency in General-Purpose Chips Hameed, R.; Qadeer, W.; Wachs, M.; Azizi, O.; Solomatnikov, A.; Lee, B. C.; Richardson, S.; Kozyrakis, C. & Horowitz, M. local copy - external link |
November 5 | Jimmy | Concurrency and Communication: Lessons from the SHIM Project Edwards, S. local copy, local copy - external link |
November 12 | Corey | ASC: A stream compiler for computing with FPGAs Mencer, O. local copy - external link |
November 19 | Brandon | MPI as an Abstraction for Software-Hardware Interaction for HPRCs M. Saldana, A. Patel, C. Madill, D. Nunes, Danyao Wang, H. Styles, A. Putnam, R. Wittig, P. Chow. local copy - external link |
November 26 | Thanksgiving Holiday | |
December 3 | Stephen | Memory - Sequoia: programming the memory hierarchy Fatahalian, K.; Horn, D. R.; Knight, T. J.; Leem, L.; Houston, M.; Park, J. Y.; Erez, M.; Ren, M.; Aiken, A.; Dally, W. J. & Hanrahan, P. local copy - external link |
December 10 | Maria | HPCS Programming Languages Lusk & Yelick local copy - external link |
Abstract: A stream compiler (ASC) for computing with field programmable gate arrays (FPGAs) emerges from the ambition to bridge the hardware-design productivity gap where the number of available transistors grows more rapidly than the productivity of very large scale integration (VLSI) and FPGA computer-aided-design (CAD) tools. ASC addresses this problem with a software-like programming interface to hardware design (FPGAs) while retaining the performance of hand-designed circuits. ASC improves productivity by letting the programmer optimize the implementation on the algorithm level, the architecture level, the arithmetic level, and the gate level, all within the same C++ program. The increased productivity of ASC is applied to the hardware acceleration of a wide range of applications. Traditionally, hardware accelerators are tediously handcrafted to achieve top performance. ASC simplifies design-space exploration of hardware accelerators by transforming the hardware-design task into a software-design process, using only "GNU compiler collection (GCC)" and "make" to obtain a hardware netlist. From experience, the hardware-design productivity and ease of use are close to pure software development. This paper presents results and case studies with optimizations that are: 1) on the gate level: Kasumi and International Data Encryption Algorithm (IDEA) encryptions; 2) on the arithmetic level: redundant addition and multiplication function evaluation for two-dimensional (2-D) rotation; and 3) on the architecture level: Wavelet and Lempel-Ziv (LZ)-like compression.
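To make the "software-like programming interface" concrete, the sketch below shows the general embedded-DSL style in which a C++ program can describe a datapath that a backend then lowers to a netlist. This is only an illustration under assumed names (HwInt, scale_add); it is not ASC's actual type or macro set.

```cpp
// Illustrative sketch only: ASC's real stream types and per-level
// optimization controls are not reproduced here.  The point is the style:
// ordinary C++ objects and overloaded operators describe a datapath.
#include <iostream>
#include <vector>

// Hypothetical hardware-integer wrapper; a real tool would record the
// expression graph for netlist generation instead of evaluating eagerly.
struct HwInt {
    int value;
    HwInt(int v = 0) : value(v) {}
};

inline HwInt operator+(HwInt a, HwInt b) { return HwInt(a.value + b.value); }
inline HwInt operator*(HwInt a, HwInt b) { return HwInt(a.value * b.value); }

// The kernel the programmer writes once; algorithm-level choices
// (e.g., scale-then-add) live here, in plain C++.
HwInt scale_add(HwInt x, HwInt coeff, HwInt bias) {
    return x * coeff + bias;
}

int main() {
    std::vector<int> input = {1, 2, 3, 4};
    for (int x : input)                      // stand-in for a hardware stream
        std::cout << scale_add(x, 3, 1).value << '\n';
}
```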
Abstract: In this paper we describe ASH, an architectural framework for implementing Application-Specific Hardware. ASH is based on automatic hardware synthesis from high-level languages. The generated circuits use only localized computation structures; in consequence, we expect these circuits to be fast, to use little power and to scale well with program complexity. We present in detail CASH, a scalable compiler framework for ASH, which generates hardware from programs written in C. Our compiler exploits instruction level parallelism by using aggressive speculation and dynamic scheduling. Based on this compilation scheme, we evaluate the computational resources necessary for implementing complex integer-based programs, and we suggest architectural features that would support the ASH framework.
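As a small illustration (our own, not taken from the paper) of the kind of C-level input such a compiler exploits: in a spatial, dataflow-style implementation the multiplication and both sides of the comparison can be evaluated speculatively in parallel, with the condition merely selecting which value is kept.

```cpp
// Hypothetical input fragment, not from the CASH paper: the scaling and the
// limit check are independent, so a speculative spatial implementation can
// compute both concurrently and select the result with the comparison.
#include <cstdio>

int saturate_scale(int x, int gain, int limit) {
    int scaled = x * gain;                 // independent of the limit check
    return (scaled > limit) ? limit : scaled;
}

int main() {
    printf("%d\n", saturate_scale(7, 3, 20));   // 21 exceeds 20, prints 20
}
```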
Abstract: Various high level hardware description languages have been invented for the purpose of improving the productivity in the generation of customized hardware. Most of these languages are variants, usually parallel versions, of popular software programming languages. In this paper, we describe our effort to generate hardware from OpenMP, a software parallel programming paradigm that is widely used and tested. We are able to generate FPGA hardware from OpenMP C programs via synthesizable VHDL and Handel-C. We believe that the addition of this medium-grain parallel programming paradigm will bring additional value to the repertoire of hardware description languages .
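For reference, the medium-grain parallelism the flow starts from is expressed with standard OpenMP work-sharing pragmas. The kernel below is our own minimal example (not one from the paper) of the kind of loop such a flow would map to concurrent hardware units.

```cpp
// Minimal OpenMP work-sharing example (illustration only).  Each iteration
// is independent, so the parallel-for is the natural unit an OpenMP-to-HDL
// flow can turn into parallel hardware.  Compile with -fopenmp.
#include <cstdio>

int main() {
    const int N = 8;
    int a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; }

    #pragma omp parallel for          // medium-grain, loop-level parallelism
    for (int i = 0; i < N; ++i)
        c[i] = a[i] + b[i];

    for (int i = 0; i < N; ++i)
        printf("%d ", c[i]);
    printf("\n");
}
```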
Abstract: Describing parallel hardware and software is difficult, especially in an embedded setting. Five years ago, we started the SHIM project to address this challenge by developing a programming language for hardware/software systems. The resulting language describes asynchronously running processes and has the useful property of scheduling independence: the I/O of a SHIM program is not affected by any scheduling choices. This paper presents a history of the SHIM project with a focus on the key things we have learned along the way.
Vasudevan, N. & Edwards, S. A. Celling SHIM: Compiling deterministic concurrency to a heterogeneous multicore SAC '09: Proceedings of the 2009 ACM symposium on Applied Computing, ACM, 2009, 1626-1631
Abstract: In the scope of initial discussions about a standard OMG IDL-to-VHDL language mapping, we present some requirements and propose a configurable mapping. We demonstrate the advantages of a common component-oriented approach to specifying HW and SW interfaces compared to previous object-oriented approaches. Our proposal is based on a family of hardware interfaces able to represent various interaction semantics and mapping configurations. Our approach is illustrated through the CORBA Component Model (CCM).
Abstract: The OSSS methodology defines a seamless design flow for embedded HW/SW systems. It enables the effective use of high-level SystemC and C++ features like classes (object-oriented design paradigm), templates and method-based communication for the description of SW and HW. Furthermore, it supports the OSCI SystemC Synthesis Subset for low-level HW description and HW IP integration. With Fossy we provide a tool for the automatic transformation of a system description in OSSS to an implementation. In this paper we present a top-down design flow using the OSSS methodology for the implementation of an adaptive video filter. The last step of the proposed design flow has been performed automatically by Fossy. We have targeted a Xilinx FPGA to prove the usability of a physical implementation for future SoC designs.
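For orientation, method-based communication in plain SystemC looks roughly like the sketch below. Only standard SystemC is used; OSSS's shared objects, synthesis annotations, and the Fossy flow itself are not shown, and the module names are our own.

```cpp
// Plain SystemC sketch of method-based communication between modules.
// Requires an installed SystemC library to build and run.
#include <systemc.h>

// Interface: callers see only the method signature.
struct adder_if : public sc_interface {
    virtual int add(int a, int b) = 0;
};

// Module providing the service by implementing the interface.
struct Adder : public sc_module, public adder_if {
    SC_CTOR(Adder) {}
    int add(int a, int b) override { return a + b; }
};

// Client module invoking the service through a port (a method call,
// not a signal-level protocol).
SC_MODULE(Client) {
    sc_port<adder_if> adder;
    SC_CTOR(Client) { SC_THREAD(run); }
    void run() { cout << "2 + 3 = " << adder->add(2, 3) << endl; }
};

int sc_main(int, char*[]) {
    Adder adder("adder");
    Client client("client");
    client.adder(adder);      // bind the port to the method provider
    sc_start();
    return 0;
}
```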
Abstract: This paper introduces hthreads, a unifying programming model for specifying application threads running within a hybrid central processing unit (CPU)/field-programmable gate-array (FPGA) system. Presently accepted hybrid CPU/FPGA computational models, and access to these computational models via high-level languages, focus on programming language extensions to increase accessibility and portability. However, this paper argues that new high-level programming models built on common software abstractions better address these goals. The hthreads system, in general, is unique within the reconfigurable computing community as it includes operating system and middleware layer abstractions that extend across the CPU/FPGA boundary. This enables all platform components to be abstracted into a unified multiprocessor architecture platform. Application programmers can then express their computations using threads specified from a single POSIX threads (pthreads) multithreaded application program, and compile the threads to either run on the CPU or synthesize them to run within an FPGA. To enable this seamless framework, we have created the hardware thread interface (HWTI) component to provide an abstract, platform-independent compilation target for hardware-resident computations. The HWTI enables the use of standard thread communication and synchronization operations across the software/hardware boundary. Key operating system primitives have been mapped into hardware to provide threads running in both hardware and software uniform access to a set of sub-microsecond, minimal-jitter services. Migrating the operating system into hardware removes the potential bottleneck of routing all system service requests through a central CPU.
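The programming model in question is ordinary POSIX threads. A worker like the one below (our own minimal example, not from the paper) uses only standard thread creation and mutex synchronization, i.e. the kind of thread that hthreads can schedule on the CPU or, in principle, synthesize behind the HWTI.

```cpp
// Minimal pthreads example (illustration only).  Compile with -pthread.
#include <pthread.h>
#include <cstdio>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void* worker(void*) {
    for (int i = 0; i < 1000; ++i) {
        pthread_mutex_lock(&lock);     // same sync primitive in SW or HW
        ++counter;
        pthread_mutex_unlock(&lock);
    }
    return nullptr;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, nullptr, worker, nullptr);
    pthread_create(&t2, nullptr, worker, nullptr);
    pthread_join(t1, nullptr);
    pthread_join(t2, nullptr);
    printf("counter = %ld\n", counter);   // prints 2000
}
```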
Abstract: An overview of the features of the different parallel programming languages for HPC applications.
Abstract: Sequoia is a programming language that allows the programmer to explicitly manage the memory hierarchy.
Abstract: The approach taken in the Triton project is to let a high-level machine-independent parallel programming language drive the design of parallel hardware. This approach permits machine-independent parallel programs to be compiled into efficient machine code. The main results are as follows: (1) The parallel programming language Modula-2* extends Modula-2 with constructs for expressing a wide range of parallel algorithms in a high-level, portable, and readable way. (2) Techniques are used for efficiently translating Modula-2* programs to several modern parallel architectures and deriving recommendations for future parallel machine architectures. (3) Triton/1 is a scalable, mixed-mode SIMD/MIMD parallel computer with a highly efficient communications network. It overcomes several deficiencies of current parallel hardware and adequately supports high-level parallel languages.