Hardware Acceleration

Summary: Application developers are increasingly looking towards hardware acceleration to realize high performance, low energy systems. Field programmable gate arrays (FPGAs) and general purpose graphics processing units (GPGPUs) are seeing widespread usage in data centers, AI, automotive, vision, security, and networking. Hardware acceleration provides orders-of-magnitude performance and power advantages. Yet these devices remain challenging to program; designers often still use Verilog to program FPGAs! Our research has a strong focus on system building, with the goal of democratizing hardware design so that more programmers can take advantage of custom computing machines.

As with all of our research, we take an application-driven approach, building prototype systems for a variety of industrial companies, government agencies, and other organizations. We have built custom systems for NOAA (fishery management), US Dept. of Agriculture (ecological monitoring), National Geographic (remote sensing), San Diego Zoo (ecological monitoring), Cognex (machine vision), Lumedyne (MEMS sensors), Cytovale (cell sorting), and Bionano Genomics (DNA sequencing). We have received grants or gifts from all of these companies and agencies for our research and development. In the following, we describe some of the major research vectors that our group studies related to hardware acceleration.

Parallel Programming for FPGAs (http://hls.ucsd.edu): Programming hardware accelerators is difficult. Traditionally, this requires writing code in Verilog, managing complex tool chains, and building much of the infrastructure from scratch. Clearly, we cannot expect widespread usage of these powerful architectures without providing better development tools. High level synthesis (HLS) tools raise the abstraction level, allowing designers to program in languages like C and OpenCL instead of register transfer level hardware languages. Yet designers still need to understand core hardware design concepts like parallelism, data partitioning, pipelining, memory management, and interfaces. I developed a complete curriculum and open-source book [B4] around high level synthesis. We released the book as an experiment, a living book of sorts: we will continually update the materials, labs, and text, and we are taking contributions from the broader community. To date, these have mostly been in the form of corrected typos, but we are encouraging adopters of the book to contribute full chapters.
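To give a flavor of HLS-style C, the sketch below shows a fixed-coefficient FIR filter in the spirit of the book's introductory examples. The pragma names follow Vivado HLS conventions; a plain C compiler ignores them, while the HLS tool uses them to pipeline the loop and split the delay line into registers. The function name and tap count are illustrative, not taken from the book.

```c
#define TAPS 8

/* Fixed-coefficient FIR filter producing one output per call.
 * The HLS pragmas ask the tool to pipeline the loop (accept one
 * new input per clock) and to split the delay line into individual
 * registers so every tap can be read in parallel. A plain C
 * compiler ignores the pragmas, so this also runs on a CPU. */
int fir(int x, const int coeff[TAPS])
{
    static int shift_reg[TAPS]; /* delay line, zero-initialized */
#pragma HLS ARRAY_PARTITION variable=shift_reg complete
    int acc = 0;
    for (int i = TAPS - 1; i > 0; i--) {
#pragma HLS PIPELINE II=1
        shift_reg[i] = shift_reg[i - 1]; /* shift the delay line */
        acc += shift_reg[i] * coeff[i];  /* multiply-accumulate    */
    }
    shift_reg[0] = x;
    acc += x * coeff[0];
    return acc;
}
```

With all coefficients set to 1 the filter computes a running sum of the last eight inputs, which makes the software behavior easy to check before synthesis.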

RIFFA (http://riffa.ucsd.edu): RIFFA is a simple framework for communicating between software running on a CPU and an FPGA over PCIe [J43, C98, C111]. It was born out of immense frustration at the lack of simple support for developing full-stack applications that use FPGAs for hardware acceleration. This open-source framework was funded by Xilinx and Altera to enable the design of hardware accelerators on FPGAs. RIFFA has seen substantial community use and upkeep. In recognition of this, RIFFA was given the “FPL Community Award”, which recognizes a “significant contribution to the community by providing some material or knowledge in an open format that benefits the rest of the community” (see the blog post for more details).

Linear Systems and Polynomials: Linear systems and polynomials are found in a wide range of applications including computer graphics, wireless communication, compression, and cryptographic operations. Thus, designers are often challenged with implementing these in a high performance and efficient manner. We developed design techniques aimed at efficient implementations of linear systems and polynomials in hardware and software [J15, J18, J22, C25, C27, C28, C29, C30, C45, C52]. This was a close collaboration with Farzan Fallah at Fujitsu Laboratories of America and formed the basis of Anup Hosangadi’s PhD thesis [T4]. It was the subject of two preliminary patents; one of those was granted and licensed by Fujitsu. The book “Arithmetic Optimization Techniques for Hardware and Software Design” [B2] provides the most complete description of this research.
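Horner's rule is the textbook instance of this kind of arithmetic restructuring; the cited work goes well beyond it (for example, extracting common subexpressions across whole systems of polynomials), but the sketch below, with illustrative names and a cubic example not drawn from the publications, shows the basic payoff.

```c
/* Naive evaluation of p(x) = a3*x^3 + a2*x^2 + a1*x + a0
 * costs six multiplications. */
int poly_naive(int x, const int a[4])
{
    return a[3]*x*x*x + a[2]*x*x + a[1]*x + a[0];
}

/* Horner form ((a3*x + a2)*x + a1)*x + a0 costs three.
 * In hardware, the same restructuring shortens the chain of
 * multipliers needed to evaluate the polynomial. */
int poly_horner(int x, const int a[4])
{
    int r = a[3];
    for (int i = 2; i >= 0; i--)
        r = r * x + a[i]; /* fold in one coefficient per step */
    return r;
}
```

Both functions compute the same value; the Horner form simply trades a flat expression tree for a shorter dependence chain of cheaper operations.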

Ant Colony Optimization (ACO): ACO is a meta-heuristic that solves problems by applying the same logic ants use to find the shortest path to a food source. ACO is a relatively new meta-heuristic (at least compared to counterparts like simulated annealing and genetic algorithms) and has been shown to be very effective at solving general optimization problems. We were the first to apply it to operation scheduling, resource allocation, and binding, the core algorithms for high level synthesis [J13, J16, J17, C23, C26, C33, C48]. These techniques were quite efficient at handling a wide range of problems that we threw at them. They solved both the resource-constrained and latency-constrained problems better than commonly used algorithms. We extended them to handle complex scheduling constraints, pipelining, and conditional execution. During his PhD, Wenrui Gong worked at Mentor Graphics on the Catapult C HLS team. He integrated our ideas into their commercial HLS framework and tested the results on a large number of industrial applications. Our techniques outperformed their algorithms; see Wenrui Gong’s PhD thesis for details [T7]. Ultimately, they did not adopt these techniques, for non-technical reasons. Gang Wang initially came up with the idea of exploring ACO for HLS, and it formed the core of his PhD thesis [T6].
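To show the core feedback loop of ACO (and only that; the HLS schedulers in the cited papers build full schedules with constraint handling), here is a toy version of the classic "double bridge" setup: two paths of different lengths, ants choosing in proportion to pheromone, shorter traversals depositing more pheromone, and evaporation forgetting old choices. All names and constants are illustrative, not from the papers.

```c
/* Small deterministic PRNG so runs are reproducible (illustrative). */
static unsigned long long rng_state = 42;
static double rng_uniform(void) /* uniform in [0, 1) */
{
    rng_state = rng_state * 6364136223846793005ULL
              + 1442695040888963407ULL;
    return (double)(rng_state >> 11) / 9007199254740992.0; /* / 2^53 */
}

/* Two paths from nest to food with lengths 1.0 and 2.0. Each round,
 * every ant picks path 0 with probability pher[0]/(pher[0]+pher[1])
 * and deposits pheromone proportional to 1/length, so the shorter
 * path is reinforced more per traversal; pheromone then evaporates
 * at rate rho. Returns the index of the path holding the most
 * pheromone at the end (0 is the shorter path). */
int aco_double_bridge(int n_iter, int n_ants)
{
    const double length[2] = { 1.0, 2.0 };
    const double rho = 0.1; /* evaporation rate */
    double pher[2] = { 1.0, 1.0 };

    for (int t = 0; t < n_iter; t++) {
        double deposit[2] = { 0.0, 0.0 };
        for (int a = 0; a < n_ants; a++) {
            double p0 = pher[0] / (pher[0] + pher[1]);
            int path = (rng_uniform() < p0) ? 0 : 1;
            deposit[path] += 1.0 / length[path]; /* shorter => more */
        }
        for (int i = 0; i < 2; i++)
            pher[i] = (1.0 - rho) * pher[i] + deposit[i];
    }
    return (pher[0] > pher[1]) ? 0 : 1;
}
```

The positive feedback (more pheromone attracts more ants, whose deposits add more pheromone) is the same mechanism the scheduling work exploits, with path choice replaced by operation-to-timestep and operation-to-resource decisions.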

FPGA Architecture: “In the beginning”, FPGA architectures consisted mostly of homogeneous arrays of programmable logic blocks and bit-level interconnect. Today, FPGAs are an extremely heterogeneous mix of data paths, memories, and I/O intermixed with programmable logic. My doctoral work [B1] largely focused on the benefits of hardening FPGAs, i.e., adding custom data paths, memories, and I/O, which makes FPGAs more efficient but simultaneously less flexible. Thus, there is a fundamental tradeoff between customization and programmability, which I explored during my PhD research and early academic career [B1, B2, J6, J14, C15, C18, C20, C32].

Community Service: I am extremely active in the FPGA and hardware acceleration technical community. As General Chair, I am leading the organization of the 2019 IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM, https://www.fccm.org/). I was the Technical Program Committee Chair for the IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP, http://asapconference.org/2017.html). I have also served on the technical program committees of many of the major conferences, including the International Symposium on Field Programmable Gate Arrays, the International Conference on Application-specific Systems, Architectures and Processors, the International Conference on Field Programmable Logic and Applications (FPL), and the IEEE International Symposium on Field-Programmable Custom Computing Machines.