FastWave: A Hardware Architecture for Audio Neural Networks

When Siri, Alexa, Cortana, Google Assistant, or your other favorite digital assistant talks to you, it relies on neural networks to create the audio file that speaks to you. WaveNet is a deep neural network for generating audio that provides amazingly accurate results. Yet this process is slow and cannot be performed in real time. Our FastWave hardware architecture accelerates this process, providing a 10x decrease in the time required to generate the audio file as compared to a state-of-the-art GPU solution. This is the first hardware-accelerated platform for autoregressive convolutional neural networks.
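
To see why autoregressive generation is hard to speed up, it helps to look at a toy version. The sketch below is not WaveNet and not the FastWave design; the function name and the two fixed weights are made up for illustration. It only shows the defining property: each new sample is computed from samples that were generated before it, so the loop has to run one sample at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy autoregressive generator (NOT WaveNet): every output sample is a
// fixed linear combination of the two samples generated just before it.
// The weights 0.5 and 0.25 are arbitrary illustration values.
std::vector<double> generate_autoregressive(double s0, double s1,
                                            std::size_t count) {
    assert(count >= 2);  // the two seed samples are always emitted
    std::vector<double> out{s0, s1};
    out.reserve(count);
    while (out.size() < count) {
        const std::size_t i = out.size();
        // Sample i needs samples i-1 and i-2, which must already exist.
        out.push_back(0.5 * out[i - 1] + 0.25 * out[i - 2]);
    }
    return out;
}
```

A real model replaces the two-tap formula with a deep stack of convolutions, but the one-sample-at-a-time dependency is the same.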

FastWave is being presented at the International Conference on Computer-Aided Design (ICCAD). ICCAD is one of the top conferences for topics related to hardware design automation. The paper was developed as a project in my CSE 237C class, which teaches hardware design and prototyping using high-level synthesis. Shehzeen Hussain, Mojan Javaheripi, and Paarth Neekhara developed the initial idea as a final class project. They continued their work after the class, and the end result is the paper, FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA.

Underwater Localization Research Garners JASA Top Pick

Our research showing that it is possible to determine the position of underwater vehicles using only ambient ocean sounds was selected as a top pick in the signal processing technical area for the Journal of the Acoustical Society of America (JASA). Our algorithm provides a position estimate for underwater vehicles using ambient acoustic ocean noise as recorded by a single hydrophone onboard each vehicle.

To test our positioning algorithm, we deployed eight underwater vehicles off the coast of San Diego (shown in the red box in the figure). The vehicles are programmed to keep a depth of 6 m, but otherwise drift with the ocean currents. The positions derived using only ambient underwater noise were compared with those calculated using an array of acoustic pingers (shown by green diamonds). While the vehicles were drifting, a boat circled them twice (once at approximately 11 m/s and once at approximately 4 m/s); the trajectory of the boat is shown with the start and end positions indicated. The right panel shows a close-up of the AUE trajectories, where the red bounding box matches the box on the left panel. Deployment times for both the boat and AUE trajectories are shown by the colorbar on the right. Our position estimates using only the underwater microphones are comparable to those from the much more complex, difficult-to-deploy, and costly localization infrastructure that uses the five buoys.

Our techniques that enable low-cost positioning of underwater vehicles have been documented before. In our previous work, we showed how to use snapping shrimp for underwater vehicle localization. Here we show how to use other naturally occurring ocean sounds to perform localization. All of this work was led by Dr. Perry Naughton and written up in his PhD thesis.

Real-time Dense SLAM

Quentin receiving the Best Paper Award

Simultaneous localization and mapping (SLAM) is a common technique for robotic navigation, 3D modeling, and virtual/augmented reality. It employs a compute-intensive process that fuses images from a camera into a virtual 3D space. It does this by looking for common features (edges, corners, and other distinguishing image landmarks) over time and uses those to infer the position of the camera (i.e., localization). At the same time, it creates a virtual 3D map of the environment (i.e., mapping). Since this is a computationally expensive task, real-time applications typically can only create crude or sparse 3D maps.

To address this problem, we built a custom computing system that can create dense 3D maps in real time. Our system uses an FPGA as its base processor. With careful design, an FPGA can be very efficient. “Careful design” typically equates to a painstaking, time-consuming manual process to specify the low-level architectural details needed to obtain efficient computation.

Our paper “FPGA Architectures for Real-time Dense SLAM” (Quentin Gautier, Alric Althoff, and Ryan Kastner) describes the results of our system design. It was recently published, presented, and awarded the best paper at the IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP). The paper details the different parts of the InfiniTAM algorithm, describes the best techniques for accelerating them, and presents two complete, open-source, end-to-end system implementations of the SLAM algorithms. The first targets a lower-power, portable programmable SoC (Terasic DE1) and the other a more powerful desktop solution (Terasic DE5).

We have found that many SLAM algorithms barely run out of the box. And it is even more challenging to get a hardware accelerated version working. We hope that our open-source repository will be valuable to the broader community and enable them to develop even more powerful solutions.

Links:
Dense SLAM for FPGAs Repository and Technical Paper.

Viva Las Vegas

As part of this year’s Design Automation Conference, I participated in the panel “Architecture, IP, or CAD: What’s Your Pick for SoC Security?”. That’s a bunch of acronyms and buzzwords related to the question of how to build more secure computer chips. DAC is one of the oldest, largest, and most prestigious conferences in electronics design. It was also the first big research conference that I attended; I went to DAC in New Orleans in 1999 as an undergraduate (which was an eye-opening experience in many regards), so I guess this was my 20th DAC anniversary.

I’ve been doing research in the hardware security space for a while now (more than 15 years!). I’ve seen this community grow from a niche academic community into a major focus at DAC (there were security sessions almost non-stop this year). And it was nice to see more hardware security companies on the floor including the amazing Tortuga Logic (full disclosure: I am a co-founder). Security clearly has become a major research and market push for the semiconductor and EDA industries.

I was the “academic” on this panel with two folks from industry: Eric Peeters from Texas Instruments and Yervant Zorian from Synopsys. Serge Leef from DARPA was the other panelist. Serge just went to DARPA from Mentor Graphics and is looking to spend a lot of our taxpayers’ money on hardware security. A very wise investment in my totally impartial opinion. I’m guessing that most of the audience was there to hear what Serge had to say and to see if any money fell out of his pockets as he left the room.

The panel started with short (5 min) presentations from each panelist and then there was a lot of time for Q&A from the moderator (the great Swarup Bhunia) and the audience.

My presentation talking points focused on how academics, industry, and government should interact in this space. My answer: industry and government should give lots of funding for academic research (again, I’m totally not biased here…). I also argued that there really isn’t all that much interesting research left in hardware IP security, which I defined as Trojans, PUFs, obfuscation, and locking. Finally, I suggested some areas that are more interesting for research, including formalizing threat models and figuring out how to debug hardware security vulnerabilities. Neither is a small task, and my research group is making strides in both.

During the open discussion there were many other interesting points related to industry’s main interests (root of trust, not Trojans, …), the number of hardware vulnerabilities there are in the wild, metrics, hardware security lifecycle, and so on.

It was a quick visit, Vegas (~1 day), but you brought back some good memories, gave me some great food, and didn’t take too much of my money. All in all, a successful trip.

-Ryan

Opportunistically “Crowdsurfing” Oceanographic Data

The SmartFin is a surfboard fin embedded with a number of sensors that allow it to be opportunistically used to gather oceanographic data. The idea is to “crowdsource” the data from surfers all over the globe. This allows us to create fine-grained spatial and temporal sampling strategies to provide data that will ultimately help us better understand the complex near-shore environment.

UCSD CSE undergraduate Jasmine Simmons is leading a team in our Engineers for Exploration program that is working to make the SmartFin even smarter. She has been working closely with oceanographer Phil Bresnahan to create the next version of the SmartFin. One of the major goals is to add the ability to use the SmartFin as a wave sensor. The goal is to extract information about the ocean waves (frequency, amplitude, …) from the data gathered by the SmartFin inertial measurement unit (IMU). This is a challenging problem since the IMU data is noisy and the surfer may not always be in a position to collect good data about ocean waves. The team is developing digital signal processing algorithms to extract the wave data from the sensors on the SmartFin.
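
As a rough illustration of the kind of signal processing involved (this is not the team's algorithm), the sketch below finds the dominant frequency and amplitude in a series of evenly sampled values using a naive discrete Fourier transform. It assumes the IMU data has already been turned into a clean vertical-displacement series, which is the hard part in practice. The names WaveEstimate and dominant_wave are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Result of the peak search; both fields describe the strongest non-DC bin.
struct WaveEstimate {
    double frequency_hz;
    double amplitude;
};

// Hypothetical helper: naive O(n^2) DFT over evenly spaced samples.
// Bin 0 (the mean) is skipped and the search stops below the Nyquist bin.
WaveEstimate dominant_wave(const std::vector<double>& samples,
                           double sample_rate_hz) {
    const std::size_t n = samples.size();
    assert(n >= 4 && sample_rate_hz > 0.0);
    const double pi = std::acos(-1.0);
    WaveEstimate best{0.0, 0.0};
    for (std::size_t k = 1; k < n / 2; ++k) {
        double re = 0.0;
        double im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = 2.0 * pi * static_cast<double>(k) *
                                 static_cast<double>(t) /
                                 static_cast<double>(n);
            re += samples[t] * std::cos(angle);
            im -= samples[t] * std::sin(angle);
        }
        // Scale so a pure sinusoid centered on a bin reports its amplitude.
        const double amp =
            2.0 * std::sqrt(re * re + im * im) / static_cast<double>(n);
        if (amp > best.amplitude) {
            best.frequency_hz =
                static_cast<double>(k) * sample_rate_hz / static_cast<double>(n);
            best.amplitude = amp;
        }
    }
    return best;
}
```

Real surf data would need filtering, windowing, and a way to discard stretches where the board is not floating freely; a production version would also use an FFT rather than this quadratic loop.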

Get SMART: Peter Tueller Awarded Prestigious DoD Scholarship

The Science, Mathematics, and Research for Transformation (SMART) program is a US Department of Defense (DoD) scholarship aimed at training top talent in science, technology, engineering, and mathematics. SMART fellows are paired with a DoD institution where they spend the summers working on research and then transition into those labs after graduation.

As part of the scholarship, Peter will continue his research with the Naval Information Warfare Center (NIWC) Pacific. Peter’s research looks at how to better use autonomous vehicles (drones and underwater vehicles) to create large-scale 3D models. He has been doing research with NIWC Pacific, a large San Diego Navy research facility, for the past couple of years. SMART will allow him to continue this collaboration both during his PhD and after.

Peter is not the first SMART student in our lab. Dr. Chris Barngrover was given a SMART scholarship to fund his PhD thesis on developing novel technologies for finding mines in sonar images. Chris also worked with NIWC Pacific (then SPAWAR).

Higher-Order Functions in Hardware Development

Higher-Order Functions are a common and convenient way to encapsulate design patterns in software development. However, they are not readily available in hardware design tools. This is because they rely on memory allocators to implement dynamic lists, polymorphism, and looping. These concepts are not very amenable to hardware synthesis.

Dustin presented a solution to this problem in the paper “Higher-Order Functions for C++ Synthesis” at CODES+ISSS as part of Embedded Systems Week. The project develops a library of higher-order functions for C++ High-Level Synthesis tools. We open-sourced these libraries and we hope that you use them. Check them out on GitHub: github.com/drichmond/hops (hops stands for Higher-Order PatternS and is also an ode to one of our favorite beverages). ESWeek was in Torino, Italy, so Dustin also found some free time to visit Mont Blanc (pictured).
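
To give a flavor of what a synthesis-friendly higher-order function can look like, here is a minimal sketch. It is not the hops API; the names map_array, reduce_array, and sum_of_squares are invented for this example. The idea it illustrates is that fixing the list length as a template parameter removes the need for a memory allocator, and passing the function as a template argument lets the compiler resolve it statically.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of hops): apply f to every element of a
// fixed-size array. N is a template parameter, so the loop bound is known
// at compile time and nothing is allocated on the heap.
template <typename T, std::size_t N, typename F>
std::array<T, N> map_array(const std::array<T, N>& in, F f) {
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = f(in[i]);
    }
    return out;
}

// Hypothetical helper: fold the array into one value, left to right.
template <typename T, std::size_t N, typename F>
T reduce_array(const std::array<T, N>& in, T init, F f) {
    T acc = init;
    for (std::size_t i = 0; i < N; ++i) {
        acc = f(acc, in[i]);
    }
    return acc;
}

// Composing the two patterns: square each element, then add them up.
inline int sum_of_squares(const std::array<int, 4>& v) {
    const auto squared = map_array(v, [](int x) { return x * x; });
    return reduce_array(squared, 0, [](int a, int b) { return a + b; });
}

// Small self-check: 1 + 4 + 9 + 16 = 30.
inline void check_sum_of_squares() {
    const std::array<int, 4> v{{1, 2, 3, 4}};
    assert(sum_of_squares(v) == 30);
}
```

Whether a given High-Level Synthesis tool accepts lambdas and std::array depends on the tool and its C++ support, so treat this as a software-side illustration of the pattern rather than code known to synthesize.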

Two Papers at Top European FPGA Conference

Once again we had a good showing at the 28th International Conference on Field Programmable Logic and Applications (FPL 2018). FPL is the premier European venue for publishing research results in the field of FPGAs and reconfigurable systems. This year, the conference was held in Dublin, Ireland. Michael Barrow made the trip to present our two papers.

The first paper was “Everyone’s a Critic: A Tool for Exploring RISC-V Projects”, which describes a tool that provides a way to compare and evaluate different RISC-V architectures with a streamlined suite of tutorials, drivers, and deployment packages on the Pynq development board. The tool is open source on GitHub: https://github.com/drichmond/RISC-V-On-PYNQ . This project was led by Dr. Dustin Richmond with contributions by Michael Barrow.

The second paper was “A FPGA Accelerator for Real-Time 3D Non-Rigid Registration Using Tree Reweighted Message Passing and Dynamic Markov Random Field Generation”. This project developed a novel reconfigurable system that performs real-time 3D registration, a fundamental computer vision problem with applications in augmented reality, 3D modeling, and computer vision. The architecture was developed with Stephen Burns from Intel (and is the result of the time project lead Michael Barrow spent interning there). Our system is more energy efficient and higher performing than comparable software or hardware approaches, with a minimal reduction in registration accuracy.

Hardware Design Doesn’t Need to be Hard

Hardware design is not easy. It typically involves writing code in low-level languages like Verilog where you must specify how every operation works at every cycle. Modern processors perform billions of operations per second, making this a very difficult task! Yet hardware design has become increasingly important and more pervasive with the advent of custom accelerators, which are used in phones, cars, and the cloud. We need more hardware designers, but unfortunately, hardware design is hard.

Dr. Dustin Richmond recently defended his PhD thesis that tackled this problem — increasing the accessibility of hardware development to non-hardware engineers through the use of common parallel patterns. As part of this, Dustin developed RIFFA (abstracting communication patterns) [1] and created a framework for synthesizing higher-order functions to hardware (abstracting computational patterns) [2].

As with most PhD students in our research group, Dustin had many side projects to distract him during his PhD career. Dustin played a key role in developing our 3D imaging system for creating 3D scans of Maya archaeological sites. This involved expeditions to Guatemala to scan ancient Maya structures, a run-in with a large black snake, and a publication in “Advances in Archaeological Practice” [3]. Dustin also built the hardware for a high framerate 3D imager in one of our first projects with Cognex [4]. This ultimately helped inform Cognex on how to build this sensor which is now a product. Dustin spent two separate internships at Altera (now Intel) and Xilinx. I’m not entirely sure how he fit all of that into one PhD, but certainly, it is impressive.

While a PhD defense is mostly focused on research, it should be noted that Dustin has an equally impressive record of university service and teaching. His contributions to our community have been documented in other posts (CSE award and UCSD Graduate Student Association Awards). As a TA, he took on a major revision of our hardware curriculum in the Wireless Embedded Systems Masters Program. He introduced the Xilinx Pynq platform with a series of labs, lectures, and assignments. For the final project, he organized a hackathon where each group was able to make an impressive project in less than two weeks. We will continue to use this curriculum moving forward in that and other classes.

Dustin will continue on the academic route moving back to the Pacific Northwest to be a postdoctoral scholar with Michael Taylor and Luis Ceze. Look for him on the academic job market in 1-2 years.

-Ryan

How Secure is Your Hardware?

Dr. Alric Althoff successfully defended his PhD thesis “Statistical Metrics of Hardware Security”, which helps answer a fundamental question: How secure is your hardware? This is a difficult task — defining what it means to be secure is something that the computer security field has grappled with for decades.

There has been a bevy of high-profile attacks on hardware, most famously Spectre and Meltdown. It is no longer a question of whether your hardware is secure (that is easy to answer: it is not), but rather of how we know whether a mitigation technique or run-time vulnerability detection mechanism is effective. Alric developed a set of metrics aimed at answering this question. These metrics enable you to rank when your design is most vulnerable to a power side channel attack, answer questions about the randomness of your random number generator, and determine how hardware optimizations and design decisions affect the leakage of secure information.

While we are on the topic of metrics and definitions, I do not yet know how to define “data science” (nor do I think that term will be properly defined for some time), but I do know that Alric is an exemplar of a data scientist. He is able to quickly understand a problem and come up with an elegant solution to it. Thus, it is not surprising that Alric has been a tour de force for our research group, playing prominent roles in almost all of our projects. One of my mantras for the past several years has been “You really should talk to Alric about this.” His thesis is impressive, and yet it is only a small subset of his research during his PhD tenure.

Luckily (for us) Alric is not moving far; he took a position at Leidos just across the street from campus. Hopefully, we can continue to leverage his expertise going forward.

Congrats Dr. Althoff, best of luck in the future, and don’t be a stranger!

-Ryan