Demos and Projects Presentation

ReConFig's attendees and exhibitors are encouraged to bring their hardware/software for display at the ReConFig 2018 Demo Night that will be held on December 3 during the Conference Cocktail. There is no exhibition fee.

Attendees are also invited to present a poster on their latest projects, research results, ideas, or any other relevant material they would like to share with other attendees.

Demos and project presentations need not be related to papers presented at the conference. If you have something interesting to show, we encourage you to participate.

For demos, please send us the following information:
- The name of your demo
- The name and affiliation of the authors of the demo
- A title and brief description of the demo (250 words max).

For posters, please send us the following information:
- The title of your poster/project/idea, etc.
- The name and affiliation of the authors/presenters


Confirmed Demos and Project Presentations

DEMO: Integrating the AWS Cloud with Responsive Xilinx Machine Learning at the Edge

Hugo Andrade
Xilinx, USA

In this demo, we show how to integrate FPGA-based low-latency edge machine learning with an IIoT-enabled distributed control application that leverages the massive scale of AWS Cloud analytics, machine learning model building, application provisioning, and system dashboards.
A Zynq UltraScale+ MPSoC is used as a PYNQ-enabled edge node to demonstrate the integration of AWS Greengrass and Deephi-based edge machine learning, while a remote Zynq-7000 device running Amazon FreeRTOS is used as a Distributed Intelligent I/O node collecting data from a thermocouple, a pressure sensor, and a humidity sensor.

DEMO: AI Figure Recognition Demo

Digilent Design Contest Winner, presented by
Jarrell Sultemeier, Thomas Kappenman

Digilent, USA

This hardware-accelerated artificial neural network is trained to recognize handwritten digits. The image is fed through a MIPI CSI-2 interface to the FPGA, where scaling and image conditioning are performed. The resulting 28x28 capture is then passed to neurons implemented in programmable logic. The outcome is read out from the 10 nodes in the output layer and shown on the touch screen.
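The data path described above (28x28 capture, neurons, 10 output nodes) can be sketched in software. This is only an illustration: the abstract does not state the network topology, so the single fully connected layer with ReLU, and the function names `flatten`, `dense`, and `read_out`, are assumptions made for the example.

```python
def flatten(image):
    """Turn a 28x28 capture (list of rows) into a flat list of 784 pixel values."""
    return [pixel for row in image for pixel in row]


def dense(inputs, weights, biases):
    """One fully connected layer followed by ReLU (assumed topology, not from the demo)."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def read_out(outputs):
    """Return the index of the strongest output node, i.e. the predicted digit."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


if __name__ == "__main__":
    image = [[0.0] * 28 for _ in range(28)]
    image[14][14] = 1.0                      # a single lit pixel
    weights = [[0.0] * 784 for _ in range(10)]
    weights[7][14 * 28 + 14] = 1.0           # hypothetical weights: node 7 reacts to that pixel
    biases = [0.0] * 10
    print(read_out(dense(flatten(image), weights, biases)))
```

In the demo the neurons are implemented in programmable logic; the sketch only mirrors the final step, where the output node with the largest activation determines the digit shown on screen.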

DEMO: MoDesA: Model-Based Design Automation of Hardware/Software Co-Designs for Xilinx Zynq PSoCs

Franz-Josef Streit, Martin Letras, Benjamin Hackenberg, Stefan Wildermann and Jürgen Teich
Friedrich-Alexander-University Erlangen-Nuremberg (FAU), Germany

Shorter design cycles in the development of FPGA-based Programmable Systems-on-Chip (PSoCs) require a higher level of design automation, which has led to a wide acceptance of model-driven engineering.
However, the design and implementation of applications on such heterogeneous PSoC platforms still demand comprehensive expertise in hardware/software co-design. To ease the development process, we present MoDesA, an automated, systematic design flow that couples model-based design and simulation with High-Level Synthesis (HLS) for hybrid hardware and software implementations. The design flow makes use of the modeling and simulation environment MATLAB/Simulink, a de facto standard in model-based development. MoDesA is used for designing, simulating, prototyping, and testing hybrid hardware/software solutions for Xilinx Zynq PSoC architectures. A tagging scheme allows Simulink blocks to be partitioned manually. The proposed methodology thereby enables control and system engineers to automatically explore different hardware and software implementation variants from a behavioral Simulink model. The tool flow is a powerful and user-friendly design tool for improving productivity. In the demonstration, we showcase the generation and system integration of MATLAB/Simulink models in Xilinx Zynq architectures. Finally, we present real-world case studies and synthesis results.

DEMO: Demonstrating Horizontal Attacks using IHP’s Side Channel Analysis Tool

Ievgen Kabin, Dan Klann, Zoya Dyka, Anton Datsuk and Peter Langendoerfer
IHP – Leibniz-Institut für innovative Mikroelektronik Frankfurt (Oder), Germany

Side Channel Analysis (SCA) attacks exploit the fact that physical effects such as execution time, power consumption and electromagnetic radiation of cryptographic devices can be measured while they perform cryptographic operations. The measured traces can be analysed with the goal of revealing the private key. If kP, the main operation of elliptic curve cryptography, is implemented in hardware, the secret k is usually processed bitwise. If the processing of different bit values of k is distinguishable, the secret k can easily be revealed by visualizing a trace (simple SCA attack). If this is not the case, k can be revealed by attacking a single trace with statistical analysis methods (horizontal differential SCA attacks).
Using the IHP [1] Side Channel Analysis Tool, we demonstrate a simple attack and several differential horizontal attacks. We attack different hardware accelerators for the Elliptic Curve Digital Signature Algorithm for the NIST elliptic curves B-233 and B-283. We analyse power and electromagnetic traces measured during signature generation and verification. The tool can perform a horizontal attack in the frequency and time domains. Six different types of trace compression are available for the time domain attack, as well as an option for noise reduction. Visualization of input data and results helps to find sources of information leakage and to improve implemented designs.
In future work, the development and integration of a thermal SCA is planned as an extension of the tool.
[1] IHP - Innovations for High Performance Microelectronics,
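The simple SCA attack described in this abstract can be illustrated with a toy model. The sketch below assumes a left-to-right double-and-add algorithm and replaces elliptic curve points with integers modulo a small number, so "point addition" is plain modular addition. Neither choice is taken from the demo, whose accelerators target B-233 and B-283. The names `scalar_mult_with_trace` and `recover_scalar` are hypothetical.

```python
def scalar_mult_with_trace(k, P, modulus):
    """Compute k*P bitwise and record which operation ran at each step.

    "D" marks a doubling and "A" an addition; the list stands in for a
    measured trace in which the two operations are distinguishable.
    Requires k >= 1.
    """
    bits = bin(k)[2:]
    Q = P                       # the leading bit of k is always 1
    trace = []
    for bit in bits[1:]:
        Q = (Q + Q) % modulus   # doubling happens for every bit
        trace.append("D")
        if bit == "1":
            Q = (Q + P) % modulus   # addition happens only for 1-bits
            trace.append("A")
    return Q, trace


def recover_scalar(trace):
    """Read k back from the operation sequence alone (simple SCA)."""
    bits = "1"
    i = 0
    while i < len(trace):
        if i + 1 < len(trace) and trace[i + 1] == "A":
            bits += "1"         # doubling followed by addition
            i += 2
        else:
            bits += "0"         # doubling only
            i += 1
    return int(bits, 2)


if __name__ == "__main__":
    secret = 0b1011001
    result, trace = scalar_mult_with_trace(secret, P=5, modulus=1009)
    print("".join(trace), recover_scalar(trace) == secret)
```

When the operations for 0-bits and 1-bits are not distinguishable in this way, the attacker has to fall back on the statistical (horizontal differential) methods mentioned in the abstract.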

DEMO: CMOS Annealing Boost

Normann Mertig (1,*), Takashi Takemoto (1,*), Hayashi Masato (2)
(1) Research and Development Group, Center for Exploratory Research, Hokkaido University Laboratory, Hitachi Ltd. Tokyo, Japan
(2) Research & Development Group, Center for Technology Innovation – Electronics, Hitachi Ltd., Tokyo, Japan

The recent leap in annealing processors, and their potential to solve NP-hard problems in combinatorial optimization orders of magnitude faster than conventional CPU heuristics, has triggered a wealth of proposals for exploiting them as a computational resource in Machine Learning. A promising candidate among these proposals is QBoost, a supervised learning algorithm designed for constructing high-quality classifiers. Despite this potential, efficient QBoost implementations on quantum annealers remain elusive because of repeated and time-consuming communication between the host PC and the remote quantum annealer, which is accessible exclusively via cloud services.
Our demo presents the first implementation of QBoost built around an FPGA-based annealing processor. An integrated co-design comprising the FPGA-based annealing processor and the modules of the hyperparameter search eliminates excessive communication overhead and accelerates the hyperparameter search from the order of hours (on CPUs and quantum annealers) down to a few seconds. Ultimately, this results in the first QBoost implementation that is competitive with AdaBoost and gradient tree boosting with respect to both accuracy and runtime.
Demo experience:
* Experience the speed of real-time FPGA-based hyperparameter search
* Experience improved hardware embedding
* Experience FPGA-accelerated QBoost performance on MNIST data

DEMO: MDC + ARTICo3: a Multi-Grain Reconfiguration Approach for CPS

Tiziana Fanni (1), Alfonso Rodríguez (2,*), Carlo Sau (1), Leonardo Suriano (2), Francesca Palumbo (3), Luigi Raffo (1), and Eduardo de la Torre (2,*)
(1) DIEE, Università degli Studi di Cagliari
(2) Centro de Electrónica Industrial, Universidad Politécnica de Madrid
(3) IDEA Lab, Università degli Studi di Sassari

In this demo, a run-time adaptive image processing application running on a Zynq-7000 device is presented. The hardware-based system in the Programmable Logic relies on ARTICo3, a processing architecture that uses Dynamic and Partial Reconfiguration (DPR) to perform run-time tradeoffs between computing performance, energy consumption and fault tolerance. Moreover, its execution model uses a dynamic multi-accelerator approach to provide SIMD-like and/or redundant (i.e., DMR and TMR) execution on demand. Each of the hardware accelerators connected to the ARTICo3 infrastructure corresponds to an image processing kernel and has been implemented using MDC. MDC is an application-to-hardware design framework that can generate Coarse-Grain Reconfigurable (CGR) accelerators from multiple high-level dataflow specifications that are first combined in a single network, and then implemented as a resource-efficient processing element.
The combination of MDC and ARTICo3 renders a multi-grain reconfigurable solution, where time-multiplexing of resources can be achieved through two different hardware reconfiguration techniques: on the one hand, through DPR, which is time consuming but enables highly flexible changes in the computing substrate; on the other hand, through CGR, which is fast but able to execute only a limited set of functionalities. This feature is exploited to achieve both functional (e.g., switching between Sobel and Roberts operators in edge detection applications) and non-functional (e.g., increasing performance or fault tolerance by loading more copies of a given accelerator at run time) adaptation, a basic requirement in CPS deployments. A GUI is also provided to show run-time monitoring measurements of application execution time and hardware reconfiguration overheads.
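The redundant execution modes mentioned in this abstract (DMR and TMR) can be illustrated with a small software sketch. It only shows the general idea of comparing or voting on the results of replicated accelerators; the abstract does not describe how ARTICo3 implements its voter, and the function names `dmr` and `tmr` are hypothetical.

```python
def dmr(a, b):
    """Dual modular redundancy: two copies can detect a fault but not correct it.

    Returns the first result and a flag telling whether both copies agreed.
    """
    return a, a == b


def tmr(a, b, c):
    """Triple modular redundancy: bitwise majority vote over three integer results.

    Each output bit takes the value that at least two of the three copies
    produced, so a fault in any single copy is masked.
    """
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    good = 0b1010
    faulty = 0b0110                  # one copy delivers a corrupted result
    print(dmr(good, faulty))         # mismatch is detected
    print(bin(tmr(good, good, faulty)))  # fault is outvoted
```

Loading two or three copies of the same accelerator trades programmable logic area for fault tolerance, whereas loading copies that work on different data gives the SIMD-like performance mode instead.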

DEMO: The BRISC-V Toolbox

Alan Ehret, Donato Kava, Sahan Bandara, Mihailo Isakov, Michel A. Kinsy
Adaptive and Secure Computing Systems (ASCS) Laboratory, Department of Electrical and Computer Engineering, Boston University

The BRISC-V Toolbox is the Boston University RISC-V architecture design exploration suite. The toolbox is built around the Boston University RISC-V Processor Set (BRISC-V), a parameterized set of modules for design space exploration using RISC-V ISA based architectures. Included with the BRISC-V Toolbox are (i) numerous RISC-V cores with different levels of complexity (e.g., single-cycle, pipelined, and out-of-order cores), (ii) a programmable memory system with reconfigurable multi-level cache subsystems, (iii) the BRISC-V explorer, a GUI-based program to configure architecture parameters, and (iv) the BRISC-V emulator for software RISC-V instruction emulation. The toolbox provides an easy-to-use, open-source, parameterized, fully synthesizable platform that lets students and researchers experimenting with RISC-V ISA features quickly bring up a complete and functional architecture, ready for customization and modification.

Project Presentation: Low Power Image Processing Applications on FPGAs using Dynamic Voltage Scaling and Partial Reconfiguration

Ariel Podlubne, Julian Haase, Lester Kalms, Gökhan Akgün, Muhammad Ali, Habib ul hasan Khan, Ahmed Kamal and Diana Göhringer
Technische Universität Dresden, Germany

The TULIPP project aims to facilitate the development of embedded image processing systems with real-time and low-power constraints. In this work, several adaptive dynamic run-time techniques for reconfigurable SoCs are described. These methods are used for low-power image processing applications on high-performance embedded platforms. Dynamic Voltage Scaling (DVS) and Dynamic Partial Reconfiguration (DPR) target the low-power requirements of the embedded systems, while debugging support speeds up development on the hardware side of the system. The proposed techniques were tested and verified using a custom SDSoC image processing library developed in-house.

DEMO: Gemini LRU board liquid cooling Heat Sink visualization

Paula Fusiara (1,*), Gijs Schoonderbeek (1), Menno Schuil (2)
(1) ASTRON Netherlands Institute for Radio Astronomy, The Netherlands
(2) NOVA Optical Infra Red Instrumentation, The Netherlands

During this demo you will get hands-on experience of what the first hardware prototype of the Gemini LRU board Liquid Cooling Heat Sink looks like and how it performs. The hardware will be accompanied by a flow visualization video and a Computer Assisted Manufacturing time-lapse of the heat sink fabrication as performed by a 5-axis simultaneous milling machine.

DEMO: A zero-latency video transmission system based on a ZynqBerry

Mario Ruiz (*), Tobías Alonso, Gustavo Sutter, Jorge López de Vergara
* Presenter
Madrid Autonomous University, Spain

In this demo, we present an end-to-end video transmission system using the low-end ZynqBerry board. In the programmable logic we have developed an efficient hardware implementation of a video encoder optimized for ultra-low latency, using the Logarithmic Hop Encoding (LHE) algorithm, which works in the time domain, meaning that no domain transformation is necessary. The frame is split into blocks: scanlines or rectangles. First, the RGB pixels are mapped to the YUV color space. Then, the LHE quantizer generates the hops from the YUV pixel stream. After that, the entropy encoder maps hops into variable-length codes and aligns them in an output word of configurable width. Those blocks are moved to main memory using a data mover. Then, the processing system packetizes the encoded video blocks and transmits them through a network interface, using a small C library written to interface with the compressed stream and configure the encoder. Hardware and software are synchronised using a set of registers. Finally, a software decoder was developed to visualize the captured video. The hardware design provides the following features: (i) a maximum marginal output latency of 23 clock cycles, (ii) small area requirements, (iii) a proven rate of up to 100 million pixels per second in a low-end FPGA, (iv) on-the-fly configuration, and (v) a scalable architecture. The hardware design takes advantage of all SoC capabilities, making it possible to transmit FHD HDR video. The demo will show this zero-latency video transmission system, which is useful for delay-sensitive applications such as drone driving.
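The first stage of the encoder pipeline described above, mapping RGB pixels to YUV, can be sketched as follows. The abstract does not say which conversion matrix the hardware uses; the full-range BT.601 coefficients from JFIF are assumed here purely for illustration, and the function name `rgb_to_yuv` is hypothetical.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, round(value)))


def rgb_to_yuv(r, g, b):
    """Map one 8-bit RGB pixel to 8-bit Y, U (Cb), V (Cr).

    Coefficients are the full-range BT.601 values used by JFIF; this is
    an assumption, not a detail taken from the demo.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    v = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(u), clamp8(v)


def convert_scanline(pixels):
    """Convert one block (here a scanline of RGB tuples) pixel by pixel."""
    return [rgb_to_yuv(r, g, b) for r, g, b in pixels]


if __name__ == "__main__":
    scanline = [(255, 255, 255), (0, 0, 0), (255, 0, 0)]
    print(convert_scanline(scanline))
```

Because each output pixel depends only on the corresponding input pixel, this stage can be streamed without buffering, which fits the low-latency goal of the design. The later stages (LHE quantizer and entropy encoder) are not sketched here.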

DEMO: Hw/Sw-Toolflow for Generation and Configuration of a CGRA-Architecture

Florian Fricke (1,*), André Werner (1), Keyvan Shahin(1), Michael Hübner (2)
* Presenter
(1) Ruhr-University Bochum, Germany
(2) Brandenburg University of Technology Cottbus-Senftenberg, Germany

The demo of our VCGRA toolflow shows an architecture generation tool that creates the VHDL description of a Coarse-Grained Reconfigurable Array (CGRA) from a specification given by the user. Furthermore, we present a complete toolchain that targets the CGRA architecture and creates the configuration for the CGRA from the specification of an algorithm in a high-level programming language (C/C++) or as a Data-Flow Graph (DFG). The toolchain includes tools to convert the algorithm into a DFG, partition the DFG to fit onto the architecture if necessary, map the DFG or its partitions onto the CGRA, and create the bitstreams that apply the configuration to the CGRA at runtime. All tools are integrated and controlled through a common command-line user interface. The level of automation is high, so only minimal user interaction is needed to bring a whole system (hardware and configuration(s)) to an FPGA evaluation board for testing. In the demo we will show the whole toolchain and present the individual tools using an example from the image processing domain.
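The partitioning step of such a toolchain can be sketched in a few lines. This is not the VCGRA tools' algorithm, which the abstract does not describe. It assumes a hypothetical layered array in which each DFG node is placed on the row given by its ASAP (as-soon-as-possible) level, and a DFG deeper than the array is cut into partitions of consecutive levels. The names `asap_levels` and `partition` are made up for the example.

```python
def asap_levels(dfg):
    """Assign each node its ASAP level.

    `dfg` maps a node name to the list of its predecessor nodes.
    Inputs (no predecessors) are on level 0; every other node sits one
    level below its deepest predecessor. The graph must be acyclic.
    """
    levels = {}

    def level(node):
        if node not in levels:
            preds = dfg[node]
            levels[node] = 0 if not preds else 1 + max(level(p) for p in preds)
        return levels[node]

    for node in dfg:
        level(node)
    return levels


def partition(levels, rows):
    """Group nodes into partitions that each span at most `rows` levels."""
    parts = {}
    for node, lvl in levels.items():
        parts.setdefault(lvl // rows, []).append(node)
    return [sorted(parts[i]) for i in sorted(parts)]


if __name__ == "__main__":
    # abs((a * b) + c) expressed as a data-flow graph
    dfg = {"a": [], "b": [], "c": [],
           "mul": ["a", "b"], "add": ["mul", "c"], "abs": ["add"]}
    levels = asap_levels(dfg)
    print(levels)
    print(partition(levels, rows=2))
```

Each partition would then be mapped and turned into its own configuration, with the partitions loaded one after another at runtime.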