Empowering FPGAs For Massively Parallel Applications

SHIDDIBHAVI, SUHAS ASHOK

Shiddibhavi, S. A. (2018). Empowering FPGAs For Massively Parallel Applications. UNC Charlotte Electronic Theses and Dissertations.

Abstract

The availability of OpenCL High-Level Synthesis (OpenCL-HLS) has made FPGAs an attractive platform for power-efficient, high-performance execution of massively parallel applications. FPGAs, with their customizable data-path, deep pipelining abilities, and enhanced power efficiency features, are among the most viable solutions for integration into heterogeneous platforms. At the same time, OpenCL for FPGAs raises many challenges that require in-depth understanding to better utilize these capabilities. Because OpenCL has mainly been used for GPU devices, research is required to study the efficiency of OpenCL code on FPGAs and to develop a framework that helps categorize the parallelism OpenCL can expose. The aim of this work is to identify, analyze, and categorize the semantic differences between the OpenCL parallelism model and the execution model on FPGAs. As a result, we propose a generic taxonomy for classifying FPGAs based on the support available from the OpenCL-HLS tool-chain. New design challenges also emerge for massive thread-level parallelism on FPGAs. One major execution bottleneck is the high number of memory stalls exposed to the data-path, which overshadows the benefits of data-path customization. We introduce a novel approach for hiding memory stalls on FPGAs when running massively parallel applications. The proposed approach uses sub-kernel parallelism to decouple the actual computation from memory access (memory reads and writes), so that the computation of current threads overlaps with the memory accesses of future threads (memory prefetching at large scale). The approach leverages OpenCL pipe semantics to realize the sub-kernel parallelism. This work also proposes an LLVM-based static analyzer that detects the prefetchable data of OpenCL kernels and can be integrated into commercial OpenCL-HLS tools.
Experimental results for Rodinia benchmarks on an Intel Stratix-V FPGA demonstrate significant performance and energy improvements over the baseline implementation using the Intel OpenCL SDK. The proposed sub-kernel parallelism achieves more than 2x speedup with only a 3% increase in resource utilization and a 7% increase in power consumption, which reduces overall energy consumption by more than 40%. To overcome the bottlenecks observed in the commercial OpenCL-HLS tool, we propose an integrated tool-chain for OpenCL-HLS. The new tool-chain combines existing tool-chains for CPUs and GPUs, with LLVM acting as an intermediate machine-level representation in the translation from OpenCL to RTL. This open-source tool-chain is a proposed future extension of our work, and we will release it as a contribution of this thesis.
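
The decoupling idea described in the abstract can be illustrated with a small software model. The sketch below is not the thesis implementation and is not OpenCL: it is a minimal Python analogy, assuming a bounded queue stands in for an OpenCL pipe, one thread stands in for the memory-access sub-kernel, and another for the compute sub-kernel. The names `memory_access_kernel`, `compute_kernel`, `run_decoupled`, and `PIPE_DEPTH`, as well as the squaring operation, are hypothetical placeholders.

```python
import queue
import threading

PIPE_DEPTH = 4   # hypothetical pipe capacity; a real pipe depth is a design choice
DONE = object()  # sentinel marking the end of the data stream


def memory_access_kernel(global_mem, pipe):
    """Producer: fetches operands for upcoming work-items and pushes them into the pipe."""
    for value in global_mem:
        pipe.put(value)  # blocks only when the pipe is full
    pipe.put(DONE)


def compute_kernel(pipe, results):
    """Consumer: computes on operands that have already been fetched."""
    while True:
        value = pipe.get()  # blocks only when the pipe is empty
        if value is DONE:
            break
        results.append(value * value)  # placeholder for the real computation


def run_decoupled(global_mem):
    """Run both sub-kernels concurrently, connected by a bounded FIFO."""
    pipe = queue.Queue(maxsize=PIPE_DEPTH)
    results = []
    producer = threading.Thread(target=memory_access_kernel, args=(global_mem, pipe))
    consumer = threading.Thread(target=compute_kernel, args=(pipe, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    print(run_decoupled([1, 2, 3, 4, 5, 6]))
```

In this model the producer can run up to `PIPE_DEPTH` items ahead of the consumer, which is the software analogue of fetching data for future threads while current threads compute. On an FPGA the two sides would be separate hardware pipelines rather than threads.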

Details

Author: SHIDDIBHAVI, SUHAS ASHOK
Title: Empowering FPGAs For Massively Parallel Applications
Physical Description: 1 online resource (74 pages) : PDF
Date: 2018
Degree Granting Institution: University of North Carolina at Charlotte
Genre: masters theses
Subjects--Topics: Electrical engineering; Engineering; Computer engineering
Degree: M.S.
Keywords: Decoupling/Prefetching; FPGAs for Massively Parallel Applications; OpenCL on FPGA; OpenCL Pipe/channel; Opensource open-HLS; Taxonomy for FPGAs
Subject Area: Electrical Engineering
Advisor(s): Tabkhi, Hamed
Committee Members: Sass, Ron; Saule, Erik
Degree Note: Thesis (M.S.)--University of North Carolina at Charlotte, 2018.
Rights Statement: This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). For additional information, see http://rightsstatements.org/page/InC/1.0/.
Rights Holder Information: Copyright is held by the author unless otherwise indicated.
Identifier: SHIDDIBHAVI_uncc_0694N_11703
Permalink: http://hdl.handle.net/20.500.13093/etd:1723

J. Murrey Atkins Library