SHIDDIBHAVI, SUHAS ASHOK
Empowering FPGAs For Massively Parallel Applications
1 online resource (74 pages) : PDF
University of North Carolina at Charlotte
The availability of OpenCL High-Level Synthesis (OpenCL-HLS) has made FPGAs an attractive platform for power-efficient, high-performance execution of massively parallel applications. With their customizable data-paths, deep pipelining abilities, and enhanced power-efficiency features, FPGAs are among the most viable devices to program and integrate into heterogeneous platforms. However, OpenCL for FPGAs raises many challenges that require in-depth understanding before these capabilities can be fully utilized. Because OpenCL has mainly been practiced on GPU devices, research is required to study the efficiency of OpenCL code on FPGAs and to develop a framework that categorizes OpenCL's parallelism potential. The aim of this work is to identify, analyze, and categorize the semantic differences between the OpenCL parallelism model and the execution model on FPGAs. As a result, we propose a generic taxonomy for classifying FPGAs based on the support available from the OpenCL-HLS tool-chain. In addition, new design challenges emerge for massive thread-level parallelism on FPGAs. One major execution bottleneck is the high number of memory stalls exposed to the data-path, which overshadows the benefits of data-path customization. We introduce a novel approach for hiding memory stalls on FPGAs when running massively parallel applications. The proposed approach uses sub-kernel parallelism to decouple the actual computation from memory access (memory reads and writes). It overlaps the computation of current threads with the memory accesses of future threads (memory pre-fetching at large scale) and leverages OpenCL pipe semantics to realize the sub-kernel parallelism. This work also proposes an LLVM-based static analyzer that detects the prefetchable data of OpenCL kernels and can be integrated into commercial OpenCL-HLS tools.
Experimental results for the Rodinia benchmarks on an Intel Stratix-V FPGA demonstrate significant performance and energy improvements over the baseline implementation using the Intel OpenCL SDK. The proposed sub-kernel parallelism achieves more than 2x speedup with only a 3% increase in resource utilization and a 7% increase in power consumption, reducing overall energy consumption by more than 40%. To overcome the bottlenecks observed in the commercial OpenCL-HLS tool, we propose an integrated tool-chain for OpenCL-HLS. The new tool-chain combines existing tool-chains for CPUs and GPUs, with LLVM serving as the intermediate representation for translating OpenCL to RTL. This open-source tool-chain is a proposed future extension of our work, and we will release it as a contribution of this thesis.
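The decoupling idea in the abstract can be illustrated with a small software analogy. The sketch below is not the thesis implementation and is not OpenCL: it assumes a bounded Python queue as a stand-in for an OpenCL pipe, and the names `run_decoupled`, `memory_stage`, `compute_stage`, and `depth` are hypothetical. One stage only fetches data, the other only computes, so fetches for later work items can proceed while earlier ones are still being processed.

```python
import queue
import threading


def run_decoupled(data, depth=4):
    """Software analogy of sub-kernel decoupling (illustrative only).

    A 'memory' stage pushes operands into a bounded FIFO while a
    'compute' stage pops them and does the arithmetic. The FIFO plays
    the role that an OpenCL pipe plays between two sub-kernels.
    """
    pipe = queue.Queue(maxsize=depth)  # assumed stand-in for an OpenCL pipe
    done = object()                    # sentinel marking the end of the stream
    results = []

    def memory_stage():
        # Runs ahead of the compute stage by up to `depth` items,
        # which models prefetching for future threads.
        for value in data:
            pipe.put(value)            # blocks only when the FIFO is full
        pipe.put(done)

    def compute_stage():
        while True:
            value = pipe.get()         # blocks only when the FIFO is empty
            if value is done:
                break
            results.append(value * value)  # placeholder computation

    producer = threading.Thread(target=memory_stage)
    consumer = threading.Thread(target=compute_stage)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results
```

With a single producer and a single consumer, the FIFO preserves order, so `run_decoupled([1, 2, 3])` returns the squares in input order. The `depth` parameter loosely corresponds to how far memory access is allowed to run ahead of computation.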
DECOUPLING/PREFETCHING; FPGAS FOR MASSIVELY PARALLEL APPLICATIONS; OPENCL ON FPGA; OPENCL PIPE/CHANNEL; OPENSOURCE OPEN-HLS; TAXONOMY FOR FPGAS
Sass, Ron; Saule, Erik
Thesis (M.S.)--University of North Carolina at Charlotte, 2018.
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). For additional information, see http://rightsstatements.org/page/InC/1.0/.
Copyright is held by the author unless otherwise indicated.