United States Department of Agriculture

Agricultural Research Service

Research Project: IMPROVING COMPUTATIONAL MODELING IN SUPPORT OF BETTER EROSION AND SEDIMENT MOVEMENT CONTROL IN AGRICULTURAL WATERSHEDS

Location: Watershed Physical Processes Research Unit

Title: Parallelizing alternating direction implicit solver on GPUs

Authors
Wei, Z.
Jang, B.
Zhang, Y.
Jia, Y.

Submitted to: Procedia Computer Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: March 1, 2013
Publication Date: June 1, 2013
Citation: Wei, Z., Jang, B., Zhang, Y., Jia, Y. 2013. Parallelizing alternating direction implicit solver on GPUs. Procedia Computer Science. 18:389-398. Available http://www.journals.elsevier.com/procedia-computer-science/

Interpretive Summary: In this paper, we present an improved parallel alternating direction implicit (ADI) solver that harnesses modern graphics processing units (GPUs) to solve the two-dimensional heat conduction equation. Our improvements include 1) new thread mappings that address hardware resource constraints, 2) reduced shared memory usage, which allows more threads to be launched, and 3) two memory optimization techniques that improve the efficiency of off-chip memory accesses. With these improvements, the proposed parallel ADI solver demonstrates a significant speedup on large computational domains. Our future work includes developing more advanced GPU optimization techniques and applying the improved ADI solver to more complex systems.
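
For readers unfamiliar with ADI, the following is a minimal serial sketch of one ADI time step for the two-dimensional heat conduction equation. It is not the authors' GPU code. It assumes the Peaceman-Rachford variant of ADI, a uniform grid with zero Dirichlet boundaries, and a dense NumPy solve standing in for the tridiagonal solver. The names second_difference, adi_step, and r (equal to diffusivity * dt / h**2) are hypothetical and chosen only for illustration.

```python
import numpy as np

def second_difference(n):
    """n x n matrix of the (1, -2, 1) stencil with zero Dirichlet boundaries."""
    return (np.diag(-2.0 * np.ones(n))
            + np.diag(np.ones(n - 1), 1)
            + np.diag(np.ones(n - 1), -1))

def adi_step(u, r):
    """Advance interior values u (shape ny x nx) by one full time step.

    Assumption: Peaceman-Rachford splitting with r = diffusivity * dt / h**2.
    """
    ny, nx = u.shape
    Tx, Ty = second_difference(nx), second_difference(ny)
    Ax = np.eye(nx) - 0.5 * r * Tx   # implicit operator along x
    Ay = np.eye(ny) - 0.5 * r * Ty   # implicit operator along y

    # Half step 1: implicit along x, explicit along y.
    # Each grid row is an independent tridiagonal system; the transposes
    # turn the rows into columns so one solve call handles all of them.
    rhs = u + 0.5 * r * (Ty @ u)
    u_half = np.linalg.solve(Ax, rhs.T).T

    # Half step 2: implicit along y, explicit along x.
    # Each grid column is an independent tridiagonal system.
    rhs = u_half + 0.5 * r * (u_half @ Tx)
    return np.linalg.solve(Ay, rhs)
```

Every row (or column) system in a half step is independent of the others, which is the property a parallel ADI solver exploits. The dense solve here is only practical for small grids.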

Technical Abstract: We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves on existing implementations in two respects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource constraints. As a result, our parallel ADI solver, which is based on PCR, no longer has a maximum domain size limitation. Second, we optimize the inefficient data accesses of the parallel ADI solver by leveraging hardware texture memory and matrix transpose techniques. These memory optimizations make the already parallelized ADI solver twice as fast, for an overall speedup of more than 100 times over a highly optimized CPU version. We also present an analysis of the numerical accuracy of the proposed parallel ADI solver.
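
As a rough illustration of the PCR building block named above, here is a minimal NumPy sketch of textbook Parallel Cyclic Reduction for a single tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]. It is not the GPU implementation described in the paper, and it includes none of the thread mapping, texture memory, or transpose optimizations. The function name pcr_solve and the identity-row padding at the ends are assumptions made for this sketch.

```python
import numpy as np

def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system with Parallel Cyclic Reduction.

    a: sub-diagonal (a[0] ignored), b: main diagonal,
    c: super-diagonal (c[-1] ignored), d: right-hand side.
    """
    a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
    n = b.size
    a[0] = 0.0
    c[-1] = 0.0

    def from_below(v, s, fill):
        # Value of v at index i - s; out-of-range entries get `fill`.
        out = np.full(n, fill, dtype=float)
        out[s:] = v[:-s]
        return out

    def from_above(v, s, fill):
        # Value of v at index i + s; out-of-range entries get `fill`.
        out = np.full(n, fill, dtype=float)
        out[:-s] = v[s:]
        return out

    s = 1
    while s < n:
        # Neighbouring equations at distance s (identity rows past the ends).
        a_m, b_m = from_below(a, s, 0.0), from_below(b, s, 1.0)
        c_m, d_m = from_below(c, s, 0.0), from_below(d, s, 0.0)
        a_p, b_p = from_above(a, s, 0.0), from_above(b, s, 1.0)
        c_p, d_p = from_above(c, s, 0.0), from_above(d, s, 0.0)

        # Eliminate x[i-s] and x[i+s] from every equation at once.
        alpha = -a / b_m
        gamma = -c / b_p
        a, b, c, d = (alpha * a_m,
                      b + alpha * c_m + gamma * a_p,
                      gamma * c_p,
                      d + alpha * d_m + gamma * d_p)
        s *= 2

    # All equations are now decoupled: b[i] * x[i] = d[i].
    return d / b
```

Each pass updates all n equations independently, so on a GPU one thread per equation can perform a pass; about log2(n) passes are needed. The sketch assumes a well-conditioned (for example, diagonally dominant) system and does no pivoting.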

Last Modified: 12/18/2014