
Title: Parallelizing alternating direction implicit solver on GPUs

Authors:
WEI, Z. - University of Mississippi
JANG, B. - University of Mississippi
ZHANG, Y. - University of Mississippi
JIA, Y. - University of Mississippi

Submitted to: Procedia Computer Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/1/2013
Publication Date: 6/1/2013
Citation: Wei, Z., Jang, B., Zhang, Y., Jia, Y. 2013. Parallelizing alternating direction implicit solver on GPUs. Procedia Computer Science. 18:389-398. Available http://www.journals.elsevier.com/procedia-computer-science/

Interpretive Summary: In this paper, we present an improved parallel alternating direction implicit (ADI) solver that harnesses modern graphics processing units (GPUs) to solve the two-dimensional heat conduction equation. Our improvements include 1) new thread mappings that address hardware resource constraints, 2) reduced shared memory usage, which allows more threads to be launched, and 3) two memory optimization techniques that improve the efficiency of off-chip memory accesses. With these improvements, the proposed parallel ADI solver demonstrates a significant speedup across large computation domain sizes. Future work includes developing more advanced GPU optimization techniques and applying the improved ADI solver to more complex systems.
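
For orientation, the classical Peaceman-Rachford form of ADI for the two-dimensional heat conduction equation is sketched below. This is the textbook scheme, stated here as an assumption; this record does not give the exact discretization used in the paper. The symbols $\alpha$ (thermal diffusivity), $\Delta t$, $\Delta x$, $\Delta y$, $r_x$, $r_y$ and the difference operators $\delta_x^2$, $\delta_y^2$ are introduced only for illustration.

```latex
% Model problem: u_t = \alpha (u_{xx} + u_{yy})
% Central second differences on a uniform grid:
%   \delta_x^2 u_{i,j} = u_{i+1,j} - 2u_{i,j} + u_{i-1,j}
%   \delta_y^2 u_{i,j} = u_{i,j+1} - 2u_{i,j} + u_{i,j-1}
% with r_x = \alpha \Delta t / \Delta x^2 and r_y = \alpha \Delta t / \Delta y^2.
\begin{align}
  \left(1 - \tfrac{r_x}{2}\,\delta_x^2\right) u^{n+1/2}_{i,j}
    &= \left(1 + \tfrac{r_y}{2}\,\delta_y^2\right) u^{n}_{i,j}
    && \text{(implicit in } x\text{)} \\
  \left(1 - \tfrac{r_y}{2}\,\delta_y^2\right) u^{n+1}_{i,j}
    &= \left(1 + \tfrac{r_x}{2}\,\delta_x^2\right) u^{n+1/2}_{i,j}
    && \text{(implicit in } y\text{)}
\end{align}
```

Each half step couples unknowns along one grid direction only, so it reduces to one independent tridiagonal system per grid line. Those independent systems are what a GPU can solve concurrently.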

Technical Abstract: We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves on existing implementations in two respects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource constraints. As a result, our parallel ADI solver, which is based on PCR, no longer has a maximum domain size limitation. Second, we optimize the inefficient data accesses of the parallel ADI solver by leveraging hardware texture memory and matrix transpose techniques. These memory optimizations make the already parallelized ADI solver twice as fast, achieving an overall speedup of more than 100 times over a highly optimized CPU version. We also present an analysis of the numerical accuracy of the proposed parallel ADI solver.
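
To illustrate the kind of tridiagonal solve that PCR performs, here is a minimal serial sketch in Python. It is not the authors' GPU implementation: the function names `pcr_solve` and `tridiag_matvec` are hypothetical, the test system is made up, and the thread mappings, shared memory handling, texture memory, and transposes described above are not modeled. On a GPU, the inner loop over `i` would typically map to one thread per equation.

```python
def tridiag_matvec(a, b, c, x):
    """Multiply a tridiagonal matrix (sub a, main b, super c) by vector x."""
    n = len(b)
    y = []
    for i in range(n):
        v = b[i] * x[i]
        if i > 0:
            v += a[i] * x[i - 1]
        if i < n - 1:
            v += c[i] * x[i + 1]
        y.append(v)
    return y


def pcr_solve(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] by cyclic reduction
    in its parallel form (PCR), emulated serially.

    Assumes a system that needs no pivoting (e.g. diagonally dominant).
    a[0] and c[-1] are ignored and treated as zero.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0] = 0.0
    c[-1] = 0.0
    stride = 1
    while stride < n:
        # Every equation is updated from the OLD coefficients, which is why
        # all n updates in one step are independent of each other.
        na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
        for i in range(n):
            lo, hi = i - stride, i + stride
            if lo >= 0:
                alpha = -a[i] / b[lo]      # eliminates x[i - stride]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                gamma = -c[i] / b[hi]      # eliminates x[i + stride]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # Once stride >= n every equation is decoupled: b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]


# Usage: build a small diagonally dominant system with a known solution.
n = 5
sub, main, sup = [-1.0] * n, [4.0] * n, [-1.0] * n
x_known = [1.0, 2.0, 3.0, 4.0, 5.0]
rhs = tridiag_matvec(sub, main, sup, x_known)
x_pcr = pcr_solve(sub, main, sup, rhs)
print(x_pcr)
```

The sketch takes about log2(n) reduction steps, each consisting of n independent updates. That makes the method attractive on GPUs, at the cost of more total arithmetic than the serial Thomas algorithm.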