Author
WEI, Z. - University of Mississippi
JANG, B. - University of Mississippi
ZHANG, Y. - University of Mississippi
JIA, Y. - University of Mississippi
Submitted to: Procedia Computer Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/1/2013
Publication Date: 6/1/2013
Citation: Wei, Z., Jang, B., Zhang, Y., Jia, Y. 2013. Parallelizing alternating direction implicit solver on GPUs. Procedia Computer Science. 18:389-398. Available: http://www.journals.elsevier.com/procedia-computer-science/

Interpretive Summary: In this paper, we present an improved parallel alternating direction implicit (ADI) solver that harnesses modern graphics processing units (GPUs) to solve the two-dimensional heat conduction equation. Our improvements include 1) new thread mappings that address hardware resource constraints, 2) reduced shared memory usage, which allows more threads to be launched, and 3) two memory optimization techniques that improve the efficiency of off-chip memory accesses. With these improvements, the proposed parallel ADI solver demonstrates a significant speedup across large computational domain sizes. Our future work includes developing more advanced GPU optimization techniques and applying the improved ADI solver to more complex systems.

Technical Abstract: We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves on existing implementations in two respects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource constraints. As a result, our parallel ADI solver, which is based on PCR, no longer has a maximum domain size limitation. Second, we optimize the inefficient data accesses of the parallel ADI solver by leveraging hardware texture memory and matrix transpose techniques. These memory optimizations make the already parallelized ADI solver twice as fast, achieving an overall speedup of more than 100 times over a highly optimized CPU version. We also present an analysis of the numerical accuracy of the proposed parallel ADI solver.
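
Illustrative sketch (not the authors' GPU code): each ADI half-step reduces to many independent tridiagonal systems, and PCR solves one such system in about log2(n) reduction passes in which every equation is updated independently of the others. The serial Python below only shows that data flow; the function name `pcr_solve` and the coefficient layout are assumptions made for this example, and a GPU version would presumably assign one thread to each index `i` of the inner loop.

```python
def pcr_solve(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for x
    using parallel cyclic reduction (PCR), written serially.

    a[0] and c[n-1] lie outside the matrix and are treated as zero.
    Assumes the system is well conditioned (e.g. diagonally dominant),
    since no pivoting is performed.
    """
    n = len(b)
    a = [float(v) for v in a]
    b = [float(v) for v in b]
    c = [float(v) for v in c]
    d = [float(v) for v in d]
    a[0] = 0.0
    c[n - 1] = 0.0

    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
        # Every i reads only the previous pass's arrays, so all n updates
        # are independent; this is the loop a GPU would run in parallel.
        for i in range(n):
            lo, hi = i - stride, i + stride
            if lo >= 0:
                alpha = -a[i] / b[lo]      # eliminates x[i - stride]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                gamma = -c[i] / b[hi]      # eliminates x[i + stride]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2                        # couplings now span twice as far

    # All off-diagonal couplings are gone: each equation is b[i]*x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]
```

For an implicit heat-conduction sweep along one grid line, the coefficients would typically be `a[i] = c[i] = -r` and `b[i] = 1 + 2*r` for some positive ratio `r` of time step to squared grid spacing (an assumed discretization, not taken from the paper). Because each pass touches all n equations, PCR does more arithmetic than the serial Thomas algorithm but exposes far more parallelism.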