Publication : USDA ARS

ARS Home » Northeast Area » Beltsville, Maryland (BARC) » Beltsville Agricultural Research Center » Soybean Genomics & Improvement Laboratory » Research » Publications at this Location » Publication #273926

Title: A parallelized binary search tree

Author

	FENG, JIAN - Washington State University
	NAIMAN, DANIEL - Johns Hopkins University
	Cooper, Bret

Submitted to: Journal of Information Technology and Software Engineering
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/17/2011
Publication Date: 11/19/2011
Citation: Feng, J., Naiman, D., Cooper, B. 2011. A parallelized binary search tree. Journal of Information Technology and Software Engineering. 1:103.

Interpretive Summary: As the amounts of data grow, scientists are ever more reliant upon computer processing speed to make timely discoveries. One way to cut in half the time it takes a computer to process data is to double the number of its processors, but this can only be achieved through parallel processing, a strategy that ensures that computations are made across the processors evenly. Unfortunately, not all computations can be easily “parallelized,” so maximum data analysis speeds are not always achieved. To improve the speed of one part of data computation, we parallelized a common organizational structure called a binary search tree that allows a piece of information to be held in computer memory and searched later. This achievement allows efficient searching for information using modern computers with multiple, integrated processors. We implemented this parallelized binary search tree in a program called PTTRNFNDR which has allowed us to more quickly analyze DNA and protein sequences from soybeans and other plants for patterns associated with important traits such as plant growth. The algorithm will enable government, academic and private researchers to implement software packages to search large biological data sets or public records.

Technical Abstract: PTTRNFNDR is an unsupervised statistical learning algorithm that detects patterns in DNA sequences, protein sequences, or any natural language texts that can be decomposed into letters of a finite alphabet. PTTRNFNDR performs complex mathematical computations and its processing time increases when input texts become large. To achieve better speed performance, several strategies were applied in the implementation of the program, including parallel operations of binary search trees. A standard binary search tree is not thread-safe due to its dynamic insertion and deletions. Here, we adjusted the standard binary search tree for parallelized operations to achieve improved performance of the PTTRNFNDR algorithm. The method can be applied to other software platforms to quicken data searching through parallel operations of binary search trees when several conditions are met.