We have observed that many of the data we might represent in a program have a hierarchical structure to them, e.g., employees in a company, governmental structures, or biological systems. These relationships are best represented using trees. However, with sufficient ingenuity, we can envision more abstract kinds of hierarchical relationships that might allow us to obtain different effects with trees that we might not otherwise realize possible.
For example, consider the following tree of numbers:
4
/ \
/ \
/ \
2 6
/ \ / \
/ \ / \
1 3 5 7
What is the relationship between the different “parents” and “children” of this tree?
The relationship is a bit more “abstract,” but a powerful one to recognize.
For any given subtree with a root or parent value v
:
v
.v
.When a tree has this property, we call it a binary search tree or “bst” for short. At first glance, it seems like we are overcomplicating the situation. We could have stored these values in a list, maintaining the sorted order of the values:
[1, 2, 3, 4, 5, 6, 7]
What did we gain by storing the elements in a tree? Imagine trying to find an element in this list. Even though we know that the list is sorted, that information doesn’t help us narrow the search for the element. The nature of a list forces us to scan it left-to-right. In contrast, a binary search tree allows us to take advantage of “sortedness” to make finding elements quicker.
To see this, imagine looking for 5
in our bst:
4
/ \
/ \
/ \
2 6
/ \ / \
/ \ / \
1 3 5 7
How might we do this, starting from the root of the tree, 4
?
4
are in the left subtree and all the elements to the right of 4
are in the right subtree.
Because 5
is greater than 4
, we know that we can search the right subtree rooted at 6
for the element.6
.
5
is less than 6
, so we can refine our search to the subtree rooted at 5
.5
is the element we were looking for, so we can conclude that 5
is indeed in the bst!If we were to look for 5
in our list, we would have needed to check 5
elements: 1
, 2
, 3
, 4
, and 5
.
In contrast, this search only took three steps.
This is because at every step, the choice of which subtree to explore next meant that we didn’t have to explore the other subtree!
It isn’t obvious from our small example, but this results in a significant savings in time, especially as the number of elements in our collection grows!
We’ll dive into the specifics of this time savings when we talk about computational complexity, i.e., how we characterize the performance aspects of an algorithm.
Implementing a binary search tree so that we can reap these benefits is straightforward in Scheme. We’ll use exactly the same binary search tree setup we introduced in our discussion of binary trees. However, we will enforce the binary search tree invariant:
For any subtree with root value
v
contained within a binary search tree, the following properties hold:
- All elements in the left subtree are less than
v
.- All elements in the right subtree are greater than
v
.
First, let’s consider the “payoff” function for bsts, search. In our regular binary tree, we had to recursively search both the left-hand and right-hand children of the tree since we had no guarantees where data was stored. The binary search tree invariant allows us to refine our search accordingly:
To find a value
v
in a binary search treet
:
- If
t
is a leaf, then the tree is empty and thus it doesn’t containv
.- If
t
is a node, then it has a valueu
and a left subtreel
and right subtreer
.
- If
v = u
, thent
definitively containsv
.- If
v < u
, then (recursively) searchl
forv
.- Otherwise
v > u
. (Recursively) searchr
forv
.
This recursive decomposition can be directly translated into code as follows:
(struct leaf ()) (struct node (value left right)) (define singleton (lambda (value) (node value (leaf) (leaf)))) (define example-bst (node 4 (node 2 (singleton 1) (singleton 3)) (node 6 (singleton 5) (singleton 7)))) (define bst-contains (lambda (t v) (match t [(leaf) #f] [(node value left right) (cond [(equal? v value) #t] [(< v value) (bst-contains left v)] [else (bst-contains right v)])]))) (bst-contains example-bst 3) (bst-contains example-bst 8)
In lab, we’ll look at other operations over binary search trees: insertion into a bst and converting a bst to a (sorted) list.
Consider the following binary search tree:
7
/ \
/ \
/ \
4 9
/ \ \
/ \ \
2 6 10
Using the search algorithm for bsts described above, for each of the given target values, determine the sequence of elements that you touch while searching the bst for that target.
Recall that we defined the following binary search tree invariant:
For any subtree with root value
v
contained within a binary search tree, the following properties hold:
- All elements in the left subtree are less than
v
.- All elements in the right subtree are greater than
v
.
This definition assumes that our bsts contain unique elements, i.e., they have no duplicates. If we allowed for duplicates, we might augment our definition to say that:
- All elements in the right subtree are greater than or equal to
v
.
Does our search algorithm still work in this case? In a sentence or two, describe why or why not?