The "big three" list operations

Summary
We continue our exploration Scheme’s list data type, focusing on “the big three” list processing operations: map, filter, and reduce.

We’ve started to see some significant power in using two “higher order” list operations, map and apply. These are called “higher order” procedures because they take procedures as inputs.

map is particularly useful for a variety of reasons. First, it lets us do a form of repetition: We use map to repeatedly apply a procedure to different values (the elements of the list or lists we map over). But it’s more than that. Experience suggests that using map leads to a different way of thinking about repetition than some of the other mechanisms, such as “for loops”. (Don’t worry if you’ve never heard of for loops. You’ll learn about them in another CS class.) As importantly, map permits some cool implementations. Since map does not specify the order in which the elements are processed, one can implement map so that all (or at least many) of the applications can be done at the same time. (Computer scientists say “in parallel”.)

As we start to reach the limits on the speed of one processing unit, the way we make large computations faster is to parallelize them, to make them run on multiple processing units. While there are explicit ways to break up a program, by relying on procedures like map, we can trust the underlying system to parallelize sensibly, without worrying about particular details. In fact, map is so important that it’s a core part of the parallelization technologies that both Google and Amazon use.

As you might expect, map alone does not suffice. As you’ve already seen, it helps to be able to combine the elements of a list into a single value. We’ve used apply for that, but, as we shall soon see, it does not suffice for all cases. In addition, we may find that not all of the values in a list are worth considering. So let’s consider some other useful procedures.

Combining the elements in a list

When we began our exploration of numbers, we used a variety of unary (one parameter) procedures, such as those above. But we also used some binary (two parameter) operations, such as addition or multiplication. Can we also use those with lists? It seems like we’d want to. For example, if we wanted to compute mean value in a collection of numbers, we want to add up all of the elements in the collection and then divide by the length of the collection.

We’ve seen one way to do so. We can use apply. For example, we can sum the elements in a list with (apply + lst). But what happens if we don’t have an “n-ary” procedure (one that takes arbitrarily many inputs). Consider, for example, the problem of reversing each word in a string consisting of a sequence of words separated by spaces.

The first two steps seem relatively straightforward. We use string-split to break the string into words. We use string-reverse (or our anonymous version of string-reverse) to reverse each word.

(map (lambda (s)
       (list->string (reverse (string->list s))))
     (string-split "this is a sample set of words" " "))

How do we get them back together? We could try string-append.

(define reversed-words
  (map (lambda (s)
       (list->string (reverse (string->list s))))
     (string-split "this is a sample set of words" " ")))

(apply string-append reversed-words)

Whoops! We’ve lost the spaces. What to do? What to do?

Fortunately, there’s a standard approach that involves a different kind of decomposition. Rather than thinking about putting everything together all at once, we write a procedure that combines neighboring elements and then repeatedly apply it to neighboring pairs until we’re down to a single element. There are two common terms for that work, “folding” and “reducing”. We’ll use the latter.

First, let’s build our helper procedure. While it will eventually be an anonymous procedure, we’ll start by naming it for convenience.

(define combine-with-space
  (lambda (s1 s2)
    (string-append s1 " " s2)))

(combine-with-space "a" "b")
(combine-with-space "Hello" "Goodbye")
 

Now, we can use the reduce procedure with combine-with-space. (reduce FUN LST), converts LST to a single value by repeatedly applying FUN to neighboring pairs of values, replacing the pair with the result of the function.

(define combine-with-space
  (lambda (s1 s2)
    (string-append s1 " " s2)))

(reduce combine-with-space (list "a" "b" "c" "d"))

How does that work? Well, we said that reduce repeatedly applies the function to neighboring pairs of values. Let’s consider what happens.

  • We start with four values to combine: "a", "b", "c", and "d".
  • We combine "a" and "b" with a space, yielding "a b". We now have three values to combine: "a b", "c", and "d".
  • We combine the "a b" and the "c" with a space in between, yielding "a b c". We now have two values to combine. "a b c" and "d".
  • We combine the "a b c" and the "d" with a space in between, yielding "a b c d". We are now down to one value.

Of course, we could also combine the values in other ways.

  • We start with four values to combine: "a", "b", "c", and "d".
  • We combine "c" and "d" with a space, yielding "c d". We now have three values to combine: "a", "b", and "c d".
  • We combine "a" and "b" with a space, yielding "a b". We now have two values to combine: "a b" and "c d".
  • We combine the “a b” and the “c d” with a space, yielding "a b c d". We are now down to one value.

Fortunately, we end up with the same value either way.

So we can now go back to our original problem: Creating a new string with reversed versions of all the original words.

(define combine-with-space
  (lambda (s1 s2)
    (string-append s1 " " s2)))

(define reversed-words
  (map (lambda (s)
       (list->string (reverse (string->list s))))
     (string-split "this is a sample set of words" " ")))

(reduce combine-with-space reversed-words)

Note that we expect these auxiliary function definitions, combine-with-space and reversed-words, to be local to this computation and not particularly useful outside of this computation. We, therefore, could inline our definitions to create one large expression to avoid binding excessive names.

(reduce
  (lambda (s1 s2) (string-append s1 " " s2))
  (map (lambda (s)
         (list->string (reverse (string->list s))))
       (string-split "this is a sample set of words" " ")))

Like map, reduce provides some advantages to the computer scientist, programmer, or software engineer. First, it encourages you to think in terms of decomposition. Rather than dealing with the whole list at once, you simply think of what to do with neighboring pairs and then rely on reduce to do the heavy lifting. And, once again, we can gain some efficiencies. If it doesn’t matter which order we do the operations, we can do some of them simultaneously and otherwise find orderings that are a bit more efficient. (Yes, it turns out that there are orderings that are more efficient.) And string-append (or, in our case, combine-with-space) is not the only operation in which the order of operations doesn’t matter. The same holds (more or less) for addition and multiplication. For those of you with a mathematical mindset, reduce works well with any associative binary operation.

We can, of course, use reduce in many other ways. To find the largest value in the list, we reduce with max.

(reduce max (list 3 1 5 10 3 2))
(reduce min (list 3 1 5 10 3 2))

The filter procedure

There’s one more “big” higher-order list-processing functional procedure, (filter pred? lst). filter takes two parameters, a unary (one-parameter) predicate and a list of values, and selects all the values for which the predicate holds.

(define stuff (list -5 10 18 23 14.0 87 (/ 1 2) 0.5 -12.2))
(filter negative? stuff)
(filter integer? stuff)
(filter (lambda (n) (and (<= 0 n) (<= n 10))) stuff)

That seems pretty powerful, doesn’t it? Believe it or not, but by the end of this course, you’ll be able to write filter yourself. (You’ll also be able to write map and reduce, as well as other higher-order list procedures you design yourself.)

And, as in other cases we’ve seen, combining filter with other procedures can let us do new, clever things. For example, suppose we want to add up the digits in a string containing both digits and non-digits.

We know that we can convert a string to a list of characters.

(string->list "a 1 and a 2 and a 3")

We can convert a digit character to the digit by getting its collating sequence number and subtracting the collating sequence number of zero.

(map (lambda (c)
       (- (char->integer c) (char->integer #\0)))
     (list #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9))

We can extract all the digits from the list of characters with filter.

(filter char-numeric? (string->list "a 1 and a 2 and a 3"))

(map (lambda (c)
       (- (char->integer c) (char->integer #\0)))
     (filter char-numeric? (string->list "a 1 and a 2 and a 3")))

And then we add them all up.

(reduce +
  (map (lambda (c)
         (- (char->integer c) (char->integer #\0)))
       (filter char-numeric? (string->list "a 1 and a 2 and a 3"))))

You’ll see this combination of “the big three” fairly frequently. We filter, we map, then we reduce. Together, they bring great power (and the accompanying great responsibility).

Using map with multiple lists

We’ve seen one way to use binary procedures with lists: We can reduce a list of values to a single value by repeatedly combining pairs of values with a function. But there’s another. Just as we can use map to create a new list of values by applying a unary procedure to each element of a list, we can also use a more generalized version of map that grabs values from multiple lists and combines them into values in a new list. In particular, map can also build a new list by applying the procedure to the corresponding elements of all the lists. For example,

(define increment
  (lambda (n)
    (+ n 1)))

(map * (list 1 2 3) (list 4 5 6))

(map + (list 1 2) (list 3 4) (list 5 6))

(map list (range 10) (map increment (range 10)) (map square (range 10)))

(define first-names (list "Addison" "Bailey" "Casey" "Devon" "Emerson"))

(define last-names (list "Smith" "Jones" "Smyth" "Johnson" "Doe"))

(map (lambda (first last) (string-append first " " last)) first-names last-names)

(map (lambda (last first) (string-append last ", " first)) last-names first-names)

You may be starting to see some interesting possibilities. If you are not, stay tuned.

Self checks

Check 1: Pipelining (‡)

In the last example, we used map to map two lists of names into a single list of names of the form "last, first".

(define first-names (list "Addison" "Bailey" "Casey" "Devon" "Emerson"))

(define last-names (list "Smith" "Jones" "Smyth" "Johnson" "Doe"))

(map (lambda (last first) (string-length (string-append last ", " first))) last-names first-names)

Extend this expression with filter and reduce so that:

a. The list only contains "last, first" formatted names that are at least 13 characters in length (including the , and space). b. We obtain the sum of the lengths of all the names in "last, first" format.