MIT OpenCourseWare
http://ocw.mit.edu


6.001 Structure and Interpretation of Computer Programs, Spring 2005

Transcript – 3B: Symbolic Differentiation; Quotation



[MUSIC PLAYING] PROFESSOR: Well, Hal just told us how you build robust systems. The
key idea was-- I'm sure that many of you don't really assimilate that yet-- but the key idea

is that in order to make a system that's robust, it has to be insensitive to sm all changes,
that is, a small change in the problem should lead to only a small change in the solution.

There ought   to be a continuity. The space of solutions ought to be continuous in this space
of problems.





The way he was explaining how to do that was instead of solving a particular problem at
every level of decomposition of the problem at the subproblems, where you solve the class

of problems, which are a neighborhood of the particular problem that you're trying to solve.
The way you do that is by producing a language at that level of detail in which the solutions

to that class of problems is representable in that language. Therefore when you makes
more changes to the problem you're trying to solve, you generally have to make only small

local changes to the solution you've constructed, because at the level of detail you're

working, there's a language where you can express the various solutions to alternate
problems of the same type.




Well that's the beginning of a very important idea, the most important perhaps idea that

makes computer science more powerful than most of the other kinds of engineering
disciplines we know about. What we've seen so far is sort of how to use embedding of

languages. And, of course, the power of embedding languages partly comes from

procedures like this one that I showed you yesterday.




What you see here is the derivative program that we described yesterday. It's a procedure
that takes a procedure as an argument and returns a procedure as a value. And using such

things is very nice. You can make things like push combinators and all that sort of wonderful

thing that you saw last time.




However, now I'm going to really muddy the waters. See this confuses the issue of what's
the procedure and what is data, but not very badly. What we really want to do is confuse it

very badly. And the best way to do that is to get involved with the manipulation of the

algebraic expressions that the procedures themselves are expressed in.
So at this point, I want to talk about instead of things like on this slide, the derivative

procedure being a thing that manipulates a procedure-- this is a numerical method you see
here. And what you're seeing is a representation of the numerical approximation to the

derivative. That's what's here. In fact what I'd like to talk about is instead things that look
like this. And what we have here are rules from a calculus book.





These are rules for finding the derivatives of the expressions that one might write in some
algebraic language. It says things like a derivative of a constant is 0. The derivative of the

valuable with respect to which you are taking the derivative is 1. The derivative of a
constant times the function is the constant times the derivative of the function, and things

like that. These are exact expressions. These are not numerical approximations. Can we
make programs? And, in fact, it's very easy to make programs that manipulate these

expressions.




Well let's see. Let's look at these rules in some detail. You all have seen these rules in your

elementary calculus class at one time or another. And you know from calculus that it's easy
to produce derivatives of arbitrary expressions. You also know from your elementary

calculus that it's hard to produce integrals. Yet integrals and derivatives are opposites of

each other. They're inverse operations. And they have the same rules. What is special about
these rules that makes it possible for one to produce derivatives easily and integrals why it's

so hard? Let's think about that very simply.




Look at these rules. Every one of these rules, when used in the direction for taking

derivatives, which is in the direction of this arrow, the left side is matched against your
expression, and the right side is the thing which is the derivative of that expression. The

arrow is going that way. In each of these rules, the expressions on the right -hand side of
the rule that are contained within derivatives are subexpressions, are proper

subexpressions, of the expression on the left-hand side.




So here we see the derivative of the sum, with is the expression on the left-hand side is the

sum of the derivatives of the pieces. So the rule of moving to the right are reduction rules.
The problem becomes easier. I turn a big complicated problem it's lots of smaller problems

and then combine the results, a perfect place for recursion to work.




If I'm going in the other direction like this, if I'm trying to produce integrals, well there are

several problems you see here. First of all, if I try to integrate an expression like asum,
more than one rule matches. Here's one that matches. Here's one that matches. I don't

know which one to take. And they may be different. I may get to explore different things.
Also, the expressions become larger in that direction. And when the expressions become
larger, then there's no guarantee that any particular path I choose will terminate, because

we will only terminate by accidental cancellation. So that's why integrals are complicated
searches and hard to do.





Right now I don't want to do anything as hard as that. Let's work on derivatives for a while.
Well, these roles are ones you know for the most part hopefully. So let's see if we can write

a program which is these rules. And that should be very easy. Just write the program. See,
because while I showed you is that it's a reduction rule, it's something appropriate for a

recursion. And, of course, what we have for each of these rules is we have a case in some
case analysis.





So I'm just going to write this program down. Now, of course, I'm going to be saying
something you have to believe. Right? What you have to believe is I can represent these

algebraic expressions, that I can grab their parts, that I can put them together. We've
invented list structures so that you can do that. But you don't want to worry about that

now. Right now I'm going to write the program that encapsulates these rules independent of
the representation of the algebraic expressions.





You have a derivative of an expression with respect to a variable. This is a different thing
than the derivative of the function. That's what we saw last time, that numerical

approximation. It's something you can't open up a function. It's just the answers. The
derivative of an expression is the way it's written. And therefore it's a syntactic

phenomenon. And so a lot of what we're going to be doing today is worrying about syntax,

syntax of expressions and things like that.




Well, there's a case analysis. Anytime we do anything complicated thereby a recursion, we
presumably need a case analysis. It's the essential way to begin. And that's usually a

conditional of some large kind. Well, what are their possibilities? the first rule that you saw

is this something a constant? And what I'm asking is, is the expression a constant with
respect to the variable given? If so, the result is 0, because the derivative represents the

rate of change of something.




If, however, the expression that I'm taking the derivative of is the variable I'm varying,
then this is the same variable, the expression var, then the rate of change of the expression

with respect to the variable is 1. It's the same 1.
Well now there are a couple of other possibilities. It could, for example, be a sum. Well, I

don't know how I'm going to express sums yet. Actually I do. But I haven't told you yet. But
is it a sum? I'm imagining that there's some way of telling. I'm doing a dispatch on the type

of the expression here, absolutely essential in building languages. Languages are made out
of different expressions. And soon we're going to see that in our more powerful methods of

building languages on languages.




Is an expression a sum? If it's a sum, well, we know the rule for derivative of the sum is the

sum of the derivatives of the parts. One of them is calledthe addend and the other is the
augend. But I don't have enough space on the blackboard to such long names. So I'll call

them A1 and A2.




I want to make a sum. Do you remember which is the sum for end or the menu end? Or

was it the dividend and the divisor or something like that? Make sum of the derivative of the
A1, I'll call it. It's the addend of the expression with respect to the variable, and the

derivative of the A2 of the expression, because the two arguments, the addition with
respect to the variable.





And another rule that we know is product rule, which is, if the expression is a product. By
the way, it's a good idea when you're defining things, when you're defining predicates, to

give them a name that ends in a question mark. This question mark doesn't mean anything.
It's for us as an agreement. It's a conventional interface between humans so you can read

my programs more easily. So I want you to, when you write programs, if you define a

predicate procedure, that's something that rings true of false, it should have a name which
ends in question mark. The list doesn't care. I care.




I want to make a sum. Because the derivative of a product is the sum of the first times the

derivative of the second plus the second times the derivative of the first.Make a sum of two

things, a product of, well, I'm going to say the M1 of the expression, and the derivative of
the M2 of the expression with respect to the variable, and the product of the derivative of

M1, the multiplier of the expression, with respect to the variable. It's the product of that
and the multiplicand, M2, of the expression. Make that product. Make the sum. Close that

case.




And, of course, I could add as many cases as I like here for a complete set of rules you

might find in a calculus book. So this is what it takes to encapsulate those rules. And you
see, you have to realize there's a lot of wishful thinking here. I haven't told you anything

about how I'm going to make these representations. Now, once I've decided that this is my
set of rules, I think it's time to play with the representation. Let's attack that/
Well, first of all, I'm going to play a pun. It's an important pun. It's a key to a sort of
powerful idea. If I want to represent sums, and products, and differences, and quotients,

and things like that, why not use the same language as I'm writing my program in? I write
my program in algebraic expressions that look like the sum of the product on a and the

product of x and x, and things like that. And the product of b and x and c, whatever, make

that a sum of the product. Right now I don't want to have procedures with unknown
numbers of arguments, a product of b and x and c.




This is list structure. And the reason why this is nice, is because any one of these objects

has a property. I know where the car is. The car is the operator. And the operands are the

successive cdrs the successive cars of the cdrs of the list that this is. It makes it very
convenient. I have to parse it. It's been done for me. I'm using the embedding and Lisp to

advantage.




So, for example, let's start using list structure to write down the representation that I'm

implicitly assuming here. Well I have to define various things that are implied in this
representation. Like I have to find out how to do a constant, how you do same variable.

Let's do those first. That's pretty easy enough. Now I'm going to be introducing lots of
primitives here, because these are the primitives that come with list structure.





OK, you define a constant. And what I mean by a constant, an expression that's constant
with respect to a veritable, is that the expression is something simple. I can't take it into

pieces, and yet it isn't that variable. I can't break it up, and yet it isn't that variable. That
does not mean that there may be other expressions that are more complicated that are

constants. It's just that I'm going to look at the primitive constants in this way.




So what this is, is it says that's it's the and. I can combine predicate expressions which

return true or false with and. Something atomic, The expression is atomic, meaning it
cannot be broken into parts. It doesn't have a car and a cdr. It's not a list. It adds a special

test built into the system. And it's not identically equal to that variable. I'm representing my
variable by things that are symbols which cannot be broken into pieces, things like x, and y,

things like this. Whereas, of course, something like this can be broken up into pieces.




And the same variable of an expression with respect to a variable is, in fact, an atomic

expression. I want to have an atomic expression, which is identical. I don't want to look
inside this stuff anymore. These are primitive maybe. But it doesn't matter. I'm using things

that are given to me with a language. I'm not terribly interest in them
Now how do we deal with sums? Ah, something very interesting will happen. A sum is
something which is not atomic and begins with the plus symbol. That's what it means. So

here, I will define. An question is a sum if and it's not atomic and it's head, it's beginning,
its car of the expression is the symbol plus.





Now you're about to see something you haven't seen before, this quotation. Why do I have
that quotation there? Say your name,





AUDIENCE: Susanna.




PROFESSOR: Louder.




AUDIENCE: Susanna





PROFESSOR: Say your name.




AUDIENCE: Your name.




PROFESSOR: Louder.





AUDIENCE: Your name.




PROFESSOR: OK. What I'm showing you here is that the words of English are ambiguous. I

was saying, say your name. I was also possibly saying say, your name. But that cannot be
distinguished in speech. However, we do have a notation in writing, which is quotation for

distinguishing these two possible meanings.




In particular, over here, in Lisp we have a notation for distinguish ing these meetings. If I

were to just write a plus here, a plus symbol, I would be asking, is the first element of the
expression, is the operator position of the expression, the addition operator? I don't know. I
would have to have written the addition operator there, which I can't write. However, this

way I'm asking, is this the symbolic object plus, which normally stands for the addition
operator? That's what I want. That's the question I want to ask. Now before I go any

further, I want to point out the quotation is a very complex concept, and adding it to a
language causes a great deal of troubles.





Consider the next slide. Here's a deduction which we should all agree with. We have, Alyssa
is smart and Alyssa is George's mother. This is an equality, is. From those two, we can

deduce that George's mother is smart. Because we can always substitute equals for equals
in expressions. Or can we?





Here's a case where we have "Chicago" has seven letters. The quotation means that I'm
discussing the word Chicago, not what the word represents. Here I have that Chicago is the

biggest city in Illinois. As a consequence of this, I would like to deduce that the biggest city
in Illinois has seven letters. But that's manifestly false. Wow, it works.





OK, so once we have things like that, our language gets much more complicated. Because
it's no longer true that things we tend to like to do with languages, like substituting equals

for equals and getting right answers, are going to work without being very careful. We c an't
substitute into what's called referentially opaque contexts, of which a quotation is the

prototypical type of referentially opaque context. If you know what that means, you can
consult a philosopher. Presumably there is one in the room.





In any case, let's continue now, now that we at least have an operational understanding of
a 2000-year-old issue that has to do with name, and mention, and all sorts of things like

that. I have to define what I mean, how to make a sum of two things, an a1 and a2. And
I'm going to do this very simply. It's a list of the symbol plus, and a1, and a2. And I can

determine the first element. Define a1 to be cadr. I've just introduced another primitive.

This is the car of the cdr of something.




You might want to know why car and cdr are names of these primitives, and why they've
survived, even though they're much better ideas like left and right. We could have called

them things like that. Well, first of all, the names come from the fact that in the great past,
when Lisp was invented, I suppose in '58 or something, it was on a 704 or something like

that, which had a machine. It was a machine that had an address register and a decrement

register. And these were the contents of the address register and the decrement register.
So it's an historical accident.
Now why have these names survived? It's because Lisp programmers like to talk to each

other over the phone. And if you want to have a long sequence of cars and cdrs you might
say, cdaddedr, which can be understood. But left of right or right of left is not so clear if you

get good at it. So that's why we have these words. All of them up to four deep are defined
typically in a Lisp system. A2 to be-- and, of course, you can see that if I looked at one of

these expressions like the sum of 3 and 5, what that is is a list containing the symbol plus,
and a number 3, and a number 5. Then the car is the symbol plus. The car of the cdr. Well I

take the cdr and then I take the car. And that's how I get to the 3. That's the first

argument. And the car of the cdr of the cdr gets me to this one, the 5.




And similarly, of course, I can define what's going on with products. Let's do that very
quickly. Is the expression a product? Yes if and if it's true, that's it's not atomic and it's EQ

quote, the asterisk symbol, which is the operator for multiplication.




Make product of an M1 and an M2 to be list, quote, the asterisk operation and M1 and M2.

and I define M1 to be cadr and M2 to be caddr. You get to be a good Lisp programmer
because you start talking that way. I cdr down lists and console them up and so on.





Now, now that we have essentially a complete program for finding derivatives, you can add
more rules if you like. What kind of behavior do we get out of it? I'll have to clear that x.

Well, supposing I define foo here to be the sum of the product of ax square and bx plus c.
That's the same thing we see here as the algebraic expression written in the more

conventional notation over there.




Well, the derivative of foo with respect to x, which we can see over here, is this horrible,

horrendous mess. I would like it to be 2ax plus b. But it's not. It's equivalent to it. What is
it? I have here, what do I have? I have the derivative of the product of x and x. Over here

is, of course, the sum of x times 1 and 1 times x.




Now, well, it's the first times the derivative of the second plus the second times the

derivative of the first. It's right. That's 2x of course. a times 2x is 2ax plus 0X square
doesn't count plus B over here plus a bunch of 0's. Well the answer is right. But I give

people take off points on an exam for that, sadly enough. Let's worry about that in the next
segment. Are there any questions? Yes?
AUDIENCE: If you had left the quote when you put the plus, then would that be referring to

the procedure plus and could you do a comparison between that procedure and some other
procedure if you wanted to?





PROFESSOR: Yes. Good question. If I had left this quotation off at this point, if I had left
that quotation off at that point, then I would be referring here to the procedure which is the

thing that plus is defined to be.




And indeed, I could compare some procedures with each other for identity. Now what that

means is not clear right now. I don't like to think about it. Because I don't know exactly
what it would need to compare procedures. There are reasons why that may make no sense

at all. However, the symbols, we understand. And so that's why I put that quote in. I want
to talk about the symbol that's apparent on the page. Any other questions? OK. Thank you.

Let's take a break.




[MUSIC PLAYING]




PROFESSOR: Well, let's see. We've just developed a fairly plausible program for computing

the derivatives of algebraic expressions. It's an incomplete program, if you would like to add

more rules. And perhaps you might extend it to deal with uses of addition with any number
of arguments and multiplication with any of the number of arguments. And that's all rather

easy.




However, there was a little fly in that ointment. We go back to this slide. We see that the

expressions that we get are rather bad. This is a rather bad expression. How do we get such
an expression? Why do we have that expression? Let's look at this expression in some

detail. Let's find out where all the pieces come from. As we see here, we have a sum-- just
what I showed you at the end of the last time-- of X times 1 plus 1 time X. That is a

derivative of this product. The produce of a times that, where a does not depend up on x,
and therefore is constant with respect to x, is this sum, which goes from here all the way

through here and through here. Because it is the first thing times the derivative of the
second plus the derivative of the first times the second as the program we wrote on the

blackboard indicated we should do.




And, of course, the product of bx over here manifests itself as B times 1 plus 0 times X

because we see that B does not depend upon X. And so the derivative of B is this 0, and the
derivative of X with respect itself is the 1. And, of course, the derivative of the sums over

here turn into these two sums of the derivatives of the parts.




So what we're seeing here is exactly the thing I was trying to tell you about with Fibonacci

numbers a while ago, that the form of the process is expanded from the local rules that you
see in the procedure, that the procedure represents a set of local rules for the expansion of

this process. And here, the process left behind some stuff, which is the answer. And it was
constructed by the walk it takes of the tree structure, which is the expression. So every part

in the answer we see here derives from some part of the problem.




Now, we can look at, for example, the derivative of foo, which is ax square plus bx plus c,

with respect to other things, like here, for example, we can see that the derivative of foo
with respect to a. And it's very similar. It's, in fact, the identical algebraic expression,

except for the fact that theses 0's and 1's are in different places. Because the only degree of
freedom we have in this tree walk is what's constant with respect to the variable we're

taking the derivative with respect to and was the same variable.




In other words, if we go back to this blackboard and we look, we have no choice what to do

when we take the derivative of the sum or a product. The only interesting place here is, is
the expression the variable, or is the expression a constant with respect to that variable for

very, very small expressions? In which case we get various1's and 0's, which if we go back
to this slide, we can see that the 0's that appear here, for example, this 1 over here in

derivative of foo with respect to A, which gets us an X square, because that 1 gets the

multiply of X and X into the answer, that 1 is 0. Over here, we're not taking the derivative
of foo with respect to c. But the shapes of these expressions are the same. See all those

shapes. They're the same.




Well is there anything wrong with our rules? No. They're the right rules. We've been through

this one before. One of the things you're going to begin to discover is that there aren't too
many good ideas. When we were looking at rational numbers yesterday, the problem was

that we got 6/8 rather then 3/4. The answer was unsimplified. The problem, of course, is
very similar. There are things I'd like to be identical by simplification that don't become

identical. And yet the rules for doing addition a multiplication of rational numbers were
correct.





So the way we might solve this problem is do the thing we did last time, which always
works. If something worked last time it ought to work again. It's changed representation.

Perhaps in the representation we could put in a simplification step that produces a simplified
representation. This may not always work, of course. I'm not trying to say that it always

works. But it's one of the pieces of artillery we have in our war against complexity.




You see, because we solved our problem very carefully. What we've done, is we've divided

the world in several parts. There are derivatives rules and general rules for algebra of some
sort at this level of detail. and i have an abstraction barrier. And i have the representation

of the algebraic expressions, list structure.




And in this barrier, I have the interface procedures. I have constant, and things like same   -

var. I have things like sum, make-sum. I have A1, A2. I have products and things like that,
all the other things I might need for various kinds of algebraic expressions.




Making this barrier allows me to arbitrarily change the representation without changing the

rules that are written in terms of that representation. So if I can make the problem go

away by changing representation, the composition of the problem into these two parts has
helped me a great deal.




So let's take a very simple case of this. What was one of the problems? Let's go back to this

transparency again. And we see here, oh yes, there's horrible things like here is the sum of

an expression and 0. Well that's no reason to think of it as anything other than the
expression itself. Why should the summation operation have made up this edition? It can be

smarter than that. Or here, for example, is a multiplication of something by 1. It's another
thing like that. Or here is a product of something with 0, which is certainly 0. So we won't

have to make this construction.




So why don't we just do that? We need to change the way the representation works, almost

here. Make-sum to be. Well, now it's not something so simple. I'm not going to make a li st
containing the symbol plus and things unless I need to.





Well, what are the possibilities? I have some sort of cases here. If I have numbers, if
anyone is a number-- and here's another primitive I've just introduced, it's possible to tell

whether something's number-- and if number A2, meaning they're not symbolic
expressions, then why not do the addition now? The result is just a plus of A1 and A2.




I'm not asking if these represent numbers. Of course all of these symbols represent

numbers. I'm talking about whether the one I've got is the number 3 right now. An d, for
example, supposing A1 is a number, and it's equal to 0, well then the answer is just A2.

There is no reason to make anything up. And if A2 is a number, and equal A20, then the
result is A1.





And only if I can't figure out something better to do with this situation, well, I can start a
list. Otherwise I want the representation to be the list containing the quoted symbol plus,

and A1, and A2.




And, of course, a very similar thing can be done for products. And I think I'll avoid boring

you with them. I was going to write it on the blackboard. I don't think it's necessary. You
know what to do. It's very simple.




But now, let's just see the kind of results we get out of changing our program in this way.

Well, here's the derivatives after having just changed the constructors for expressions. The

same foo, aX square plus bX plus c, and what I get is nothing more than the derivative of
that is 2aX plus B.




Well, it's not completely simplified. I would like to collect common terms and sums. Well,

that's more work. And, of course, programs to do this sort of thing are huge and

complicated. Algebraic simplification, it's a very complicated mess. There's a very famous
program you may have heard of called Maxima developed at MIT in the past, which is 5,000

pages of Lisp code, mostly the algebraic simplification operations.




There we see the derivative of foo. In fact, X is at something I wouldn't take off more than

1 point for on an elementary calculus class. And the derivative of foo with respect to a, well
it's gone down to X times X, which isn't so bad. And the derivative of foo with respect to b is

just X itself. And the derivative of foo with respect to c comes out 1. So I'm pretty pleased
with this.





What you've seen is, of course, a little bit contrived, carefully organized example to show
you how we can manipulate algebraic expressions, how we do that abstractly in terms of

abstract syntax rather than concrete syntax and how we can use the abstraction to control
what goes on in building these expressions.




But the real story isn't just such a simple thing as that. The real story is, in fact, that I'm

manipulating these expressions. And the expressions are the same expressions-- going back
to the slide-- as the ones that are Lisp expressions. There's a pun here. I've chosen my

representation to be the same as the representation in my language of similar things. By
doing so, I've invoked a necessity. I created the necessity to have things like quotation

because of the fact that my language is capable of writing expressions that talk about
expressions of the language. I need to have something that says, this is an expression I'm

talking about rather than this expression is talking about something, and I want to talk
about that.





So quotation stops and says, I'm talking about this expression itself. Now, given that power,
if I can manipulate expressions of the language, I can begin to build even much more

powerful layers upon layers of languages. Because I can write languages that not only are
embedded in Lisp or whatever language you start with, but languages that are completely

different, that are just, if we say, interpreted in Lisp or something like that.




We'll get to understand those words more in the future. But right now I just want to leave

you with the fact that we've hit a line which gives us tremendous power. And this point
we've bought a sledgehammer. We have to be careful to what flies when we apply it. Thank

you.




[MUSIC PLAYING]
MIT OpenCourseWare
http://ocw.mit.edu


6.001 Structure and Interpretation of Computer Programs, Spring 2005





Please use the following citation format:

       Eric Grimson, Peter Szolovits, and Trevor Darrell, 6.001 Structure and

       Interpretation of Computer Programs, Spring 2005. (Massachusetts Institute
       of Technology: MIT OpenCourseWare).    http://ocw.mit.edu (accessed MM DD,
       YYYY). License: Creative Commons Attribution-Noncommercial-Share Alike.


Note: Please use the actual date you accessed thismaterial in your citation.



For more information about citing these materials or our Terms of Use, visit:
http://ocw.mit.edu/terms