Understanding the Meaning of the Taylor Series

December 2, 2011

One of my favorite things I learned in math was the Taylor/Maclaurin series, so I spent a lot of time thinking about how it works.

I think that this is remarkable.

Approximating functions using the Taylor series is much like breaking up a function into its components of y intercept, slope, and concavity. You’re looking at the function’s constituent parts.

We can approximate a function for an entire domain simply by ensuring that the value of the function at a point is equal to the value of a polynomial at the same point.

We can even better approximate a function for an entire domain simply by ensuring that its first derivative is the same as a polynomial’s first derivative at a point as well as ensuring that the functions are equal at the same point.

We can yet again better approximate the function for an entire domain simply by ensuring that its first and second derivatives are equal to a polynomial’s first and second derivatives at that same point, while still ensuring that the functions themselves are equal there.
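To see those three steps in numbers, here is a minimal sketch. It assumes e^x as the function and x=0 as the matching point (my choice for illustration, since every derivative of e^x at 0 is simply 1), and it checks how far off each polynomial is at x=0.5.

```python
import math

def taylor_exp(order, x, a=0.0):
    """Taylor polynomial of exp about a, keeping terms up to the given order."""
    # Every derivative of exp at a equals exp(a), so term k is exp(a) * (x - a)^k / k!.
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k) for k in range(order + 1))

x = 0.5
# order 0 matches the value, order 1 adds the slope, order 2 adds the concavity
errors = [abs(math.exp(x) - taylor_exp(order, x)) for order in range(3)]
for order, err in enumerate(errors):
    print(f"order {order}: error at x={x} is {err:.4f}")
```

Each extra matched derivative shrinks the error at this nearby point. How quickly it shrinks at points further away depends on the function.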

Now, which points are the best to do this with? Are all points equally valid for approximation if the function is infinitely differentiable? Do some points yield a better approximation than others? In essence, do some points represent the function better than other points? It’s as if each point of the function has encoded in it information about all of the other points. The function was bred from DNA and every point has a piece of that DNA in it. Each point has a first, second, and third derivative as well as much more.

This is strange. Obviously we cannot figure out anything about a function using one point alone. We know nothing if we are given one point. Two points give us one additional piece of information: the slope between them, which approximates the slope at that point when the two points are close together.

This is strange. Now that I’m thinking about it, you need an infinite number of points to say anything about the function as a whole: the concavity at one point tells us nothing really about the concavity at another point. But this is only true for functions that have infinitely changing behavior.

It’s as if taking more and more derivatives explains the function’s behavior at further and further points. Once I get to a derivative that is 0 everywhere, the function essentially stops evolving. It means that the section you’ve covered matches the behavior of the function for the entire domain.
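A small sketch of that "stops evolving" idea, assuming a polynomial stored as a list of coefficients (the cubic 0.5x^3 here is just a convenient example). Differentiating repeatedly eventually leaves nothing at all.

```python
def derivative(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] for c0 + c1*x + c2*x^2 + ..."""
    # The term c_k * x^k becomes k * c_k * x^(k-1); the constant term drops out.
    return [k * c for k, c in enumerate(coeffs)][1:]

coeffs = [0, 0, 0, 0.5]  # 0.5 * x^3, chosen for illustration
history = [coeffs]
while history[-1]:  # an empty list stands for the zero polynomial
    history.append(derivative(history[-1]))

for order, c in enumerate(history):
    print(order, c)
```

The third derivative is the constant 3 and the fourth is zero, so a cubic has nothing left to say after four pieces of information.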

Possible hypothesis: For any three points with different x values, only one polynomial of degree two or less can connect all three. It’s like this: just as two different lines can intersect a maximum of one time, two different parabolas (seem to only be able to) intersect a maximum of two times. This means that three points can uniquely define a parabola. There is a unique mix of position, slope, and concavity that yields a parabola.
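Here is a rough sketch of that hypothesis in code. The function name parabola_through is made up for this example, and it assumes the three x values are all different. It uses divided differences (slopes between neighbouring points, then the change in slope) to find the one parabola through the points.

```python
from fractions import Fraction

def parabola_through(p0, p1, p2):
    """Return (a, b, c) with y = a*x^2 + b*x + c passing through three points.

    Assumes the three x values are distinct.
    """
    (x0, y0), (x1, y1), (x2, y2) = [(Fraction(x), Fraction(y)) for x, y in (p0, p1, p2)]
    # Slopes between neighbouring points.
    d01 = (y1 - y0) / (x1 - x0)
    d12 = (y2 - y1) / (x2 - x1)
    # How fast the slope changes gives the x^2 coefficient.
    a = (d12 - d01) / (x2 - x0)
    b = d01 - a * (x0 + x1)
    c = y0 - a * x0 ** 2 - b * x0
    return a, b, c

a, b, c = parabola_through((0, 1), (1, 3), (2, 9))
print(a, b, c)  # coefficients of y = 2x^2 + 1
```

If the three points happen to lie on a line, a comes out as 0 and the "parabola" is just that line.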

We can represent polynomials exactly with a finite number of terms using the Taylor series.

Let’s say we have the function

f(x) = .5x^3

Well, let’s say we want to look at x=2 on f(x). We know the following things about f(2):

Value = 4
Slope = 6
Concavity = 6
Jerk = 3

How do we know these things? Well, we know them from the equation (taking 1st, 2nd, 3rd derivatives and evaluating) but they could just as easily be worked out if we were given 4 points near x=2 and told that they all fell on a cubic function.
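As a sketch of that second route, suppose we only have four points near x=2 (the particular x values below are my own picks) and we know they lie on a cubic. Solving for the coefficients of a cubic written in powers of (x-2) hands back the value, slope, concavity, and jerk. The helper name taylor_coeffs_from_points is hypothetical, and exact fractions are used so no rounding creeps in.

```python
from fractions import Fraction

def f(x):
    return Fraction(1, 2) * x ** 3

def taylor_coeffs_from_points(points, a):
    """Solve for c0..c3 in c0 + c1*(x-a) + c2*(x-a)^2 + c3*(x-a)^3 through four points."""
    n = len(points)
    # Augmented matrix: one row [1, (x-a), (x-a)^2, (x-a)^3 | y] per point.
    rows = [[(x - a) ** k for k in range(n)] + [y] for x, y in points]
    for col in range(n):
        # Find a row with a nonzero pivot, swap it into place, and scale it to 1.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        # Clear this column from every other row.
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

a = Fraction(2)
xs = [Fraction(19, 10), Fraction(2), Fraction(21, 10), Fraction(22, 10)]
c0, c1, c2, c3 = taylor_coeffs_from_points([(x, f(x)) for x in xs], a)
# Undo the 1/2! and 1/3! factors to get the derivatives themselves.
value, slope, concavity, jerk = c0, c1, 2 * c2, 6 * c3
print(value, slope, concavity, jerk)  # prints: 4 6 6 3
```

For an exact cubic the points do not need to be close together for this to work. Closeness only matters when the underlying function is not really a cubic.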

Now, we have to find an equation p(x) that takes into account the concavity, slope, value, and jerk to model f(x).

In order to make a good approximation, we should have f(2)=p(2)

So let’s just start with p(x)=4

This is a terrible approximation, so we introduce the slope as well (we want f’(2)=p’(2)): p(x) = 4+6(x-2). This gives us 4 when x=2, and for every little bit that x is greater than 2, p(x) goes up by 6 times that amount. We’ve now made a tangent line to f(x) at x=2.

But, because there is concavity and jerk in f(x), we have to include those if we want to get a good approximation. So, we need f’’(2)=p’’(2), which means a little bit of finagling: because the concavity comes from an (x-2)^2 term, we need to make sure the coefficient of that term is just right so that we have f’’(2)=p’’(2).

We write p(x) = 4 + 6(x-2) + (6/2)(x-2)^2 so that the value of the second derivative of p(x) will be 6: when we differentiate, the 2 comes down and multiplies, cancelling the division by 2.

Then we have: p(x) = 4 + 6(x-2) + (6/2)(x-2)^2 + (3/(3*2))(x-2)^3, because the 3 will come down to multiply and so will the 2 after that. The general equation for this is the Taylor series. We notice that only polynomials will have a limited number of terms.
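The same construction can be sketched in a few lines of code. The helper taylor_poly below is a made-up name. It takes the list of derivative values at the point and divides the k-th one by k!, which is exactly the "numbers coming down to multiply" being cancelled.

```python
import math

def taylor_poly(derivs, a):
    """Build p(x) = sum of derivs[k] / k! * (x - a)^k from derivative values at a."""
    def p(x):
        return sum(d / math.factorial(k) * (x - a) ** k for k, d in enumerate(derivs))
    return p

def f(x):
    return 0.5 * x ** 3

# Value, slope, concavity, and jerk of f at x = 2.
p = taylor_poly([4, 6, 6, 3], a=2)

# Compare f and p at points both near and far from x = 2.
for x in [-1.0, 0.0, 2.0, 3.5]:
    print(x, f(x), p(x))
```

Because f is itself a cubic, the four terms reproduce it everywhere (up to floating-point rounding), not just near x=2.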

My side question: do these points have to be very close to each other for one to be able to find the slope, concavity, and jerk for the entire domain? Another side question: why is x^2 in the concavity term? Another side question: how can we explain the average value using this relationship between a function’s definition and its subcomponents (concavity, jerk, slope, etc.)?

Explanations to Alan and Jonathan

Now, what I just learned and I'm thinking about is this series that allows you to approximate any function using a string of terms

Essentially a really really long polynomial.

Now, what's interesting is that the fact that this can happen means that you can approximate the future behavior of a function by evaluating localized parameters.

Like, let's say you know the concavity, slope, and value of a function at x=0

Well, this says that you can construct a polynomial that will very closely approximate that function near x=0

That's cool

But what's amazing

is that if you calculate what's after concavity, and after that and so forth for the area around x=0, you can approximate the entire function for all of x!

So, you have some function that you want to approximate

You just want to represent it using something else.

What you do is you start at a point and you say: "Well, if I want the approximation to be good I want the function I'm using to approximate to be equal to the actual function at this point"

Let's call the original function f(x)

and the function you're using to approximate g(x)

You say: "ok, that's very well, now I've set g(x) to be a constant so that it's equal to f(x) at the point I wanted it to be"

"But, that doesn't tell me much about the entire domain of f(x)"

"So, let me try a bit harder: at that same point, let's make the first derivative of g(x) be equal to the slope of f(x) at that point"

"While still maintaining that they evaluate to the same number at that point, of course"

Now you've got a tangent line to the function at that point.

It's a much better approximation to f(x) than that constant, non-sloping line you had before.

But, you realize that's not enough because your current g(x) doesn't approximate well for more distant values

So then you do the same thing for the second, third, and fourth derivative etc

It turns out that as long as you keep doing this, you can approximate a great many well-behaved functions with just a polynomial.

Of course, the polynomial may have infinitely many terms.
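Here's a quick sketch of what "infinitely many terms" looks like in practice. It assumes sin(x) expanded around x=0 (my pick, because its derivatives at 0 just cycle through 0, 1, 0, -1) and checks the error at x=3, which is a fair distance from 0.

```python
import math

def sin_taylor(x, n_terms):
    """Sum the first n_terms nonzero terms of the series for sin around 0."""
    # Derivatives of sin at 0 cycle through 0, 1, 0, -1, so only odd powers survive.
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(n_terms))

x = 3.0  # fairly far from the expansion point 0
term_counts = (1, 2, 4, 8)
errors = [abs(math.sin(x) - sin_taylor(x, n)) for n in term_counts]
for n, err in zip(term_counts, errors):
    print(f"{n} terms: error {err:.2e}")
```

For sin the error keeps shrinking as terms are added, no matter which x you pick. That isn't guaranteed for every function, so treat this as one well-behaved example.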

To say that all you need are three points to uniquely describe a parabola means that hardcoded into every point is fragmented information about the entire function as a whole. When these points are put together, you can describe the entire polynomial.