# Understanding differentiation from first principle

What is first principle? Why does it look nothing like the differentiation I have been doing for the past few years?

Differentiation from first principle is the main idea behind differentiation, a technique we employ to measure the instantaneous rate of change. By now, you probably recognize "rate of change" as being synonymous with the term "gradient", or with "average speed" in the context of several word problems. It is, therefore, not surprising that the definition of first principle is very similar to the gradient formula!

Let us recall: the gradient of the line connecting two points (x1,y1) and (x2,y2) is given by the equation grad = (y1-y2)/(x1-x2). The definition of first principle says f'(x) = lim_{dx->0} (f(x+dx)-f(x))/dx. (Due to limitations of the software, we'll refer to "delta x" as "dx" here for convenience.)

Let's examine my claim that both formulas are very similar. Suppose I have a graph y=f(x). Pick a point on the graph and name it (x1,y1). Now, to find the **instantaneous** rate of change at point (x1,y1), what can we do? Without knowing the equation of the graph, we cannot differentiate using the rules that we have learned. So, instead of trying to get the exact answer, let's approximate. Let's pick a point **as close to (x1,y1) as we can** and name this point (x2,y2). Because these points are very close together, it is safe to say that the line connecting them is a good approximation of the *tangent line at point (x1,y1)*. In that case, we can use the gradient formula to find an approximate value of the **instantaneous** gradient at (x1,y1)! By letting the distance between x1 and x2 be dx, we have: x2=x1+dx. Plug this into the gradient formula and we will get

approx. grad = (y1-y2)/(x1-x2) = (f(x1) - f(x2))/(x1 - (x1+dx)) = (f(x1) - f(x1+dx))/(-dx) = (f(x1+dx) - f(x1))/dx.

There is still something different: the "lim_{dx -> 0}" notation. What does it mean anyway? Read aloud, the symbols say "the limit of the expression as dx approaches zero". You see, to improve the accuracy of our approximation of the gradient, we only need to pick (x2,y2) even closer to (x1,y1). Theoretically, then, if x1 and x2 are **so close** that the distance between them is **almost zero**, our approximation gets as close to the exact value as we like. Hence, if we allow "dx" to approach zero, we can confidently replace the left-hand side of the approximation with the exact value. In other words:

f'(x1) = lim_{dx->0} (f(x1+dx)-f(x1))/dx.
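To watch the approximation improve as dx shrinks, here is a minimal Python sketch. The curve y = x^2, the point x1 = 3 and the helper name `secant_gradient` are my own choices for illustration; any smooth function would do. It computes the gradient of the line through (x1, f(x1)) and (x1+dx, f(x1+dx)) for smaller and smaller dx.

```python
def secant_gradient(f, x1, dx):
    """Gradient of the line through (x1, f(x1)) and (x1 + dx, f(x1 + dx))."""
    return (f(x1 + dx) - f(x1)) / dx


def f(x):
    # Example curve chosen for illustration: y = x^2 (exact gradient is 2x).
    return x * x


x1 = 3.0
# Shrink dx step by step and print the approximate gradient each time.
for dx in (1.0, 0.1, 0.01, 0.001):
    print(f"dx = {dx:<6} approx. grad = {secant_gradient(f, x1, dx):.6f}")
```

The printed values should creep toward 6, which is the exact gradient of y = x^2 at x = 3. We never actually set dx to zero (that would mean dividing by zero); we only look at what the values approach.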

Since (x1,y1) is just a name we gave to the point that we are interested in, we may replace it with (x,y) (to make the formula applicable to a generic x-y plot) to obtain the definition of first principle as presented in our textbooks.
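To see the definition in action, here is a short worked example, written in LaTeX for readability. The function f(x) = x^2 is my own choice for illustration, and the difference is taken as f(x+dx) - f(x), matching the gradient of the line from (x, f(x)) to (x+dx, f(x+dx)).

```latex
\begin{align*}
f'(x) &= \lim_{dx \to 0} \frac{f(x+dx) - f(x)}{dx} \\
      &= \lim_{dx \to 0} \frac{(x+dx)^2 - x^2}{dx} \\
      &= \lim_{dx \to 0} \frac{2x\,dx + (dx)^2}{dx} \\
      &= \lim_{dx \to 0} \left( 2x + dx \right) \\
      &= 2x.
\end{align*}
```

This agrees with the familiar power rule, which suggests that the rules we normally use are shortcuts for this limit.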