# How do you differentiate by first principles?

Firstly, what are we trying to achieve by differentiating a function? Differentiation lets us compute the rate of change of a well-behaved function at any point along it.

Consider the curve y = f(x).

Imagine moving along the curve to a point with coordinates (x_{1},y_{1}) and then moving to a new point (x_{2},y_{2}). What is the gradient between these points? Trivially, it is (y_{2}-y_{1})/(x_{2}-x_{1}).
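As a quick numerical illustration of the gradient between two points, here is a minimal sketch. The function name `secant_gradient` and the example curve y = x^2 are assumptions chosen for illustration, not part of the original explanation.

```python
def secant_gradient(f, x1, x2):
    """Gradient of the straight line joining (x1, f(x1)) and (x2, f(x2))."""
    return (f(x2) - f(x1)) / (x2 - x1)


def square(x):
    # Assumed example curve: y = x^2
    return x ** 2


# Two points on y = x^2: (1, 1) and (3, 9).
print(secant_gradient(square, 1, 3))  # 4.0
```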

What happens if we draw a straight line between those two points? If the points are a reasonable distance apart and the curve is well behaved but genuinely curved, this line will generally neither follow the curve nor be tangent to it.

Let's think about our definition of differentiation again: we want to compute the rate of change (or gradient) of the function at a point. This is the gradient of the tangent to the curve at that point. Why the tangent? Since we require the rate of change at a point, we need the gradient at a point. But surely that is contradictory? In a sense, yes: a single point on its own has no gradient, because a gradient needs two points. There is, however, a gradient between a point and another point infinitesimally nearby. Now we are getting somewhere. So, in order to differentiate, we must choose the coordinates (x_{2},y_{2}) such that:

x_{2} = x_{1} + dx, where dx is infinitesimally small.

y_{2} = f(x_{2}) = f(x_{1} + dx)

The gradient is then simply:

( f(x_{1}+dx) - f(x_{1}) ) / ( x_{1}+dx-x_{1} ) = ( f(x+dx) - f(x) ) / dx

(note: for clarity I have rewritten x_{1} as x)
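To see what "infinitesimally small" looks like in practice, the gradient formula above can be evaluated for smaller and smaller values of dx. This is only a sketch: the name `difference_quotient`, the example function f(x) = x^3 and the point x = 1 are assumptions chosen for illustration, and floating-point arithmetic can only approximate a true limit.

```python
def difference_quotient(f, x, dx):
    """Gradient between (x, f(x)) and (x + dx, f(x + dx))."""
    return (f(x + dx) - f(x)) / dx


def cube(x):
    # Assumed example function: f(x) = x^3
    return x ** 3


# Shrink dx and watch the gradient settle towards a single value.
for dx in (1.0, 0.1, 0.001, 1e-6):
    print(dx, difference_quotient(cube, 1.0, dx))
```

The printed gradients should move closer and closer to 3 as dx shrinks, which is the gradient of the tangent at x = 1.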

For example, let's plug in a simple function like f(x) = x^{2}.

The numerator then looks like:

(x + dx)^{2} - x^{2} = x^{2} + 2x*dx + dx^{2} - x^{2} = 2x*dx + dx^{2}

The divisor:

dx

The whole fraction is then just:

2x + dx.

Since dx is infinitesimal, it can be neglected next to 2x, and we recover the expected result of 2x, exactly what the 'bring down the power' rule predicts.
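The algebra for f(x) = x^{2} can be checked exactly with rational arithmetic. The sketch below uses Python's `fractions.Fraction` to avoid rounding error; the helper names and the sample point x = 3 are assumptions chosen for illustration.

```python
from fractions import Fraction


def difference_quotient(f, x, dx):
    """Gradient between (x, f(x)) and (x + dx, f(x + dx))."""
    return (f(x + dx) - f(x)) / dx


def square(x):
    return x * x


x = Fraction(3)
for dx in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    quotient = difference_quotient(square, x, dx)
    # The last column confirms the quotient is exactly 2x + dx.
    print(dx, quotient, quotient == 2 * x + dx)
```

Each row should show the quotient differing from 2x = 6 by exactly dx, so the difference disappears as dx shrinks.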

Try to convince yourself that the power rule holds for all x^{n}, where n is a positive integer, using the binomial expansion and taking suitable approximations (hint: any term containing dx^{2} or a higher power of dx is always negligible!).
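If you would like to check your working on this exercise, one option is to build the terms of the binomial expansion numerically and compare them with the gradient formula. This is a sketch under stated assumptions: `quotient_terms` is a hypothetical helper, n is taken to be a positive integer, and x = 2 and dx = 1/1000 are arbitrary sample values.

```python
from fractions import Fraction
from math import comb


def quotient_terms(n, x, dx):
    """Terms of ((x + dx)^n - x^n) / dx from the binomial expansion.

    Assumes n is a positive integer. The k-th term is
    C(n, k) * x^(n-k) * dx^(k-1), for k = 1, ..., n.
    """
    return [comb(n, k) * x ** (n - k) * dx ** (k - 1) for k in range(1, n + 1)]


x, dx = Fraction(2), Fraction(1, 1000)
for n in (1, 2, 3, 5):
    terms = quotient_terms(n, x, dx)
    direct = ((x + dx) ** n - x ** n) / dx
    assert sum(terms) == direct  # the expansion matches the gradient formula
    # Leading term, then the total of every term that still contains dx.
    print(n, terms[0], float(sum(terms[1:])))
```

The leading term is the only one without a factor of dx, and the remaining terms together should be small compared with it for a small dx.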