deep learning _ study notes [1 not finished]

iktueren semih

[logistic regression model, loss function, cost function, vectorization]

[dall-e]

Let’s consider the classic task of building a model that tells us whether an image is a cat or not. Let X be the input data, and let y^ be the predicted value that tells us whether X is a cat.

y^ = P(y = 1|X): the predicted value y^ is the probability that y = 1 given the input X. Assume that X is a vector of real numbers obtained from a 64x64-pixel image with three RGB color channels, so X ∈ R^(nx) with nx = 64 * 64 * 3 = 12288.
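As a small illustration I am adding here (the image values are made up), a 64x64 RGB image can be flattened into a single feature vector of that size:

```python
import numpy as np

# a hypothetical 64x64 RGB image with values in [0, 1]
image = np.random.rand(64, 64, 3)

# flatten it into a single column feature vector X of shape (nx, 1)
X = image.reshape(-1, 1)
print(X.shape)  # (12288, 1) -> nx = 64 * 64 * 3
```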

We assume that we have two parameters: w ∈ R^(nx) and b ∈ R.

The linear formulation y^ = w^TX + b cannot give us the desired result on its own: we want y^ to take a value between 0 and 1, representing the probability of X being a cat.

So we feed the linear part of logistic regression, z = w^TX + b, into the sigmoid function:

S(z) = 1 / (1 + e^(-z))

If the z value is a very large positive number, the result will approach 1.

If the z value is a very large negative number, the result will approach 0.
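A minimal numpy sketch of the sigmoid, added here to show this saturating behaviour:

```python
import numpy as np

def sigmoid(z):
    """S(z) = 1 / (1 + e^(-z))"""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(10.0))   # ~0.99995 -> large positive z approaches 1
print(sigmoid(-10.0))  # ~0.00005 -> large negative z approaches 0
print(sigmoid(0.0))    # 0.5      -> the midpoint
```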

In some notational conventions an extra x0 = 1 input is added so that b can be folded into w; in deep learning it is easier to keep the b and w parameters separate.

y^ = S(w^TX + b), where S(z) = 1 / (1 + e^(-z))
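Putting the pieces together, a sketch of computing y^ = S(w^TX + b) for one example (the shapes and the zero initialization are my own assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

nx = 12288                  # 64 * 64 * 3 features
X = np.random.rand(nx, 1)   # one flattened image
w = np.zeros((nx, 1))       # weight vector, w in R^(nx)
b = 0.0                     # bias, b in R

z = np.dot(w.T, X) + b      # z = w^T X + b, shape (1, 1)
y_hat = sigmoid(z)          # y^ = S(z), a probability between 0 and 1
print(float(y_hat))         # 0.5 with zero-initialized parameters
```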

A natural loss (error) function would be L(y^, y) = 1/2 (y^ - y)², but in logistic regression people usually do not use it, because it makes the optimization problem non-convex (gradient descent can get stuck in local optima); we will get there later.

The loss function that is actually used in logistic regression is:

L(y^, y) = -(y log(y^) + (1 - y) log(1 - y^))

If y = 1, the loss reduces to -log(y^), so we want y^ to be as large as possible (close to 1).

If y = 0, the loss reduces to -log(1 - y^), so we want y^ to be as small as possible (close to 0).
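A quick sketch, added as illustration, of how this loss rewards confident correct predictions and punishes confident wrong ones:

```python
import numpy as np

def loss(y_hat, y):
    """L(y^, y) = -(y log y^ + (1 - y) log(1 - y^))"""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# y = 1: loss is -log(y^), small when y^ is close to 1
print(loss(0.99, 1))  # ~0.01
print(loss(0.01, 1))  # ~4.6 (confident and wrong -> large loss)

# y = 0: loss is -log(1 - y^), small when y^ is close to 0
print(loss(0.01, 0))  # ~0.01
print(loss(0.99, 0))  # ~4.6
```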

If we average the loss over the training examples i = 1 to m, we get the cost function:

J(w, b) = (1/m) * Σ_{i=1..m} L(y^(i), y(i)) = -(1/m) * Σ_{i=1..m} [ y(i) log(y^(i)) + (1 - y(i)) log(1 - y^(i)) ]
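As a sketch of the cost, assuming the predictions A and labels Y are stored as numpy arrays of shape (1, m):

```python
import numpy as np

def cost(A, Y):
    """J(w, b) = (1/m) * sum of L(y^(i), y(i)) over the m examples."""
    m = Y.shape[1]
    return float(-(1.0 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)))

A = np.array([[0.9, 0.2, 0.7, 0.1]])  # hypothetical predictions
Y = np.array([[1,   0,   1,   0  ]])  # true labels
print(cost(A, Y))                     # average cross-entropy loss
```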

Gradient Descent

Picture J(w, b) as a surface on a three-dimensional plot where w and b are the horizontal axes and the height is J(w, b). Our goal is to find the values of w and b that minimize J(w, b). Because J is a convex function, the initialization method does not matter much: gradient descent will reach the global minimum either way.
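As a tiny illustration (my own toy example, not the real J(w, b)), gradient descent on a convex function reaches the same minimum from very different starting points:

```python
def grad_descent(w0, alpha=0.1, steps=200):
    """Minimize the convex toy function J(w) = (w - 3)^2, with dJ/dw = 2(w - 3)."""
    w = w0
    for _ in range(steps):
        w = w - alpha * 2 * (w - 3)
    return w

print(grad_descent(w0=-50.0))  # ~3.0
print(grad_descent(w0=+80.0))  # ~3.0, same minimum regardless of initialization
```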

Consider an example with two features x1 and x2, so we have parameters w1, w2 and b. In logistic regression our aim is to adjust w and b so that the loss is reduced.

z = w1x1 + w2x2 + b >> a = S(z) >> L(a, y)   [we use a as the variable instead of y^]
da = dL(a, y)/da, where L(a, y) = -(y log(a) + (1 - y) log(1 - a))

First term: d/da(-y log(a)) = -y/a

Second term, substituting u = 1 - a and v = log(u):
d/da(-(1 - y) log(1 - a)) = -(1 - y) * (1/u) * (-1) = (1 - y)/(1 - a)

Adding the two terms:
da = dL/da = -y/a + (1 - y)/(1 - a)

The derivative of the sigmoid is:
da/dz = d(S(z))/dz = S(z) * (1 - S(z)) = a * (1 - a)

By the chain rule:
dz = dL/dz = dL/da * da/dz
   = (-y/a + (1 - y)/(1 - a)) * a * (1 - a)
   = -y(1 - a) + a(1 - y)
   = a - y

dz = a - y
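A numerical sanity check of this result, added as a sketch with arbitrary parameter and input values: perturbing z slightly and re-evaluating the loss should give a slope close to a - y.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_of_z(z, y):
    a = sigmoid(z)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

w1, w2, b = 0.4, -0.3, 0.1   # arbitrary parameter values
x1, x2, y = 1.5, 2.0, 1      # one arbitrary training example

z = w1 * x1 + w2 * x2 + b    # forward pass
a = sigmoid(z)

analytic = a - y             # dz = a - y from the derivation above
eps = 1e-6
numeric = (loss_of_z(z + eps, y) - loss_of_z(z - eps, y)) / (2 * eps)
print(analytic, numeric)     # the two values should match closely
```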

For i = 1 to m, compute z(i), a(i) and dz(i) = a(i) - y(i), accumulate dw1 += x1(i) * dz(i), dw2 += x2(i) * dz(i), db += dz(i), and divide each by m. Then update the parameters:

w1 = w1 - alpha * dw1

w2 = w2 - alpha * dw2

b = b - alpha * db
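Written out as a loop-based numpy sketch (my own illustration of the update above, with a made-up toy dataset):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step_loop(x1, x2, y, w1, w2, b, alpha):
    """One gradient descent step for 2-feature logistic regression, with explicit loops."""
    m = len(y)
    dw1 = dw2 = db = 0.0
    for i in range(m):
        z = w1 * x1[i] + w2 * x2[i] + b
        a = sigmoid(z)
        dz = a - y[i]            # dz = a - y from the derivation
        dw1 += x1[i] * dz
        dw2 += x2[i] * dz
        db += dz
    dw1, dw2, db = dw1 / m, dw2 / m, db / m
    return w1 - alpha * dw1, w2 - alpha * dw2, b - alpha * db

x1 = [1.0, 2.0, 0.5]; x2 = [0.3, 1.5, 2.2]; y = [1, 0, 1]   # toy data
print(gradient_step_loop(x1, x2, y, w1=0.0, w2=0.0, b=0.0, alpha=0.1))
```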

Even for this simple calculation, a naive implementation needs several nested for loops (over the training examples and over the features), which will kill the performance of the model. Vectorization is the key to avoiding this.
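A vectorized sketch of the same gradient step, assuming X is stored with one example per column (shape (nx, m)) and Y has shape (1, m); there are no explicit Python loops over examples or features:

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def gradient_step_vectorized(X, Y, w, b, alpha):
    """One vectorized gradient descent step for logistic regression."""
    m = X.shape[1]
    Z = np.dot(w.T, X) + b      # shape (1, m)
    A = sigmoid(Z)              # predictions for all m examples at once
    dZ = A - Y                  # shape (1, m)
    dw = np.dot(X, dZ.T) / m    # shape (nx, 1)
    db = float(np.sum(dZ)) / m
    return w - alpha * dw, b - alpha * db

X = np.random.rand(2, 100)                # 2 features, 100 examples
Y = (np.random.rand(1, 100) > 0.5) * 1    # random 0/1 labels
w = np.zeros((2, 1)); b = 0.0
w, b = gradient_step_vectorized(X, Y, w, b, alpha=0.1)
print(w.ravel(), b)
```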
