<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Konstantinos Gyftodimos on Medium]]></title>
        <description><![CDATA[Stories by Konstantinos Gyftodimos on Medium]]></description>
        <link>https://medium.com/@konstantinos.gyftodimos?source=rss-62b26cb442e7------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*VYcfTtiagPkSbpWR8pSC0A.png</url>
            <title>Stories by Konstantinos Gyftodimos on Medium</title>
            <link>https://medium.com/@konstantinos.gyftodimos?source=rss-62b26cb442e7------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 23 May 2026 16:03:26 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@konstantinos.gyftodimos/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[1 — Pinhole Camera & Perspective Projection]]></title>
            <link>https://medium.com/@konstantinos.gyftodimos/1-pinhole-camera-perspective-projection-806dbad907f?source=rss-62b26cb442e7------2</link>
            <guid isPermaLink="false">https://medium.com/p/806dbad907f</guid>
            <category><![CDATA[pinhole-camera]]></category>
            <category><![CDATA[computer-vision]]></category>
            <category><![CDATA[pinhole-photography]]></category>
            <dc:creator><![CDATA[Konstantinos Gyftodimos]]></dc:creator>
            <pubDate>Sun, 15 Jan 2023 17:06:10 GMT</pubDate>
            <atom:updated>2023-01-15T17:08:11.339Z</atom:updated>
            <content:encoded><![CDATA[<h3>Computer Vision Quick Series: Pinhole Camera &amp; Perspective Projection</h3><p><strong>Theory:</strong></p><p>A pinhole camera is a simple camera without a lens. A tiny aperture, called the pinhole, controls the amount of light entering the camera. The image is formed by light passing through the pinhole and projecting an inverted image onto the opposite side of the camera, called the image plane.</p><p>To understand how a 3D object is projected onto the image plane through a pinhole, consider Figure 1, where:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KK1exgzNSv7v4z0VzgDeaw.png" /><figcaption>Figure 1: Pinhole Camera Components</figcaption></figure><ul><li><strong>Optical Axis</strong>: The axis that passes through the pinhole and is perpendicular to the image plane.</li><li><strong>Pinhole</strong>: The center of projection, which lies on the optical axis; every ray that reaches the image plane passes through it.</li><li><strong>Effective focal length</strong>: The distance between the pinhole and the image plane along the z-axis.</li><li><strong>P0, P1</strong>: The real point and its projection, described by the vectors r0 and r1 respectively.</li></ul><p>To find the relationship between the real point P0 on the 3D object and the projected point P1 on the image plane, notice the similar triangles in the figure above. With f the effective focal length and r0 = (x0, y0, z0), they give x1 / f = x0 / z0 and y1 / f = y0 / z0, up to the sign convention chosen for the inverted image:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/688/1*wUcZYhXHZ6B0KEz4qe9t0Q.png" /><figcaption>Calculation of x1, y1 coordinates of the projected 3D object on the image plane.</figcaption></figure><p><strong>Code:</strong></p><p>A simple pinhole camera can be modeled and visualized with the Python code below:</p><pre>import cv2<br>import numpy as np<br><br># Create a blank image with a black background<br>width, height = 640, 480<br>image = np.zeros((height, width, 3), np.uint8)<br><br># Define the camera matrix (focal length in pixels, principal point at the image center)<br>focal_length = 500<br>center = np.array([width / 2, height / 2])<br>camera_matrix = np.array([[focal_length, 0, center[0]],<br>                          [0, focal_length, center[1]],<br>                          [0, 0, 1]], dtype=&quot;double&quot;)<br><br># Create a 3D point in front of the camera (positive z).<br># A point at the origin coincides with the pinhole and has no well-defined projection.<br>world_points = np.array([[0.5, 0.25, 5.0]], dtype=&#39;double&#39;)<br><br># Project the 3D point onto the image plane (no rotation, no translation, no lens distortion)<br>rvec = np.zeros((3, 1))<br>tvec = np.zeros((3, 1))<br>projected_points, _ = cv2.projectPoints(world_points, rvec, tvec, camera_matrix, None)<br><br># Draw the projected point on the image (cv2.circle expects plain Python ints).<br># With the values above the point lands at pixel (370, 265).<br>u, v = projected_points[0].ravel()<br>cv2.circle(image, (int(round(u)), int(round(v))), 5, (0, 255, 0), -1)<br><br>cv2.imshow(&quot;Pinhole Camera&quot;, image)<br>cv2.waitKey(0)<br>cv2.destroyAllWindows()</pre><p><strong>Outro:</strong></p><p>Hope this was helpful!</p><p>I can personally code your A.I project! Hire me via Fiverr:</p><p><a href="https://www.fiverr.com/share/98GZLA">https://www.fiverr.com/share/98GZLA</a></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Optical Flow with Python & OpenCV]]></title>
            <link>https://medium.com/@konstantinos.gyftodimos/optical-flow-with-python-opencv-d93ace9a9784?source=rss-62b26cb442e7------2</link>
            <guid isPermaLink="false">https://medium.com/p/d93ace9a9784</guid>
            <category><![CDATA[optical-flow]]></category>
            <category><![CDATA[opencv-python]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[opencv]]></category>
            <category><![CDATA[computer-vision]]></category>
            <dc:creator><![CDATA[Konstantinos Gyftodimos]]></dc:creator>
            <pubDate>Sat, 07 Jan 2023 16:22:13 GMT</pubDate>
            <atom:updated>2023-01-07T16:22:52.481Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/639/1*Art3QRZWqrOX0mxhQpINyw.png" /><figcaption>Optical Flow between car pixels in consecutive frames.</figcaption></figure><p><strong>Table of Contents:</strong></p><ul><li>Intro</li><li>Lucas-Kanade method <em>(explanation &amp; code)</em></li><li>Horn-Schunck method <em>(explanation &amp; code)</em></li><li>Farneback method <em>(explanation &amp; code)</em></li><li>Outro</li></ul><p><strong>Intro</strong></p><p>In this article, three different methods for optical flow are briefly explained and implemented.</p><p>Optical flow is a technique used to measure the apparent motion of objects between consecutive frames of a video. It is based on the idea that the apparent motion of pixels in the image can be used to estimate the underlying motion of the corresponding objects in the real world. Optical flow algorithms are used in a wide range of applications, including video compression, object tracking, and image registration.</p><p>To understand optical flow, it is helpful to consider a simple example. Suppose that you are watching a video of a car driving down a road. As the car moves from one frame of the video to the next, the pixels that make up the car also move. The displacement of each pixel between two consecutive frames is a 2D vector, and the field of these vectors over the whole image is called the “optical flow”.</p><p><strong>Lucas-Kanade method <em>(explanation &amp; code)</em></strong></p><p>One way to estimate the optical flow of objects in a video is to use the Lucas-Kanade method. This method assumes that the displacement between two frames is small and approximately constant within a small window around each point, so the motion of that window can be described by a single displacement vector. 
In practice, Lucas-Kanade is usually applied sparsely: the displacement vector is solved by least squares for a set of well-textured feature points (corners) rather than for every pixel in the image.</p><p>Here is some example Python code that uses OpenCV&#39;s pyramidal Lucas-Kanade implementation (cv2.calcOpticalFlowPyrLK) to track corners found with cv2.goodFeaturesToTrack between two frames:</p><pre>import cv2<br>import numpy as np<br><br># Read the first frame of the video<br>prev_frame = cv2.imread(&#39;frame1.jpg&#39;)<br><br># Convert the frame to grayscale<br>prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)<br><br># Read the second frame of the video<br>next_frame = cv2.imread(&#39;frame2.jpg&#39;)<br><br># Convert the frame to grayscale<br>next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)<br><br># Select good corners to track in the first frame<br>p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100, qualityLevel=0.3, minDistance=7, blockSize=7)<br><br># Compute the Lucas-Kanade Optical Flow for the selected corners<br>p1, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, p0, None, winSize=(15, 15), maxLevel=2, criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))<br><br># Keep only the points that were tracked successfully<br>tracked = status.flatten() == 1<br>good_old = p0[tracked].reshape(-1, 2)<br>good_new = p1[tracked].reshape(-1, 2)<br><br># Draw the flow vectors on the second frame<br>result = next_frame.copy()<br>for (x0, y0), (x1, y1) in zip(good_old, good_new):<br>    cv2.line(result, (int(x0), int(y0)), (int(x1), int(y1)), (0, 255, 0), 2)<br>    cv2.circle(result, (int(x1), int(y1)), 3, (0, 0, 255), -1)<br><br># Display the resulting frame<br>cv2.imshow(&#39;Optical Flow&#39;, result)<br>cv2.waitKey(0)<br>cv2.destroyAllWindows()</pre><p><strong>Horn-Schunck method <em>(explanation &amp; code)</em></strong></p><p>The Horn-Schunck method is another technique used to estimate the optical flow between two images. 
It is based on the assumption that the flow is smooth, which means that pixels that are close to each other in the image will have similar flow vectors. Combined with the assumption that a pixel keeps its brightness as it moves, this gives a dense flow field that is found by iteratively minimizing a global energy.</p><p>To my knowledge, the main OpenCV Python module (cv2) does not provide a Horn-Schunck function in current releases (calcOpticalFlowHS belonged to the legacy C API), so the example below implements the classic iterative update directly with NumPy and cv2.filter2D. The function takes in the previous frame, the current frame, a parameter alpha that controls the smoothness of the flow, and the number of iterations num_iter that controls the accuracy of the computation. It returns the flow field, which is an H x W x 2 array with one flow vector for each pixel in the image.</p><p>Here is an example of how to use the Horn-Schunck method to compute the optical flow between two images:</p><pre>import cv2<br>import numpy as np<br><br>def horn_schunck(prev_gray, next_gray, alpha=1.0, num_iter=100):<br>    # Work on float images scaled to [0, 1]<br>    I1 = prev_gray.astype(np.float32) / 255.0<br>    I2 = next_gray.astype(np.float32) / 255.0<br><br>    # Spatial and temporal derivatives estimated from both frames<br>    kx = np.array([[-1, 1], [-1, 1]], np.float32) * 0.25<br>    ky = np.array([[-1, -1], [1, 1]], np.float32) * 0.25<br>    kt = np.ones((2, 2), np.float32) * 0.25<br>    Ix = cv2.filter2D(I1, -1, kx) + cv2.filter2D(I2, -1, kx)<br>    Iy = cv2.filter2D(I1, -1, ky) + cv2.filter2D(I2, -1, ky)<br>    It = cv2.filter2D(I2, -1, kt) - cv2.filter2D(I1, -1, kt)<br><br>    # Kernel that averages the flow over the neighbors of each pixel<br>    avg = np.array([[1, 2, 1], [2, 0, 2], [1, 2, 1]], np.float32) / 12.0<br><br>    # Iteratively refine the flow, starting from zero motion<br>    u = np.zeros_like(I1)<br>    v = np.zeros_like(I1)<br>    for _ in range(num_iter):<br>        u_avg = cv2.filter2D(u, -1, avg)<br>        v_avg = cv2.filter2D(v, -1, avg)<br>        d = (Ix * u_avg + Iy * v_avg + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)<br>        u = u_avg - Ix * d<br>        v = v_avg - Iy * d<br>    return np.dstack([u, v])<br><br># Read both frames and convert them to grayscale<br>prev_frame = cv2.imread(&#39;frame1.jpg&#39;)<br>prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)<br>next_frame = cv2.imread(&#39;frame2.jpg&#39;)<br>next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)<br><br># Compute the Horn-Schunck Optical Flow<br>flow = horn_schunck(prev_gray, next_gray)<br><br># Convert the flow to x and y coordinates<br>flow_x = flow[:, :, 0]<br>flow_y = flow[:, :, 1]<br><br># Draw the flow vectors as arrows on a coarse grid<br>h, w = prev_gray.shape[:2]<br>step = 16<br>result = prev_frame.copy()<br>for y in range(0, h, step):<br>    for x in range(0, w, step):<br>        end_point = (int(round(x + flow_x[y, x])), int(round(y + flow_y[y, x])))<br>        cv2.arrowedLine(result, (x, y), end_point, (0, 255, 0), 1)<br><br># Display the resulting frame<br>cv2.imshow(&#39;Optical Flow&#39;, result)<br>cv2.waitKey(0)<br>cv2.destroyAllWindows()</pre><p><strong>Farneback method <em>(explanation &amp; code)</em></strong></p><p>The Farneback method is a technique used to estimate dense optical flow between two images. It approximates the neighborhood of each pixel in both frames by a quadratic polynomial and estimates the displacement from how the polynomial coefficients change between the frames.</p><p>To compute the optical flow using the Farneback method with Python and OpenCV, you can use the calcOpticalFlowFarneback function. This function takes in the previous frame, the current frame, and parameters that control the image pyramid (pyr_scale, levels), the averaging window (winsize), the number of iterations, and the polynomial expansion (poly_n, poly_sigma). It returns the flow field, which is an H x W x 2 array with one flow vector for each pixel in the image.</p><p>Here is an example of how to use the Farneback method to compute the optical flow between two images:</p><pre>import cv2<br>import numpy as np<br><br># Read the first frame of the video<br>prev_frame = cv2.imread(&#39;frame1.jpg&#39;)<br><br># Convert the frame to grayscale<br>prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)<br><br># Read the second frame of the video<br>next_frame = cv2.imread(&#39;frame2.jpg&#39;)<br><br># Convert the frame to grayscale<br>next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)<br><br># Compute the Farneback Optical Flow<br>flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)<br><br># Convert the flow to x and y coordinates<br>flow_x = flow[:, :, 0]<br>flow_y = flow[:, :, 1]<br><br># Calculate the magnitude and angle (in radians) of the flow vectors<br>magnitude, angle = cv2.cartToPolar(flow_x, flow_y)<br><br># Visualize the flow: hue encodes direction, brightness encodes magnitude<br>hsv = np.zeros_like(prev_frame)<br>hsv[..., 0] = angle * 180 / np.pi / 2<br>hsv[..., 1] = 255<br>hsv[..., 2] = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX)<br>result = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)<br><br># Display the resulting frame<br>cv2.imshow(&#39;Optical Flow&#39;, result)<br>cv2.waitKey(0)<br>cv2.destroyAllWindows()</pre><p><strong>Outro</strong></p><p>I hope this article gave you a quick look at some of the optical flow methods!</p><p>I can personally code your A.I project! Hire me via Fiverr:</p><p><a href="https://www.fiverr.com/share/98GZLA">https://www.fiverr.com/share/98GZLA</a></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Vision Transformer for Binary Classification of Custom Dataset with PyTorch]]></title>
            <link>https://medium.com/@konstantinos.gyftodimos/vision-transformer-for-binary-classification-of-custom-dataset-hands-on-fdcd162e605e?source=rss-62b26cb442e7------2</link>
            <guid isPermaLink="false">https://medium.com/p/fdcd162e605e</guid>
            <category><![CDATA[computer-vision]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[vision-transformer]]></category>
            <dc:creator><![CDATA[Konstantinos Gyftodimos]]></dc:creator>
            <pubDate>Thu, 15 Dec 2022 13:22:55 GMT</pubDate>
            <atom:updated>2022-12-15T14:33:45.477Z</atom:updated>
            <content:encoded><![CDATA[<h3>Contents:</h3><ul><li><strong>Short description</strong>: A short description of ViT.</li><li><strong>Coding part</strong>: Binary Classification with ViT for Custom Dataset.</li><li><strong>Appendix</strong>: ViT hyperparameters explanation.</li></ul><h3>Short description:</h3><p>Vision Transformers (ViT) apply the transformer architecture to images. Before they were introduced, convolutional neural networks were the standard choice for complex computer vision tasks. Vision transformers give us another powerful family of models for computer vision, much as <a href="https://analyticsindiamag.com/a-beginners-guide-to-text-classification-using-bert-features/">BERT</a> and <a href="https://analyticsindiamag.com/openai-dumps-its-own-gpt-3-for-something-called-instructgpt-and-for-right-reason/">GPT</a> did for complex <a href="https://analyticsindiamag.com/most-popular-nlp-papers-of-2021/">NLP</a> tasks. In this article, we will learn how to use a vision transformer for an image classification task. 
For this purpose, we will walk through a hands-on implementation of a <a href="https://analyticsindiamag.com/complete-guide-to-t2t-vit-training-vision-transformers-efficiently-with-minimal-data/">vision transformer</a> for image classification.</p><p>The Vision Transformer classification process is summarized in the animation below:</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fgfycat.com%2Fifr%2Fpolishednauticalaustralianfreshwatercrocodile&amp;display_name=Gfycat&amp;url=https%3A%2F%2Fgfycat.com%2Fpolishednauticalaustralianfreshwatercrocodile&amp;image=https%3A%2F%2Fthumbs.gfycat.com%2FPolishedNauticalAustralianfreshwatercrocodile-size_restricted.gif&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=gfycat" width="852" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/a3607c15937007cb5f7283273f9a74f4/href">https://medium.com/media/a3607c15937007cb5f7283273f9a74f4/href</a></iframe><h3>Coding part:</h3><p><strong>Step 1</strong>: <strong>Create an Anaconda environment and set up the required libraries.</strong></p><p>Download Anaconda for Windows, then create an Anaconda environment and activate it from the “Anaconda Prompt”:</p><pre>conda create --name vit_project python=3.8<br>conda activate vit_project</pre><p>Download <strong>requirements.txt</strong> (<em>link below</em>), put it in your ViT project folder and, with the Anaconda environment still active, install the dependencies:</p><p><a href="https://drive.google.com/uc?export=download&amp;id=14xiSObMiBNRPSbwyevZ_hRRk7V3R-txF">https://drive.google.com/uc?export=download&amp;id=14xiSObMiBNRPSbwyevZ_hRRk7V3R-txF</a></p><pre>pip install -r requirements.txt</pre><p><strong>Step 2: Folder structure for your custom dataset.</strong></p><p>Make sure the folder structure of your classification dataset matches the one in the image below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/460/1*MzA_PXMgLi8yp2mlfLwZyA.png" /><figcaption>Structure your 
binary data like in the image above</figcaption></figure><p><strong>Step 3: Coding finally begins.</strong></p><p>Libraries:</p><pre>import matplotlib.pyplot as plt<br>import numpy as np<br>import pandas as pd<br>import torch<br>import torch.nn as nn<br>import torch.optim as optim<br>from linformer import Linformer<br>from PIL import Image<br>from torch.optim.lr_scheduler import StepLR<br>from tqdm.notebook import tqdm<br>from vit_pytorch.efficient import ViT<br>from sklearn.metrics import roc_curve, roc_auc_score<br>from sklearn.metrics import confusion_matrix<br>import torch.utils.data as data<br>import torchvision<br>from torchvision import transforms<br><br># Check whether a GPU is available<br>print(torch.cuda.is_available())</pre><p>Hyperparameters:</p><pre># Hyperparameters:<br>batch_size = 64<br>epochs = 20<br>lr = 3e-5<br>gamma = 0.7<br>seed = 142<br>IMG_SIZE = 128<br>patch_size = 16<br>num_classes = 2<br><br># Make the runs reproducible<br>torch.manual_seed(seed)</pre><p>Optional: automatic random dataset split:</p><pre># import splitfolders  # pip install split-folders<br># input_folder = &quot;dataset_new/&quot;<br># splitfolders.ratio(input_folder, output = &quot;dataset_new_split&quot;, <br>#                    seed = 42, ratio = (.80, 0.10, .10), <br>#                    group_prefix = None)</pre><p>Tensor Transforms &amp; Data Loaders:</p><pre># Tensor transforms: resize every image to the ViT input size, then convert it to a tensor<br>data_transform = transforms.Compose([transforms.Resize((IMG_SIZE, IMG_SIZE)), transforms.ToTensor()])<br>train_ds = torchvision.datasets.ImageFolder(&quot;dataset_new_split/train&quot;, transform=data_transform)<br>valid_ds = torchvision.datasets.ImageFolder(&quot;dataset_new_split/val&quot;, transform=data_transform)<br>test_ds = torchvision.datasets.ImageFolder(&quot;dataset_new_split/test&quot;, transform=data_transform)</pre><pre># Data Loaders (only the training data needs shuffling):<br>train_loader = data.DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=4)<br>valid_loader = data.DataLoader(valid_ds, batch_size=batch_size, shuffle=False, num_workers=4)<br>test_loader  = data.DataLoader(test_ds, batch_size=batch_size, shuffle=False, num_workers=4)</pre><p>Model Building:</p><pre># 
Training device:<br>device = &#39;cuda&#39; if torch.cuda.is_available() else &#39;cpu&#39;<br><br># Linear Transformer (seq_len = 64 patches of 16x16 pixels in a 128x128 image, plus 1 class token):<br>efficient_transformer = Linformer(dim=128, seq_len=64+1, depth=12, heads=8, k=64)<br><br># Vision Transformer Model: <br>model = ViT(dim=128, image_size=128, patch_size=patch_size, num_classes=num_classes, transformer=efficient_transformer, channels=3).to(device)<br><br># loss function<br>criterion = nn.CrossEntropyLoss()<br><br># Optimizer<br>optimizer = optim.Adam(model.parameters(), lr=lr)<br><br># Learning Rate Scheduler for Optimizer:<br>scheduler = StepLR(optimizer, step_size=1, gamma=gamma)</pre><p>Custom Training:</p><pre># Training:<br>for epoch in range(epochs):<br>    model.train()<br>    epoch_loss = 0<br>    epoch_accuracy = 0<br>    # The batch is named images so that it does not shadow the torch.utils.data module imported as data<br>    for images, label in tqdm(train_loader):<br>        images = images.to(device)<br>        label = label.to(device)<br><br>        output = model(images)<br>        loss = criterion(output, label)<br><br>        optimizer.zero_grad()<br>        loss.backward()<br>        optimizer.step()<br><br>        acc = (output.argmax(dim=1) == label).float().mean()<br>        epoch_accuracy += acc.item() / len(train_loader)<br>        epoch_loss += loss.item() / len(train_loader)<br><br>    # Validation: run once per epoch, without gradient tracking<br>    model.eval()<br>    epoch_val_accuracy = 0<br>    epoch_val_loss = 0<br>    with torch.no_grad():<br>        for images, label in valid_loader:<br>            images = images.to(device)<br>            label = label.to(device)<br><br>            val_output = model(images)<br>            val_loss = criterion(val_output, label)<br><br>            acc = (val_output.argmax(dim=1) == label).float().mean()<br>            epoch_val_accuracy += acc.item() / len(valid_loader)<br>            epoch_val_loss += val_loss.item() / len(valid_loader)<br><br>    # Decay the learning rate by gamma after every epoch<br>    scheduler.step()<br><br>    print(<br>        f&quot;Epoch : {epoch+1} - loss : {epoch_loss:.4f} - acc: {epoch_accuracy:.4f} - val_loss : {epoch_val_loss:.4f} - val_acc: {epoch_val_accuracy:.4f}\n&quot;<br>    )</pre><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/613/1*uVjWM6v1Ok4RLmccHMyqYw.png" /><figcaption>Training Preview</figcaption></figure><p>Model Saving &amp; Loading for future use:</p><pre># Save Model:<br>PATH = f&quot;epochs_{epochs}_img_{IMG_SIZE}_patch_{patch_size}_lr_{lr}.pt&quot;<br>torch.save(model.state_dict(), PATH)</pre><pre># Load saved model (the architecture must match the one used for training):<br>PATH = f&quot;epochs_{epochs}_img_{IMG_SIZE}_patch_{patch_size}_lr_{lr}.pt&quot;<br>efficient_transformer = Linformer(dim=128, seq_len=(IMG_SIZE // patch_size) ** 2 + 1, depth=12, heads=8, k=64)<br>model = ViT(image_size=IMG_SIZE, patch_size=patch_size, num_classes=num_classes, dim=128, transformer=efficient_transformer, channels=3)<br>model.load_state_dict(torch.load(PATH, map_location=device))<br>model.to(device)<br>model.eval()</pre><p>Model Evaluation: Accuracy:</p><pre># Performance on Valid/Test Data<br>def overall_accuracy(model, test_loader, criterion):<br>    <br>    &#39;&#39;&#39;<br>    Model testing <br>    <br>    Args:<br>        model: model used during training and validation<br>        test_loader: data loader object containing testing data<br>        criterion: loss function used<br>    <br>    Returns:<br>        test_loss: loss summed over all test batches<br>        accuracy: calculated accuracy during testing<br>        y_proba: predicted probabilities of class 1<br>        y_truth: ground truth of testing data<br>    &#39;&#39;&#39;<br>    <br>    # Run on whichever device the model lives on<br>    model_device = next(model.parameters()).device<br>    model.eval()<br>    <br>    y_proba = []<br>    y_truth = []<br>    test_loss = 0<br>    total = 0<br>    correct = 0<br>    with torch.no_grad():<br>        for X, y in tqdm(test_loader):<br>            X, y = X.to(model_device), y.to(model_device)<br>            output = model(X)<br>            test_loss += criterion(output, y.long()).item()<br>            # Softmax turns the two logits into class probabilities; keep class 1<br>            proba = torch.softmax(output, dim=1)[:, 1]<br>            y_proba.extend(proba.cpu().tolist())<br>            y_truth.extend(y.cpu().tolist())<br>            correct += (output.argmax(dim=1) == y).sum().item()<br>            total += y.size(0)<br>                <br>    accuracy = correct/total<br>    <br>    y_proba_out = np.array(y_proba, dtype=float)<br>    y_truth_out = np.array(y_truth, dtype=float)<br>    <br>    return test_loss, accuracy, y_proba_out, y_truth_out<br><br><br>loss, acc, y_proba, y_truth = overall_accuracy(model, test_loader, criterion = nn.CrossEntropyLoss())<br><br><br>print(f&quot;Accuracy: {acc}&quot;)<br><br>print(pd.Series(y_truth).value_counts())</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/625/1*ecwqyGyQwBcDM3v-t8tI_w.png" /><figcaption>Accuracy Preview</figcaption></figure><p>Model Evaluation: ROC Curve:</p><pre># Plot ROC curve:<br><br>def plot_ROCAUC_curve(y_truth, y_proba, fig_size):<br>    <br>    &#39;&#39;&#39;<br>    Plots the Receiver Operating Characteristic Curve (ROC) and displays Area Under the Curve (AUC) score.<br>    <br>    Args:<br>        y_truth: ground truth for testing data output<br>        y_proba: class probabilities predicted from model<br>        fig_size: size of the output pyplot figure<br>    <br>    Returns: None<br>    &#39;&#39;&#39;<br>    <br>    fpr, tpr, threshold = roc_curve(y_truth, y_proba)<br>    auc_score = roc_auc_score(y_truth, y_proba)<br>    txt_box = &quot;AUC Score: &quot; + str(round(auc_score, 4))<br>    plt.figure(figsize=fig_size)<br>    plt.plot(fpr, tpr)<br>    plt.plot([0, 1], [0, 1],&#39;--&#39;)<br>    plt.annotate(txt_box, xy=(0.65, 0.05), xycoords=&#39;axes fraction&#39;)<br>    plt.title(&quot;Receiver Operating Characteristic (ROC) Curve&quot;)<br>    plt.xlabel(&quot;False Positive Rate (FPR)&quot;)<br>    plt.ylabel(&quot;True Positive Rate (TPR)&quot;)<br>#     plt.savefig(&#39;ROC.png&#39;)<br>    plt.show()<br><br>plot_ROCAUC_curve(y_truth, y_proba, (8, 8))</pre><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/626/1*nFJnOGVED4VOHeg9Vqal_g.png" /><figcaption>ROC Curve</figcaption></figure><p>Model Evaluation: Confusion Matrix</p><pre>from sklearn.metrics import confusion_matrix<br>import seaborn as sn<br>import pandas as pd<br><br>y_pred = []<br>y_true = []<br><br>net = model<br>net.eval()<br>net_device = next(net.parameters()).device<br><br># iterate over test data<br>with torch.no_grad():<br>    for inputs, labels in test_loader:<br>        output = net(inputs.to(net_device)) # Feed Network<br><br>        output = output.argmax(dim=1).cpu().numpy()<br>        y_pred.extend(output) # Save Prediction<br>        <br>        labels = labels.cpu().numpy()<br>        y_true.extend(labels) # Save Truth<br><br># class names, in the alphabetical folder order that ImageFolder uses for label indices<br>classes = (&#39;cats&#39;, &#39;dogs&#39;)<br><br># Build confusion matrix (normalized by the total number of samples)<br>cf_matrix = confusion_matrix(y_true, y_pred)<br>df_cm = pd.DataFrame(cf_matrix/np.sum(cf_matrix), index = [i for i in classes],<br>                     columns = [i for i in classes])<br>plt.figure(figsize = (12,7))<br>sn.heatmap(df_cm, annot=True)<br># plt.savefig(&#39;cm.png&#39;)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/595/1*s24kqZiPT0Q4euYUK0dXKQ.png" /><figcaption>Confusion Matrix for cats-dogs dataset</figcaption></figure><p>Model Inference on New Images:</p><pre># Inference on Single Images (cats-dogs):<br>from torchvision import transforms<br><br>test_image = &quot;new_cat_image.jpg&quot;<br>test_image_null = &quot;new_dog_image.png&quot;<br>image = Image.open(test_image).convert(&quot;RGB&quot;)<br>image_null = Image.open(test_image_null).convert(&quot;RGB&quot;)<br><br># Define tensor transform (same input size as in training) and apply it:<br>data_transform = transforms.Compose([transforms.Resize((IMG_SIZE, IMG_SIZE)), transforms.ToTensor()])<br>model_device = next(model.parameters()).device<br>image_t = data_transform(image).unsqueeze(0).to(model_device)<br>image_null_t = data_transform(image_null).unsqueeze(0).to(model_device)<br><br># Class names, in the alphabetical folder order that ImageFolder uses for label indices:<br>class_names = (&#39;cat&#39;, &#39;dog&#39;)<br><br># Prediction:<br>model.eval()<br>with torch.no_grad():<br>    out_cat = model(image_t)<br>    out_dog = model(image_null_t)<br>print(&quot;predicted cat tensor:&quot;, out_cat)<br>print(&quot;predicted dog tensor:&quot;, out_dog)<br>print(&quot;&quot;)<br><br># Print the predicted class for the first image:<br>print(class_names[out_cat.argmax(dim=1).item()])<br>    <br># Show Image:<br>plt.figure(figsize=(2, 2))<br>plt.imshow(image)<br>plt.show()<br><br># Print the predicted class for the second image:<br>print(class_names[out_dog.argmax(dim=1).item()])<br>    <br># Show Image Null:<br>plt.figure(figsize=(2, 2))<br>plt.imshow(image_null)<br>plt.show()</pre><h3>Appendix:</h3><p><strong>ViT Hyper-Parameters:</strong></p><blockquote>1. image_size: int (image side length; for non-square images, the larger of width and height)<br>2. patch_size: int (side length of each square patch in pixels; image_size must be divisible by patch_size, and the number of patches, (image_size // patch_size) ** 2, must be greater than 16)<br>3. num_classes: int (# of classes)<br>4. dim: int (last dimension of output tensor after linear transformation nn.Linear(..,dim))<br>5. depth: int (# of transformer blocks)<br>6. heads: int (# of heads in Multi-head Attention layer)<br>7. mlp_dim: int (dimension of the MLP-feedforward layer)<br>8. channels: int (image channels = 3)<br>9. dropout: float (in [0,1], dropout rate of neurons)<br>10. emb_dropout: float (in [0,1], dropout rate of embeddings, usually 0)</blockquote><p><strong>ViT Learning Rate &amp; Loss Function:</strong></p><blockquote>Optimizer: Adam</blockquote><blockquote>Learning Rate: StepLR (multiplies the LR by gamma every #(step_size) of epochs)</blockquote><blockquote>Loss Function: CrossEntropy (with a single-output model you can also try binary cross-entropy: nn.BCELoss() on sigmoid outputs, or nn.BCEWithLogitsLoss() on a raw logit)</blockquote><h3>Outro:</h3><p>I can personally code your A.I project! Hire me via Fiverr:</p><p><a href="https://www.fiverr.com/share/98GZLA">https://www.fiverr.com/share/98GZLA</a></p><p>I hope my tutorial was of help!</p>]]></content:encoded>
        </item>
    </channel>
</rss>