satojkovic

Paper Review: Pixel Aligned Language Models (5d ago)
In previous research on vision and language alignment, most studies have used the entire image as input. In contrast, this paper proposes…
Prompt Ensemble in Zero-shot Classification using CLIP (May 6)
CLIP is pretrained to predict whether an image and text are paired within a dataset. As shown in (2) and (3) of the diagram, for zero-shot…
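As a rough illustration of the prompt-ensemble idea discussed in that post (not the article's code; `encode_text` here is a hypothetical stand-in for CLIP's text encoder): embed each class name under several prompt templates, average and re-normalize the embeddings to get one classifier weight per class, then classify an image embedding by cosine similarity.

```python
import numpy as np

def encode_text(s: str) -> np.ndarray:
    # Hypothetical stand-in for CLIP's text encoder: a deterministic
    # hash-seeded embedding so the sketch is self-contained and runnable.
    rng = np.random.default_rng(abs(hash(s)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

# Example prompt templates and class names (illustrative, not CLIP's full set).
templates = ["a photo of a {}.", "a drawing of a {}.", "a blurry photo of a {}."]
classes = ["cat", "dog"]

# Prompt ensemble: average the normalized embeddings of each class across
# templates, then re-normalize the mean so each class has a unit-norm weight.
weights = []
for c in classes:
    embs = np.stack([encode_text(t.format(c)) for t in templates])
    mean = embs.mean(axis=0)
    weights.append(mean / np.linalg.norm(mean))
W = np.stack(weights)  # shape: (num_classes, dim)

# Stand-in for an image embedding; logits are cosine similarities.
image_emb = encode_text("a photo of a cat.")
logits = W @ image_emb
pred = classes[int(np.argmax(logits))]
```

The averaging step is the ensemble: a single template is one noisy description of the class, while the mean over templates smooths out template-specific variation before the similarity comparison.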
Scenic: A JAX Library for Computer Vision Research and Beyond (Mar 20)
Scenic is an open-source JAX library focused on Transformer-based models. I recently came across the library while reading a paper on video…
Create your own GPT and generate text with OpenAI’s pre-trained parameters (Jan 14)
The first entry for 2024 is about GPT. Create your own GPT model, load the pre-trained parameters published by OpenAI, and perform a series…
Paper Review: Video-LLaMA (Sep 10, 2023)
Research on LLMs seems to be accelerating with Meta’s release of LLaMA and LLaMA2, and I read Video-LLaMA, a study on the…
JAX and composable program transformations (May 21, 2023)
The About section of https://github.com/google/jax states the following.
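The description that post quotes centers on JAX's composable function transformations, `jax.grad`, `jax.jit`, and `jax.vmap`. A minimal sketch of how they compose (my own toy example, not code from the post):

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    # Simple scalar function: squared error of a linear model against 1.0.
    return (jnp.dot(w, x) - 1.0) ** 2

# Each transformation takes a function and returns a new function,
# so they stack freely in any order that type-checks.
grad_loss = jax.grad(loss)                         # gradient w.r.t. w
batched = jax.vmap(grad_loss, in_axes=(None, 0))   # vectorize over a batch of x
fast = jax.jit(batched)                            # JIT-compile the composition

w = jnp.ones(3)
xs = jnp.stack([jnp.arange(3.0), jnp.ones(3)])
g = fast(w, xs)  # per-example gradients, shape (2, 3)
```

For `x = [0, 1, 2]` the gradient is `2 * (w·x - 1) * x = [0, 4, 8]`, and `vmap` produces one such row per batch element without any explicit loop.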
Vision Transformer from scratch (JAX/Flax) (Jan 23, 2023)
Recently, I started to use JAX/Flax. Here, I would like to show how I implemented Vision Transformer (ViT) using JAX/Flax and how to train…