Google Summer of Code (GSoC) sample proposal
I have received many requests to share my GSoC proposal, which was selected by VTK for GSoC 2015. Below, I reproduce almost the entire application. I was really tempted to change some of the points, but I decided to leave them as they were so that you can see the mistakes I committed, rather than giving you the false impression that the proposal was perfect.
Before you get started, let me give you some more background about my application. It took me about a week to research and write this proposal. I started pretty late, since I was less interested in GSoC itself than in getting Eulerian Motion Magnification out to more people in the world. Thanks to Rohan Prinja for referring me to this project in GSoC. Please use this application only as a template and do not stick to it too closely. Be creative and novel! Get in touch if you have any queries.
Eulerian-Motion-Magnification
Personal Details
Name: Ashray Malhotra
University: Indian Institute of Technology, Bombay
Email: redacted (connect with me on linkedin)
Telephone: redacted
Instant Messaging: redacted
Twitter: https://twitter.com/ashray_malhotra
Country of Residence: India
Timezone: IST (GMT+05:30)
Primary Language: English
I am a fourth-year dual degree student pursuing a B.Tech. + M.Tech. in Electrical Engineering (specialisation in Signal Processing) at the Indian Institute of Technology, Bombay. My semester ends in late April, leaving me enough time to get ready for my GSoC project. If I am selected, I will be able to work around 40 hours a week on the project, and I am open to putting in more if the work requires it.
Why this project?
Motion magnification unlocks completely new avenues, such as detecting blood vessels or magnifying the subtle motions of small babies. I see this technology as enabling a whole new dimension of use cases. Our aim should therefore be to provide it to users and developers in as flexible a form as possible, so that people can build on it for whatever use cases they can think of.
Another advantage of this technique is that it can work in near real time, which widens the range of possible applications further.
Technical Knowledge
I am a fourth-year dual degree student at IIT Bombay, enrolled in a five-year B.Tech. + M.Tech. course. My major is Electrical Engineering (though it is more maths and electronics), and my specialisation is in Communications and Signal Processing. The courses that I have taken include:
- Digital Signal Processing
- Fundamentals of Digital Image Processing
- Advanced Computing for Electrical Engineers (a compressed version of important CS courses)
- Advanced Topics in Signal Processing
- Computer Vision
- Algorithms for Medical Image Processing
I have also completed many algorithms and machine learning courses on Coursera.
Some of my previous projects in image processing and computer vision include:
- Dental Imaging Project with ReDx (MIT Media Lab): designed the algorithms for, and built, a real-time system to detect caries (dental cavities) using an intra-oral camera. This video summarises the complete work.
- Digit recognition using Adaboost
- Video Stabilisation using RANSAC and least squares on SIFT features
- Denoising MRI images
- Digit recognition on the MNIST database. Achieved nearly 91% accuracy with a 100-dimensional subspace (for images of size 28 × 28 = 784 pixels) using PCA. Also implemented LDA (Fisher’s LDA) and ICA.
Some of my signal processing projects include:
- Source localisation
- Audio Source Separation
- Speech Recognition System
I am currently working on several interesting projects, including iris detection and finding innovative techniques for improving the temporal resolution of a signal (a research project with Prof. Subhasis Chaudhuri). I am also currently implementing the video magnification algorithm. Read ahead to the personal motivation section for the reasons why I have already been working on implementing this technique (that section has been redacted).
I interned with Goldman Sachs technologies in my third year, where my work involved extensive use of Java. The internship involved working with teams across the globe, so I am comfortable working with people across multiple time zones and this will not be a problem in the development process. I have worked with C++ in my advanced computing course. Based on my skills, I was selected to be a Teaching Assistant for IIT Bombay’s first online course, in which programming was taught in C++.
Programming languages and tools I have previously worked with include C++, Java, Cilk, Python, MATLAB, OpenGL, CUDA, assembly language, etc. I have built VTK on my MacBook. I also ran some of the test code from the site, namely the cylinder test, and it was awesome :D. I have also worked with more advanced tools like Slicer.
Project
Project Abstract
This project aims to develop algorithms that extract subtle changes in a time-dependent data set and amplify them. To begin with, the data set can be taken to be video (2D data at each time instant), but the scope of the project can be extended to data sets of other dimensionality at each time instant. We also plan to build custom views for video magnification in VTK. We extract temporal and spatial frequencies from the given data and amplify specific frequencies according to the use case.
Technical Details
Below we explain the significant steps of the algorithm.
- We start by considering each frame of the video independently for analysis.
- We choose a suitable colour space to work in. This could depend on the specific application, though in the paper the authors use the NTSC colour space for all further operations.
frame = rgb2ntsc(rgbframe);
- For each colour channel (or spectral band, for hyperspectral images), we build a Laplacian pyramid. Note that the Laplacian pyramid is built from the NTSC image, not the RGB image.
[pyr,pind] = buildLaplacianPyramid(frame(:,:,1))
- We initialise the lowpass filter with the Laplacian pyramid values of the first frame. Later, we will update these filter values to perform temporal filtering of the signal.
- We take the next frame and perform the same Laplacian pyramid calculation on it (after converting it to the NTSC colour space).
- The Laplacian pyramids of subsequent frames are used to perform the temporal filtering of the signal. The exact method of temporal filtering could vary with the application; for example, we could use a Butterworth filter, an IIR filter, etc. For an IIR filter, we perform a simple recursive weighted update of both filter thresholds:
lowpass = (1-const_factor)*lowpass + const_factor*pyramid;
- The difference of the two computed thresholds gives us the range of frequencies to work with (to magnify or suppress, based on the Laplacian pyramid level).
filtered = (cutoff1 - cutoff2);
- We have now performed the temporal selection of the signal. Next, we selectively magnify its spatial frequencies. Note that the rule that decides how much a specific spatial frequency is magnified can vary across use cases; in this paper, the authors use a magnification that increases linearly with spatial wavelength up to a specific threshold, after which it remains constant.
- The above figure gives the magnification value (Y axis) at each spatial wavelength (X axis), which is determined by the pyramid index. We multiply the filtered signal above by the appropriate magnification factor to get the modified filter values. Note that we have to do this for every image in the pyramid.
- Using these final filtered values, we reconstruct the image frame from the new image pyramid.
- We can also consider applying some chromatic attenuation, either to blend the motion-magnified signal with the original signal better (more homogeneously, without odd colour artefacts) or to show the motion clearly in subsequent frames by keeping the magnified motion and the frame visibly distinct (so that there is contrast between them). Which of the two happens depends on the exact method (and value) of chromatic attenuation.
output(:,:,1) = output(:,:,1)*chromaticAttenuation1; % Red channel chromatic attenuation
output(:,:,2) = output(:,:,2)*chromaticAttenuation2; % Green channel chromatic attenuation
output(:,:,3) = output(:,:,3)*chromaticAttenuation3; % Blue channel chromatic attenuation
- We now have the magnified motion, but we need it superimposed on the image. Hence we add this magnified motion to the original input frame to get the output frame, which we write to the output video.
output = frame + output;
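The temporal part of the steps above can be sketched for a single pyramid coefficient tracked over time. This is only an illustrative Python sketch, not the authors' code: the names `iir_bandpass_amplify`, `magnification_for_wavelength`, `r1`, `r2`, `alpha_max` and `cutoff_wavelength` are hypothetical, and the exact shape of the magnification ramp (a straight line through the origin, capped at `alpha_max`) is an assumption made for illustration.

```python
def iir_bandpass_amplify(signal, r1, r2, alpha):
    """Amplify the temporal band between two first-order IIR low-pass filters.

    signal: values of one pyramid coefficient over time.
    r1, r2: update factors of the two low-pass filters (assumed r1 > r2;
            a larger factor follows the input faster).
    alpha:  magnification applied to the band-passed signal.
    """
    lowpass1 = signal[0]  # both filters start from the first frame
    lowpass2 = signal[0]
    output = [signal[0]]
    for value in signal[1:]:
        # Same recursive update as in the proposal, applied to both filters.
        lowpass1 = (1 - r1) * lowpass1 + r1 * value
        lowpass2 = (1 - r2) * lowpass2 + r2 * value
        filtered = lowpass1 - lowpass2  # temporal band-pass
        output.append(value + alpha * filtered)  # add magnified motion back
    return output


def magnification_for_wavelength(wavelength, cutoff_wavelength, alpha_max):
    """Assumed ramp: grow linearly with wavelength, then stay constant."""
    if wavelength >= cutoff_wavelength:
        return alpha_max
    return alpha_max * wavelength / cutoff_wavelength


if __name__ == "__main__":
    step = [0.0, 0.0, 1.0, 1.0]
    alpha = magnification_for_wavelength(32, 16, 10.0)
    print(iir_bandpass_amplify(step, r1=0.5, r2=0.25, alpha=alpha))
```

A constant signal passes through unchanged, because both low-pass filters settle on the same value and their difference is zero; only temporal changes are amplified.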
Practically observing outputs of the above algorithm
To understand the algorithm completely and to verify that the outputs are intuitive, I ran the code; the results are summarised below.
A baby sample video (first part of the video below) was used to generate the images below.
Below is the second frame of the video. This is what goes as an input to our algorithm.
The algorithm finds the difference between frame 1 and frame 2 and gives us the following output.
Note that the above difference image had to be rescaled so that it covers the entire intensity range; otherwise it is very difficult to see the difference and we only observe a black image. One thing is really interesting in this image: amidst the random squares, there is one straight red line, nearly in the centre of the image. If we match this to our input image, we observe that this line belongs to the zip of the child’s clothes. As is obvious here, though, if we add this motion to the original frame, the resulting frame will be really bad, since we see lots of random areas where there was ideally zero motion.
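The rescaling mentioned above can be done with a simple min-max stretch. The following is a minimal sketch under the assumption that the difference image is a list of rows of numbers; the function name `normalise_for_display` and the default output range of 0 to 255 are hypothetical choices, not taken from the original code.

```python
def normalise_for_display(image, out_max=255.0):
    """Linearly stretch pixel values so that they span [0, out_max]."""
    flat = [value for row in image for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A perfectly flat image has no contrast to stretch.
        return [[0.0 for _ in row] for row in image]
    scale = out_max / (hi - lo)
    return [[(value - lo) * scale for value in row] for row in image]


if __name__ == "__main__":
    # Tiny difference values that would all look black if shown directly.
    difference = [[-0.02, 0.0], [0.01, 0.02]]
    for row in normalise_for_display(difference):
        print(row)
```

Because the stretch is relative to the smallest and largest values in each image, two normalised difference images are not directly comparable in absolute terms.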
To solve this problem, we use chromatic attenuation factors. Most of the useful information here is in red, so we suppress the other colour channels (by multiplying them by 0.1, reducing them to 10% of their original value). After performing this calculation, the frame becomes
It is now very clear that the major motion is that of the child’s zip. This is to be expected, because other areas are more continuous and hence have fewer frequency components, whereas a zip acts like a discontinuity, which contributes many frequencies and hence contributes more to the difference frame.
We add the above frame to our original frame (in the RGB domain) and get the final frame (for the output video) below.
If we look closely, some of the bad areas in the image (patches in smooth areas) are exactly the same as those observed in the final difference image (after chromatic attenuation).
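The two operations described above (suppressing channels and adding the result to the original frame) can be sketched together. This is an illustrative sketch only: it assumes images are stored as rows of three-channel tuples, and the name `attenuate_and_add` and the default factors `(1.0, 0.1, 0.1)`, which keep the first channel and reduce the other two to 10%, are assumptions chosen to mirror the description.

```python
def attenuate_and_add(frame, motion, attenuation=(1.0, 0.1, 0.1)):
    """Scale each channel of `motion` and add it to `frame`.

    frame, motion: images of equal shape, as rows of 3-channel tuples.
    attenuation:   one multiplicative factor per channel.
    """
    return [
        [
            tuple(f + a * m for f, m, a in zip(frame_px, motion_px, attenuation))
            for frame_px, motion_px in zip(frame_row, motion_row)
        ]
        for frame_row, motion_row in zip(frame, motion)
    ]


if __name__ == "__main__":
    frame = [[(0.5, 0.5, 0.5)]]
    motion = [[(0.2, 0.4, -0.4)]]
    print(attenuate_and_add(frame, motion))
```

The sketch does not clip the result to the valid intensity range; a real implementation would probably need to do so before writing the frame to a video.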
Now, we would expect the difference images for nearby frames to be similar (since we aren’t allowing for infinite-frequency changes) and to differ more for frames that are further apart.
This claim is verified by the 3 images below.
We also see far fewer artefacts in the frame 29–30 difference because, after some time, the error caused by the initial condition dies out.
Clearly, the above technique is able to identify the motion in the frame (we can say this because it picks up the child’s body motion and the motion of his zip, and neglects the other objects in the frame). Also, as we have seen, the artefacts in the newly created frames are not a big problem for the video quality.
The final processed video was created by running the complete code. Output video can be found below.
Which basically proves Video Magnification is awesome :D
Timeline
Pre GSOC
Implement the video magnification algorithm. I am currently working on the algorithm as my course project, and this has to be completed before the end of April, i.e. before the start of the GSoC project. This also eases my task in those 12 weeks and helps me concentrate on shipping professional code.
Community Bonding
Getting acquainted with the VTK code base and the procedure for submitting code and getting it reviewed. Discussing with the team what exactly the problem statement should be (minute details such as the kind of data set we want to use, which parameters, like magnification factors, should be user choices, and which should be decided algorithmically based on the input data type).
Week 1
Understand the relevant parts of the VTK code base and figure out what the final product should look like, how many and what kind of views should be made, etc.
Week 2
Begin implementing motion magnification. This week, focus on implementing spatial pyramids for a given image using Gaussian and Laplacian pyramid techniques. The workload has been kept light this week so that I can practise other crucial skills such as adding test cases and understanding the code review workflow.
Week 3
Finally get to video (and/or some other temporally varying data, if the team decides to tackle that use case). Understand temporal filtering of signals and begin coding it.
Week 5–6
Code the different kinds of temporal filters (IIR, ideal, etc.; which ones to implement will be decided in the community bonding period. My personal choice would be to code the 4 used in the paper). This is also the mid-term review period.
Week 7–8
Once we have both the spatial and temporal filters ready, combining the two is easy. We will, however, need to spend time optimising parameters for specific use cases. We can perform enough experiments to come up with a guideline or rule book on how to choose the parameters. This would be useful to developers who want to use this code for their own applications and use cases.
Week 9–10
Start designing the views through which this code will be exposed in the toolkit.
Week 11–12
Take feedback from the community, iterate on the designs and improve on the use cases. Ensure code quality by adding more test cases and working with more videos. Produce documentation, blog posts or videos to help increase the user base for this product (subject to developer community approval).
Week 13
Spare week in case some work gets delayed, or in case of an emergency.
Personal Inspiration for the Project
I am really excited to work on the idea of video magnification. I have been following the topic for a long time: I first came to know about it from the authors’ TED talk, which inspired me to read their research paper. I also took part in the week-long ReDx camp conducted by the MIT Media Lab (led by Prof. Ramesh Raskar). One of the projects in that camp was to locate human arteries. I was shocked to find out that doctors still need to stop the blood flow in order to find the arteries. According to some data collected during the workshop, it could take up to 7 minutes to find the correct blood vessels of an infant (or an older person); in a crucial operation, this could be lethal. The team that week came up with an IR device that would assist the RGB camera of a cellphone and, using augmented reality, superimpose the blood vessels (visible in IR) onto the RGB smartphone camera image.
But I still had a problem with this solution: it would still need separate hardware, it might not be user friendly, etc. So I wanted to apply the concept of video magnification to magnify the blood vessel movement. This way, injections could be placed correctly on the first attempt and in no time.
(1 paragraph from the original application removed)
We all know that video magnification can solve “this” simple problem; the only issue is that no one seems to have put in the effort to get it to the people. I decided to implement this technology myself, out of interest and as a course project in my Medical Image Processing course. By the end of April (this year), I will have implemented this technique myself (which means I will know completely how it works, not just have a theoretical understanding). I will be free in the summer, and even in the final (5th) year we just have a research project and a course or two. It is my aim to get this technology into the hands of as many people as I possibly can. I already had plans to work on an implementation of this technology during the summer vacation. I would be glad if I could do this as part of GSoC along with Kitware.
I have a huge personal inspiration to get this technology out to the world, and you can be assured of my motivation to complete this project.
References
Most of the content, including the images and input video for this document, has been taken from the website of the authors of the paper.