In this tutorial, we will learn how to build a model in Keras that learns a simple linear function. By learning, I mean finding the weight values that produce the correct output. For a simple model y = m*x + b, m and b are the weights we are interested in learning. For those of you who are unfamiliar with Keras, it is a deep learning framework built on top of Theano and TensorFlow (more Popular Deep Learning Frameworks) that provides a simple API to work with.

At the core of the library we have keras.models, which builds up our model. The most common type of model is the Sequential model, which basically stacks layers one over the other in a sequence. As a first step, let us import our dependencies:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

import numpy as np

The next step is to define the function that we want to learn.

shape = (20, 2)
x = np.random.random(shape)

# Weights that define the specific function we will learn
weights = [1.0, 2.0]
y = np.dot(x, weights)

shape defines the shape of our input: 20 examples, each 2-dimensional. weights defines the parameter values that we want to learn. We want our model to learn the weights [1, 2] starting from a random initialization. Let's build our model now:

model = Sequential()

model.add(Dense(1, input_shape = (2,)))

We define our model as a Sequential model and add a Dense layer to it. A Dense layer simply implements a linear function of its inputs. We have set input_shape = (2,), which means each input is 2-dimensional. Let us peek at the weight values before the model learns anything.

model.get_weights()

>> [array([[ 0.7215],
       [ 0.1305]], dtype=float32), array([ 0.], dtype=float32)]
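Concretely, a Dense layer with one unit computes y = x · W + b. As a quick sanity check, plugging the initial values printed above into plain NumPy (this is just arithmetic, separate from the Keras model):

```python
import numpy as np

# A Dense(1) layer on a 2-d input computes y = x @ W + b.
# W and b below are the randomly initialized values shown above.
W = np.array([[0.7215], [0.1305]])
b = np.array([0.0])
x = np.array([[1.0, 1.0]])

y = x @ W + b
print(y)  # [[0.852]]
```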

So, the weights are randomly initialized to 0.7215 and 0.1305. Our goal is to learn the original function with weights 1.0 and 2.0. So, let us define our loss function and optimizer. A loss function is a critical part of any deep learning/machine learning model: it measures how badly our model is performing (hence, loss), and learning consists of modifying the weights to minimize that loss. The loss is minimized by an optimization method such as Gradient Descent, or a more sophisticated method like Adam. Now, we'll compile our model with a loss function and an optimizer.

model.compile(loss='mse', optimizer=Adam(lr=0.5))

model.evaluate(x,y, verbose=0)

>> 1.183187198638916

We are using mse, or Mean Squared Error, as the loss, and Adam as our optimizer with a learning rate of 0.5. The learning rate controls how fast our model learns, but we need to be careful here: a value that is too large can overshoot the minimum of our loss function, and the model will not converge (which is no good).
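To see why a too-large learning rate is dangerous, here is a tiny sketch (unrelated to the Keras model above) of gradient descent on f(w) = w², whose gradient is 2w:

```python
# Minimal sketch of gradient descent on f(w) = w**2 (gradient: 2*w).
def descend(lr, steps=10, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w  # gradient descent update
    return w

print(descend(lr=0.1))  # each step multiplies w by 0.8, so it shrinks toward 0
print(descend(lr=1.5))  # each step multiplies w by -2, so it blows up
```

With lr=0.1 the iterate converges; with lr=1.5 it oscillates and diverges, which is exactly the "model will not converge" failure mode.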

Now, we’ll train our model to make it learn the target function.

model.fit(x,y,epochs=20, batch_size=5)

model.fit does the heavy lifting in Keras, performing all the complex operations in the following order:

- Calculate the function output from the given input
- Calculate the loss from the obtained output (Mean Squared Error in this case)
- Backprop through the model to calculate gradients with respect to the weights
- Update the weights using the optimization method
- Repeat until the specified number of epochs is done
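The steps above can be sketched in plain NumPy, using vanilla gradient descent in place of Adam (a simplified illustration, with a fixed seed for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((20, 2))              # 20 examples, 2-dimensional
true_weights = np.array([1.0, 2.0])
y = x @ true_weights                 # the target function

w = rng.random(2)                    # random initialization
lr = 0.5
for epoch in range(500):
    y_pred = x @ w                   # 1. forward pass
    error = y_pred - y
    loss = np.mean(error ** 2)       # 2. mean squared error
    grad = 2 * x.T @ error / len(x)  # 3. backprop: gradient w.r.t. weights
    w -= lr * grad                   # 4. weight update

print(np.round(w, 3))                # converges close to [1. 2.]
```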

Let us see how well our model performs once it's trained:

model.evaluate(x,y)

>> 2.3498186010328937e-07

model.get_weights()

>> [array([[ 1.0006],
       [ 1.9992]], dtype=float32), array([ 0.0005], dtype=float32)]

The weights are now what we wanted to learn, i.e. 1 and 2 (not exactly, but very close).

My project was to build a library of time series methods for astronomical X-ray data. The software package, called Stingray, aims to provide a dedicated package for astronomers to carry out their research. It provides library methods for data analysis, simulations, and models to fit data, among several other features. The goal of my project was to implement methods for data analysis on lightcurves, which is astronomical jargon for time series. And with the help of my mentors Matteo Bachetti and Daniela Huppenkothen, almost all the milestones set for the project have been accomplished 🎉.

My project involved contributions to two repositories:

The list of methods implemented in Stingray during GSoC includes:

- Cross Correlation
- Auto Correlation
- Bispectrum
- Window Functions
- 2D Windows for Bispectrum
- Rebinning for Dynamical Powerspectrum

Each module implemented is also accompanied by a tutorial on how to use it in the notebooks repository.

During the project, I opened a total of 13 pull requests (across both the stingray and notebooks repositories). Some of the pull requests fixed issues, while most of them implemented a new algorithm.

Links to major pull requests in the stingray repository:

- Cross Correlation (Merged)
- Bispectrum (Merged)
- Auto Correlation (Merged)
- Rebinning for Dynamical Power Spectrum (Merged)
- Window Functions and 2D windows for Bispectrum (Merged)

Links to major pull requests in the notebooks repository:

- Cross Correlation and AutoCorrelation Notebook (Merged)
- Bispectrum Notebook including 2D windows (Merged)
- Window Functions (Merged)

A total of about 70 commits were made over the last couple of months.

Link to all commits made in Stingray Repository.

Link to all commits in Notebooks Repository.

It has been a wonderful experience in the world of open source software development so far, with a lot of learning and fun along the way. My experience with Time Lab and the Python Software Foundation has been exceptional, particularly because of my mentors. They helped me all along the way, provided their valuable insights into any issue I had, and suggested a suitable solution whenever I ran into a problem. Thanks to their expertise in the domain, they kept providing me with proper literature references for the algorithms I had to implement. I am extremely lucky to be a part of this community and hope to continue my association with Stingray and Time Lab. For all the future GSoCers out there, Time Lab and Python are highly recommended. Join our Slack channel to start contributing 💻.

Window functions have real research importance in signal processing, and any software package supporting scientific research on time series data analysis must have them. So, I implemented several window functions in the time domain. The window functions currently supported by Stingray include the uniform, Parzen, Hamming, and Hanning windows, to name a few. They are available in the stingray.utils package. Here is a screenshot from the notebooks repository demonstrating the Hamming window in the time and frequency domains.
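As an illustration of what such a window looks like, here is the Hamming window built from its textbook formula, together with its frequency-domain magnitude (a generic NumPy sketch, not Stingray's code):

```python
import numpy as np

# Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)).
N = 16
n = np.arange(N)
hamming = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

# Frequency-domain view: magnitude of a zero-padded FFT.
freq_response = np.abs(np.fft.fft(hamming, 256))

print(hamming[0])     # 0.08 at the edges
print(hamming.max())  # close to 1.0 in the middle
```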

The complete link to the notebook can be found here.

Next, I extended these windows into 2D lag windows and applied them to the Bispectrum. Bispectrum now has an additional argument, window, which specifies the type of window to apply to the bispectrum calculations. By default, no window is applied. The updated Bispectrum notebook can be found here. Both of these functionalities are merged and are part of Stingray 🎉.
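A simple way to picture a 2D lag window is as the outer product of a 1D window with itself; this is only an illustrative construction, not necessarily the exact scheme Stingray uses:

```python
import numpy as np

# Illustrative 2D lag window: outer product of a 1D Hamming window
# with itself (not necessarily Stingray's exact construction).
w1d = np.hamming(16)
w2d = np.outer(w1d, w1d)

print(w2d.shape)  # (16, 16)
```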

The third evaluation is due in a few days, and we are required to submit a Work Product as a part of GSoC. This will include links to pull requests and any other additional information related to the project. I'll most probably be updating the report here.

So, this is almost the end of GSoC '17. It has been an exceptional journey, folding into 3 months an amount of learning and open source experience that would otherwise have taken me much longer to accumulate. I plan to continue my work with Stingray and hope to continue this amazing journey into the world of open source.

Yes, I passed 🎉. Otherwise, I would not be blogging now. I must say that I am very happy and satisfied with my contributions, and the comments from Matteo reflect that.

So, the last week has been as busy as the whole of GSoC has been, and no less exciting than before. For the library, the goal was to implement Bicoherence after Bispectrum was done. It turned out that there was a very old PR already open on Bispectrum and Bicoherence. But luckily, my implementation was different from that of nithinsingh's PR. I talked with Matteo, and he supported the idea of having more than one implementation of the same algorithm. Bispectrum has been reviewed by some of the maintainers; they liked the overall look of it, but made the point that, since there were not enough examples online to verify the results, we should let users use it and report errors in issues. Everyone agreed, and Bispectrum is now a part of Stingray 🎉.

Next, the PR on the Dynamical Power Spectrum had been WIP (Work in Progress) for some time, so I decided to take a look at what was missing. There was some great work done by evandromr and Orkohunter, and the PR was missing only some rebin functions. So, I went ahead and implemented rebin_time for the Dynamical Power Spectrum. I must mention that during this time I had to go back and learn some more git, since I had never contributed to someone else's PR before. What I had to do was fork evandromr's repo and make contributions to his branch; my additions became visible once evan merged my changes into his branch. Following this, evan implemented rebin_frequency and gave the final touches to the PR, and now the Dynamical Power Spectrum is also part of Stingray 🎉.
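The idea behind rebin_time is simply to average groups of consecutive time bins of the dynamical power spectrum; here is a rough sketch (the array layout and names are illustrative, not the PR's actual code):

```python
import numpy as np

# Rebin a dynamical power spectrum (time bins x frequency bins) in time
# by averaging every `factor` consecutive time rows. Illustrative only.
def rebin_time(dyn_ps, factor):
    n_time = (dyn_ps.shape[0] // factor) * factor  # drop any leftover rows
    trimmed = dyn_ps[:n_time]
    return trimmed.reshape(-1, factor, dyn_ps.shape[1]).mean(axis=1)

dyn_ps = np.arange(24, dtype=float).reshape(6, 4)  # 6 time bins, 4 frequencies
rebinned = rebin_time(dyn_ps, 2)
print(rebinned.shape)  # (3, 4)
```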

Since last time, I have opened two pull requests: one on *Bispectrum* and the other on *AutoCorrelation*. In particular, the work on Bispectrum was very challenging, since I could not find enough help regarding the algorithm's implementation online.

A few words on the Bispectrum. The Bispectrum is a higher-order spectral analysis method; in simple words, it is the higher-order version of the power spectrum. It is mostly used to study non-linear interactions, which cannot be seen in lower-order spectra like the power spectrum. There are two ways in the literature to calculate the Bispectrum: a direct way and an indirect way. The direct way applies the convolution theorem to calculate the bispectra.

Another way to calculate the Bispectrum is by means of correlation. The Fourier transform of the autocorrelation function gives us the power spectrum of a time series. Similarly, the Bispectrum is calculated as the Fourier transform of the triple autocorrelation function, also called the third-order cumulant. My implementation of the Bispectrum uses the indirect method: the third-order cumulant of the time series is calculated first, and its Fourier transform is taken to obtain the bispectrum. A Bispectrum tutorial notebook is also opened as a PR in the notebooks repository. Below is a screenshot of the Bispectrum plots from the notebooks repository.
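In code, the indirect estimate looks roughly like this: compute the third-order cumulant over a grid of lags, then take its 2D Fourier transform. This is a simplified illustration of the technique, not Stingray's implementation:

```python
import numpy as np

# Indirect bispectrum estimate (simplified): third-order cumulant
# C3[m, k] = E[x(t) * x(t+m) * x(t+k)] over lags, then a 2-D FFT.
def bispectrum_indirect(x, maxlag=8):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # work with the zero-mean series
    n = len(x)
    lags = range(-maxlag, maxlag + 1)
    c3 = np.zeros((2 * maxlag + 1, 2 * maxlag + 1))
    for i, m in enumerate(lags):
        for j, k in enumerate(lags):
            lo = max(0, -m, -k)            # keep all three indices in range
            hi = min(n, n - m, n - k)
            c3[i, j] = np.mean(x[lo:hi] * x[lo + m:hi + m] * x[lo + k:hi + k])
    # bispectrum = 2-D Fourier transform of the third-order cumulant
    return np.fft.fft2(c3)

rng = np.random.default_rng(1)
bs = bispectrum_indirect(rng.standard_normal(512))
print(bs.shape)  # (17, 17)
```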

Matteo has not used the Bispectrum in his work a lot, so he says that he needs to verify the results somehow before Bispectrum can be merged into the Stingray repository.

The next method to tackle was *AutoCorrelation*. Since *CrossCorrelation* had been implemented keeping in mind that AutoCorrelation would subclass it, the implementation was as easy as calling the constructor of CrossCorrelation with the same lightcurve as both of its input arguments. Some tests were included to verify the implementation, and as I write this blog, the PR on AutoCorrelation has also been merged into the master branch 🎉.
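The reuse idea can be sketched in a few lines: autocorrelation is just cross-correlation of a series with itself (function names here are illustrative, not Stingray's API):

```python
import numpy as np

# Cross-correlation of two mean-subtracted series; autocorrelation
# simply reuses it with the same series twice (names are illustrative).
def cross_correlation(a, b):
    return np.correlate(a - np.mean(a), b - np.mean(b), mode="full")

def auto_correlation(a):
    return cross_correlation(a, a)  # same lightcurve passed as both inputs

sig = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
acf = auto_correlation(sig)
print(np.argmax(acf))  # peak at zero lag, i.e. the middle index, 4
```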

The second evaluations are coming up next 😨. Hopefully, I pass them as well. That is it for this blog. More stuff after the second evaluations.

Matteo also pointed out that the correlation function common in astronomy has the same size as the input data. So, we had to introduce a mode parameter that allows the user to specify the mode of correlation as an input to the CrossCorrelation object. The different modes supported are {full, same, valid}, where the default mode is same. Other changes requested by the mentors included some refactoring of the plots for better usability, making the lightcurves class attributes, and allowing lightcurves of different sizes as inputs.
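NumPy's np.correlate illustrates the three modes nicely (a generic example, separate from Stingray's CrossCorrelation object):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([0.0, 1.0, 0.5])

# full: every possible overlap -> len(a) + len(b) - 1 points
print(np.correlate(a, b, mode="full").shape)   # (6,)
# same: output clipped to the size of the longer input
print(np.correlate(a, b, mode="same").shape)   # (4,)
# valid: only complete overlaps -> len(a) - len(b) + 1 points
print(np.correlate(a, b, mode="valid").shape)  # (2,)
```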

One interesting thing that I would like to share is a floating point comparison bug that had me scratching my head. Tests that I wrote were failing for no obvious reason, and it turned out that I had forgotten a very important lesson from my Intro to Computer Science class:

Never compare floating point numbers for equality.

Here is a screenshot that I took of a failing test due to the floating point comparison bug.
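The pitfall is easy to reproduce; the snippet below is just the generic version of the bug, not the actual failing test:

```python
import numpy as np

# Two mathematically equal values that are not equal as floats:
print(0.1 + 0.2 == 0.3)            # False, due to binary rounding
print(0.1 + 0.2)                   # 0.30000000000000004

# The fix: compare with a tolerance instead of ==.
print(np.isclose(0.1 + 0.2, 0.3))  # True
```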

The final changes have been approved by the mentors, and CrossCorrelation, along with its notebook tutorial, will soon be a part of the Stingray repository 😉.

I had a good discussion with my mentors on the next goals for GSoC. My plan is to create a general Periodogram class to improve code re-usability for the other periodograms that are to become a part of Stingray. Next, we plan to include higher-order spectra for the analysis of time series, with methods like Bispectrum and Bicoherence. More on these in the next blog.

My mentors had encouraged me to focus on my exams during all this time, and I am again very grateful to them for that. Talking about work, I started off with the CrossCorrelation that I had been working on and finished it up with a few changes and modifications. The next step was testing the modules I had written thus far. My organization uses pytest as its testing framework, and I had to learn a bit about it. I started off by writing tests for each module. The goal was to write unit tests for each module to check its validity and behavior, and to improve coverage so that more and more lines of code are covered by tests.
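Here is a hedged sketch of the kind of pytest unit test involved; the class below is a stand-in to show the test style, not Stingray's actual CrossCorrelation API:

```python
import numpy as np

# Stand-in for the real class, just enough to show the test style.
class CrossCorrelation:
    def __init__(self, a, b):
        self.corr = np.correlate(a - np.mean(a), b - np.mean(b), mode="same")

# pytest discovers and runs plain functions named test_*.
def test_peak_at_zero_lag():
    sig = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
    cc = CrossCorrelation(sig, sig)
    assert np.argmax(cc.corr) == len(sig) // 2  # zero lag is the middle bin

def test_zero_lag_value():
    sig = np.array([0.1, 0.2, 0.3])
    cc = CrossCorrelation(sig, sig)
    # compare floats with a tolerance, never with ==
    assert np.isclose(cc.corr[len(sig) // 2], 0.02)
```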

Our organization has set up Travis for integration tests, which run every time code is pushed to GitHub. My first few tests actually caused the overall coverage to decrease, even though the tests written for CrossCorrelation had full 100% coverage. Here is a screenshot from Coveralls:

It turned out that someone else had made changes to lightcurve.py during the same period, with no tests written for them. Coveralls was not happy with that: it decreased the overall coverage and caused the build to fail.

After writing unit tests for the missing modules in lightcurve, the latest build passes all the tests ✋. Here is a link to the complete pull request, which is waiting for review; after review it should get merged into the stingray master branch.

Next, I'll be working on AutoCorrelation in the upcoming days and hope to get it done before the 1st evaluation, which runs from the 26th to the 30th of June.

For those of you who don't know, Google Summer of Code is a program for university students, sponsored by Google to encourage more involvement in open source development. Students receive a stipend from Google for their contributions to open source software over the summer. The best part is that you work directly under a mentor and get to network with brilliant people. So, there is lots and lots of learning.

As a part of GSoC, organizations also require you to write blog posts to share your work and experiences publicly. So, I'll be sharing my experiences over the next few weeks. Since it's still early days, I would like to use this blog post as an introduction to my project and my experiences connecting with the community.

The organization that I am working for is the Python Software Foundation. It is basically an umbrella organization, which means that it has several sub-organizations under it. The sub-organization that I am working for is Time Lab Technologies. My project is titled Library of Time Series Methods. It is basically a library of methods for X-ray astronomical research. The goal is to write fully-tested and well-documented analysis methods for time series data recorded from X-ray observatories.

Google Summer of Code begins with the community bonding period, in which students are required to interact with mentors and other members of the community, understand the existing code base, and discuss the project with members. I have three mentors for my project: Daniela Huppenkothen, Matteo Bachetti, and Himanshu Mishra. I must mention here that all of my mentors have been very helpful and encouraging. I get answers to my questions quickly, since they are all in different time zones. I have had a great time interacting with them during the community bonding period. They have also let me take a break to focus on my end-semester exams!

For the one week I got to work, I implemented a basic CrossCorrelation class to get started with the project. After my exams, I am supposed to write tests for the code I have implemented so far and make it ready to merge. Here is the incomplete pull request, which requires tests before it is ready for merge.

This is it for now. I have got to go back and study for my exams (I have got 4 of them in the next 4 days :p).
