A Never-Ending Journey

A journey is a story of experiences: solving new problems, an encounter with a new spectrum of things that move unpredictably. A journey is always phenomenal, making you realise that there are always bad roads and good roads.

I am on a never-ending journey now — watching people do cool stuff and create innovative things. I am trying to mimic them to learn how things can be done differently. I am learning new meanings now. This part of the journey is interesting. My journey with the open-source project Tellurium has been quite fascinating. I should thank my guide, Kyle Medley, for helping me discover this beautiful world. In this blog, I shall share a brief overview of the things I tried with Tellurium.

The cool technologies I used in this project include Apache Spark, Apache Hadoop, Apache Zeppelin, Apache Livy, and Docker.

Week 1 — Week 2

Parameter Scan: We have created a new module, distributed_parameter_scaning, which lets users provide multiple models along with the simulations to run for each model; all the models are run in parallel in a distributed environment, and the results are collected into an array/graph. Interested in knowing a little more? Read my previous blog. You can also check my pull request related to Parameter Scan.
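The core idea — run one simulation per parameter value in parallel, then collect the results into one array — can be sketched locally with Python's multiprocessing. This is not the actual Tellurium module; `simulate` here is a hypothetical stand-in (a simple discrete decay model) for a real SBML simulation.

```python
from multiprocessing import Pool

def simulate(k):
    """Hypothetical stand-in for one model simulation:
    discrete decay x[t+1] = x[t] * (1 - k), starting from x[0] = 1."""
    x, trace = 1.0, []
    for _ in range(5):
        trace.append(x)
        x *= (1.0 - k)
    return k, trace

if __name__ == "__main__":
    # Scan the decay parameter k over several values in parallel,
    # then collect (parameter, trace) pairs into one result list.
    k_values = [0.1, 0.2, 0.3, 0.4]
    with Pool(4) as pool:
        results = pool.map(simulate, k_values)
    for k, trace in results:
        print(k, trace[-1])
```

In the real module the worker runs on a Spark cluster instead of a local process pool, but the map-then-collect shape is the same.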

Week 3 — Week 4

Parameter Estimation: To estimate a particular parameter, we now have a new module that also runs in a distributed environment. To run it, a user provides the model (SBML/Antimony) and bounds on the parameter(s) to estimate. Internally, the module uses differential evolution, with the sum of squared errors as the objective. We have tested this on the Immigration-Death model and the Lotka-Volterra model and presented a poster at Beacon 2017.
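To make the estimation step concrete, here is a minimal, dependency-free sketch of differential evolution minimizing a sum-of-squared-errors objective. The model is a hypothetical stand-in (a one-parameter discrete decay) rather than a real SBML simulation, and this hand-rolled optimizer is only an illustration of the technique, not the module's actual code.

```python
import random

def model(k, n=10):
    # Stand-in "simulation": discrete decay x[t+1] = x[t] * (1 - k), x[0] = 1.
    x, out = 1.0, []
    for _ in range(n):
        out.append(x)
        x *= (1.0 - k)
    return out

def sse(k, observed):
    # Objective: sum of squared errors between simulation and data.
    return sum((m - o) ** 2 for m, o in zip(model(k, len(observed)), observed))

def differential_evolution(objective, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=0):
    """Minimal differential evolution over a single bounded parameter."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    scores = [objective(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutate three distinct other members; crossover with rate CR.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            trial = a + F * (b - c) if rng.random() < CR else pop[i]
            trial = min(max(trial, lo), hi)        # clamp to bounds
            s = objective(trial)
            if s < scores[i]:                      # greedy selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

if __name__ == "__main__":
    observed = model(0.3)   # synthetic "experimental" data with true k = 0.3
    k_hat, err = differential_evolution(lambda k: sse(k, observed), (0.0, 1.0))
    print(k_hat, err)
```

In the distributed version, the expensive part — evaluating the objective, i.e. running the simulation for each candidate parameter set — is what gets farmed out to the cluster.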

Below are the commit links for Parameter Estimation.

And here is the Pull Request link for Parameter Estimation.

Week 5 — Week 6

Sensitivity Analysis: This describes how sensitive the model output is to a small change in a parameter's value. Like the previous two modules, sensitivity analysis is a new module in which users provide SBML/Antimony models and a custom simulator, giving them the freedom to define their own pre-simulation and simulation steps. Along with that, users provide bounds for the parameters of interest; these parameters are varied across simulations (in a distributed environment). The results of sensitivity analysis are categorised as follows:

a) Metrics → Compute the mean, standard deviation, or variance for each of the parameters.

For example, the sensitivity of PP_K with respect to r1b_k2, r8a_a8, and r10a_a10 individually. The final output will return the average of getCC('PP_K', 'r1b_k2'), getCC('PP_K', 'r8a_a8'), and getCC('PP_K', 'r10a_a10') individually.

b) Bins → Given the bin sizes for each parameter, the final result reports how many values fall into each of the bins provided.

For example, the user provides bins of different ranges for each parameter. The final output depicts how many times the getCC('PP_K', 'r1b_k2'), getCC('PP_K', 'r8a_a8'), and getCC('PP_K', 'r10a_a10') values fall into each bin.

c) Everything → Print the results of every simulation run.
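The three output modes above can be sketched over a list of sensitivity values. The values here are hypothetical stand-ins for getCC(...) results collected across the distributed simulations, and `summarize` is an illustrative helper, not the module's actual API.

```python
import statistics

def summarize(values, mode="metrics", bins=None):
    """Summarize one parameter's sensitivity values in one of three modes:
    'metrics' (mean/stdev/variance), 'bins' (counts per range), or
    'everything' (raw result of every simulation run)."""
    if mode == "metrics":
        return {"mean": statistics.mean(values),
                "stdev": statistics.stdev(values),
                "variance": statistics.variance(values)}
    if mode == "bins":
        # bins is a list of (low, high) ranges; count values in each range.
        return [sum(low <= v < high for v in values) for low, high in bins]
    return list(values)

values = [0.12, 0.18, 0.25, 0.31, 0.40]   # stand-in getCC(...) results
print(summarize(values))
print(summarize(values, "bins", bins=[(0.0, 0.2), (0.2, 0.4), (0.4, 0.6)]))
```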

Below are the commits for Sensitivity Analysis

This link has the pull request for Sensitivity Analysis.

Below is the list of pull requests that provide distributed-computation functionality for Tellurium.

Week 7 — Week 9

Experimenting with Apache Livy

With Livy, we are trying to decouple the client interaction from the Spark cluster: by integrating with Livy, users can still run their jobs from any system.

There is a wiki page, https://github.com/sys-bio/tellurium/wiki/Livy-Instructions, that describes the work done with Livy.
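Livy exposes a REST API for exactly this kind of decoupling: the client creates a session (`POST /sessions`) and then submits code against it (`POST /sessions/{id}/statements`). A minimal sketch with only the standard library is below; the server address is an assumption, and the actual request/response handling in our wrapper may differ.

```python
import json
from urllib import request

LIVY_URL = "http://localhost:8998"   # assumed Livy server address

def session_payload(kind="pyspark"):
    # Body for POST /sessions: ask Livy to start a PySpark session.
    return {"kind": kind}

def statement_payload(code):
    # Body for POST /sessions/{id}/statements: the code to run on the cluster.
    return {"code": code}

def post(path, payload):
    # Send a JSON payload to the Livy REST API (requires a running server).
    req = request.Request(LIVY_URL + path,
                          data=json.dumps(payload).encode(),
                          headers={"Content-Type": "application/json"})
    return json.load(request.urlopen(req))

# Example (needs a live Livy server):
# session = post("/sessions", session_payload())
# post("/sessions/%d/statements" % session["id"], statement_payload("1 + 1"))
```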

We have also built a wrapper (as we are still experimenting with Livy) that helps users communicate with our Spark clusters. Here is a brief overview of how it can be done:

  1. Every consumer needs to register with us.
  2. For every registered consumer, we shall create a user.
  3. Every customer needs to send their public key, or we can share a password with them for authentication.
  4. They can then use the wrapper to connect to the server and transfer scripts from their local system.
  5. There are several types of files they may send:
     i) Code, like that of a Zeppelin notebook
     ii) An SBML XML file
     iii) Additional Python helper files (e.g. a custom simulator in the case of sensitivity analysis)

The diagram above makes the usage of the wrapper clearer.

This is how the user can communicate:

import distribtellurium as dte

This imports all the required scripts that allow the client to run jobs on the cluster.

distribtellurium provides a method, add_file, that ships local code to the Spark cluster. Depending on the type of file, there is an extra parameter (run=True), which the client sets on only one file: the script to be executed.

distrib_work = dte()

distrib_work.add_file(filename="sensitivity_test.py", run=True)

If there are any additional files, the client can call the same method without the run parameter, or with run=False.

distrib_work.add_file(filename="huang-ferrell-96.xml")

distrib_work.add_file(filename="custom_simulator.py")

Finally, the client calls the start method, which runs "sensitivity_test.py" on the cluster and returns the results locally to the client.

distrib_work.start()

Week 10

Apache Zeppelin was integrated with Apache Livy so that users can run their Spark jobs through Zeppelin, which is connected to the Livy server running on the cluster.

Week 11— Week 12

Dockerization

A Docker image containing Apache Spark, Apache Zeppelin (connected to the Spark cluster), and the latest Tellurium build is on its way. With this, we can scale to a cluster of any size. Here is the link to the Docker repo.

A video demonstrating the above will be added soon to make the whole process easier to understand.

Thanks for reading!