The common problem with using the latest release of any framework is that there are few or no adopters, and the docs are either outdated or point to older versions. We ran into this problem while integrating the MongoDB driver with Apache Spark 2.x. Most of the library docs available today work only with Spark 1.5+.
All we wanted to do was create a DataFrame by reading a MongoDB collection. After a lot of googling, we found two libraries that support this operation:
We decided to go ahead…
The bane of using bleeding-edge technology is that information about new features in the latest version is scarce or hard to find. We at Unnati use bleeding-edge releases of many data science tools for various research and production systems. In this post we explain how to add external jars to an Apache Spark 2.x application.
In Spark 2.x, we can use the --packages option to pass additional jars to spark-submit. Spark first looks for the jar in the local ivy2 repository; if it is missing, Spark pulls the dependency from the central Maven repository.
$SPARK_HOME/bin/spark-submit --packages org.mongodb.spark:mongo-spark-connector_2.10:2.0.0 <py-file>
In the above…
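The coordinate passed to --packages can look opaque. Below is a minimal Python sketch of how a groupId:artifactId:version coordinate maps to a jar path in a Maven-layout repository such as Maven Central. It only illustrates the idea and is not Spark's resolver: Spark resolves packages through Ivy, also fetches transitive dependencies, and can use extra repositories, and this sketch models none of that. The helper names parse_packages and maven_jar_url are hypothetical, and the default repository URL is an assumption.

```python
# Assumed default repository: Maven Central (no extra repositories configured).
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def parse_packages(packages: str) -> list:
    """Split a comma-separated --packages value into individual coordinates."""
    return [c.strip() for c in packages.split(",") if c.strip()]


def maven_jar_url(coordinate: str, repo: str = MAVEN_CENTRAL) -> str:
    """Hypothetical helper: map groupId:artifactId:version to a jar URL
    following the standard Maven repository directory layout."""
    parts = coordinate.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected groupId:artifactId:version, got {coordinate!r}")
    group_id, artifact_id, version = parts
    # Dots in the groupId become directory separators in the repository layout.
    group_path = group_id.replace(".", "/")
    return f"{repo}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


if __name__ == "__main__":
    for coord in parse_packages("org.mongodb.spark:mongo-spark-connector_2.10:2.0.0"):
        print(maven_jar_url(coord))
```

The _2.10 suffix in the artifact id is the Scala version the connector was compiled against. As a general rule, it should match the Scala version of your Spark build.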
Booking a hotel is a fairly involved decision, and you want to make sure it is an informed one. However, unlike most e-commerce purchases, the product on offer here (a hotel) isn’t as standardised (yet), so there are a lot of factors to consider.
Some of the questions commonly asked before booking a hotel include:
These are questions where the answer isn’t exactly a straightforward yes or no, but instead lies on a spectrum and needs a careful consideration of…
How to fix word spacing?
Word clouds are one of the most common visualizations we see today, especially in social media analytics. Open source libraries like D3JS have made developers' lives easier: with them we can quickly wire up data and get beautiful visualizations. Thanks to Mike Bostock for giving the community D3JS and http://bl.ocks.org. With bl.ocks, we have a plethora of visualizations from the community, open to the public along with their implementations.
This library from Jason Davies, https://github.com/jasondavies/d3-cloud, can help you build a word cloud in 5 minutes or less. A big thank you to Jason for this handy…
Word embedding is a technique for converting words into vectors in a high-dimensional space. In simple terms, each dimension groups words by a particular aspect (gender, colour, etc.) and scores the words by how similar they are in that aspect.
For example: “I have a red car, maroon shirt and a grey bicycle”
One of the dimensions can represent colour: red, maroon and grey are assigned similar scores, while the rest of the words get very different scores. Another dimension can represent the type of object: car and bicycle are assigned similar scores because they are both vehicles.
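The colour and object-type dimensions described above can be made concrete with a toy example. The following is a minimal Python sketch that assumes three hand-labelled dimensions (colour, vehicle, clothing) with made-up scores; real embeddings are learned from text, have far more dimensions, and their dimensions are not individually labelled. It compares words with cosine similarity, a common choice for embedding vectors, though the original post does not say which measure it uses. EMBEDDINGS and most_similar are hypothetical names.

```python
import math

# Hypothetical, hand-picked scores for illustration only.
# Dimensions: [colour, vehicle, clothing]
EMBEDDINGS = {
    "red":     [0.95, 0.05, 0.00],
    "maroon":  [0.90, 0.10, 0.00],
    "grey":    [0.85, 0.10, 0.00],
    "car":     [0.10, 0.95, 0.00],
    "bicycle": [0.05, 0.90, 0.00],
    "shirt":   [0.05, 0.05, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word):
    """Return the other word whose vector points most nearly the same way."""
    target = EMBEDDINGS[word]
    others = (w for w in EMBEDDINGS if w != word)
    return max(others, key=lambda w: cosine_similarity(target, EMBEDDINGS[w]))


if __name__ == "__main__":
    print(round(cosine_similarity(EMBEDDINGS["red"], EMBEDDINGS["maroon"]), 3))
    print(round(cosine_similarity(EMBEDDINGS["red"], EMBEDDINGS["car"]), 3))
    print(most_similar("car"))
```

Colours land close to each other, vehicles land close to each other, and a colour and a vehicle land far apart, which is the grouping the example sentence describes.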
Artificial Neural Networks (ANNs) have totally changed what computers are capable of learning. Though neural networks date back to the 1940s, we have seen an astonishing increase in their applications over the last 5–10 years.
Artificial neural networks are modeled on the functioning of the human brain, where the input is converted into output through a series of transformations. Though they are capable of achieving complex tasks, the way they work is fairly straightforward.
Three main concepts explain how neural networks work:
This is a simple computation unit which takes…
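The excerpt breaks off while describing the basic computation unit. Assuming it is the usual artificial neuron (a weighted sum of inputs plus a bias, passed through an activation function), the following minimal Python sketch shows one such unit and how stacking units into layers gives the series of transformations mentioned above. The sigmoid activation, the weights, and the function names are illustrative assumptions, not taken from the original post.

```python
import math


def sigmoid(z: float) -> float:
    """Assumed activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One unit: weighted sum of the inputs plus a bias, then the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


def layer(inputs, weight_rows, biases):
    """A layer is several neurons reading the same inputs, one weight row each."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


if __name__ == "__main__":
    # Hypothetical weights: 2 inputs -> hidden layer of 2 units -> 1 output unit.
    # Each layer is one transformation applied to the previous layer's output.
    x = [1.0, 0.5]
    hidden = layer(x, [[0.4, -0.6], [0.3, 0.8]], [0.0, -0.1])
    output = neuron(hidden, [1.2, -0.7], 0.05)
    print(hidden, output)
```

In a trained network the weights and biases are learned from data rather than chosen by hand; the structure of the computation stays the same.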
April 26, 2015
I started a side project in Scala with a group of friends (all noobs in Scala). We chose Scala because it is well known for type safety and functional programming, with support for OOP. One of the important parts of the project was talking to a REST API that returned JSON responses.
We began our hunt for efficient JSON parsers in Scala and were soon flooded with libraries:
With so many options, we were confused! Thanks to this wonderful post from the Ooyala Engineering team, which put together a nice comparison of libraries…
January 22, 2015
Every time I look at the examples page of D3, I’m simply go…
@mbostock has transformed how visualizations are created for the web.
Today I learnt how to use SVG markers with D3. I was using the force layout to analyze graphs, just like this example. But I wanted a directed graph!
September 7, 2014
I had always wanted to set up a media server at home for the following reasons:
The easiest solution was to turn my Raspberry Pi into a DLNA server. For this, I needed a few basic packages, each of which had to be configured.
Data Scientist | Technology Enthusiast