Just before I flew to London, UK for the 5-day Spark/Scala Workshop, I sent out emails to the leaders of the Apache Spark meetups in London. There are actually two meetups — Spark London and London Spark Coding Dojo.
I got an email back from Ulrich — the leader of London Spark Coding Dojo — the next morning. I was surprised how welcoming the email was:
“Sounds good. I was going to set up a meetup for next week, so that would work just fine.”
“What a coincidence!” one could say, but I personally wouldn’t call it a coincidence — I wanted something (to speak about Spark) that the other person wanted too (to have a speaker about Spark). It was just yet another example that you need to be vocal about your needs to have them heard.
After a not-so-short Skype session — Ulrich and I are very talkative :) — we decided to have the meetup on Thursday the following week.
As it turned out, the venue of the meetup — The Helicon — was around a 30-minute walk from my hotel.
It was the 4th day of my Spark/Scala workshop, in which I taught the attendees how to work with Spark MLlib 2.0’s Pipeline API — transformers, estimators, and pipelines. It could be another coincidence that the two different groups — from the workshop and the meetup — wanted the same topic covered on the same day, but actually it was my decision to continue the journey into Spark using DataFrames for Machine Learning in Spark MLlib. I almost repeated what I taught during the workshop, and it worked very well for both groups. I enjoyed doing it twice as it deepened my understanding of Spark MLlib’s API (remember the saying “teach others to understand better”).
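To give a flavour of what the Pipeline API looks like, here is a minimal sketch in Scala. It assumes a SparkSession is available as `spark` (as in spark-shell) and a hypothetical training DataFrame called `training` with `text` and `label` columns — the column names and dataset are illustrative, not from the workshop material.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.regression.LinearRegression

// Transformer: splits the "text" column into an array of words
val tokenizer = new Tokenizer()
  .setInputCol("text")
  .setOutputCol("words")

// Transformer: hashes the words into term-frequency feature vectors
val hashingTF = new HashingTF()
  .setInputCol("words")
  .setOutputCol("features")

// Estimator: fits a linear regression model on the feature vectors
val lr = new LinearRegression()
  .setLabelCol("label")
  .setFeaturesCol("features")

// A Pipeline chains transformers and estimators into a single estimator
val pipeline = new Pipeline()
  .setStages(Array(tokenizer, hashingTF, lr))

// `training` is an assumed DataFrame with "text" and "label" columns
val model = pipeline.fit(training)
model.transform(training).select("text", "prediction").show()
```

Calling `fit` on the pipeline runs each transformer in turn and fits the final estimator, returning a PipelineModel that can transform new data end to end.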
We started at 6:30pm in The Helicon building, on the Oracle floor. It was a very nice venue, with 9 people attending (I think it was Michel who left after about 45 minutes, so he’s not in the picture).
We talked about how easy it is to deploy Spark applications to a cluster using Spark Standalone and Hadoop YARN. We used spark-shell to work with the clusters. We learnt about the differences between the available master URLs — local vs clustered. We had discussions about Spark MLlib’s API after I demoed the Tokenizer and HashingTF transformers with the LinearRegression ML algorithm. See the Mastering Apache Spark notes for more coverage (and ping me offline if you feel you need a more personal explanation).
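For reference, the master URL is what distinguishes local from clustered execution when launching spark-shell. A sketch of the three variants we discussed (the standalone master host and port are placeholders):

```shell
# Local mode: run Spark in-process, one worker thread per CPU core
spark-shell --master local[*]

# Spark Standalone: connect to a standalone master (host/port are assumed)
spark-shell --master spark://master-host:7077

# Hadoop YARN: submit to a YARN cluster (expects HADOOP_CONF_DIR to be set)
spark-shell --master yarn
```

The same `--master` flag works for spark-submit, so an application developed locally can be deployed to a cluster without code changes.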
What amazed me during the meetup was that people wanted to stay longer! We finished at around 9:30pm, yet it seemed people would have stayed on. I couldn’t, since I had the 5th and final day of the Spark/Scala workshop ahead of me and felt a bit tired (from too much excitement, obviously!)
After the meetup I spoke to Ulrich, who offered to help set up a one- or two-day Spark/Scala workshop in the Oracle office in London — so stay tuned for details on when exactly it happens. Ping Ulrich or me to stay updated if you don’t hear from us soon.
Thanks a lot, Ulrich, for having me, and thanks to the participants for letting me demo Apache Spark. I always love demoing Apache Spark (as that’s my personal way to get better at Spark!) Ping me if you’d like a similar meetup about Spark in your city. I wish I could be in every big city in Europe this year!
p.s. There’s another meetup with me about Apache Spark scheduled this Friday, July 8, 2016 at 5:30 pm Warsaw time — What is Apache Spark exactly? AMA o Apache Spark z Jackiem Laskowskim (an AMA about Apache Spark with Jacek Laskowski). This time, however, it’s a remote meetup (which is going to be a good opportunity to learn how to use such “venues” for demos). Feel invited!