I’ve been keeping this small set of blogs on a WordPress site on GoDaddy, which, although not very expensive, still added up to a large amount over the years.

I’m not very happy with the lack of a dedicated domain (no longer possible on Medium), so I’ll be considering GitHub or another, better option once I figure it out.

Until then, I’ll keep the small archive on Medium.


HBaseCon 2012

We’ve been building on, fixing and deploying HBase for the last 4 years.
We’ve written about why we’re using HBase but not much about what for.

Tomorrow, at HBaseCon, I’ll be talking about our low latency OLAP platform built on top of HBase.
I’ll cover both functional and technical aspects of the system and go through some of the strategies that we use to provide high-throughput, real-time OLAP queries.

If you’re attending the conference, we hope to see you there.

Update: here are the slides; the video should come soon.

Cosmin


Next Tuesday (31st of May 2011) we’ll host an HBase/Hadoop meetup at the Adobe office in Bucharest. We’ll have Lars George (HBase committer, author of “HBase: The Definitive Guide”, and Cloudera Solution Architect for Europe) as a special guest.

Our hope is to meet more local HBase/Hadoop users and share knowledge, so if you’re using HBase or Hadoop, or plan to, you’re welcome.

Leave a comment if you want to sign up for a lightning talk of up to 10 minutes.

Agenda:
HBase intro — Lars George
Big Data with HBase and Hadoop at Adobe
Talk 3
Lightning talks (10m each)
HBase status and roadmap — Lars George
Q&A/Open discussion

After: beers at Rock’n Pasta or downtown

Register here


Introduction

Deploying and configuring Hadoop and HBase across clusters is a complex task. In this article I will show what we do to make it easier, and share the deployment recipes that we use.

For the tl;dr crowd: go get the code here.

Before going into how we do things, here is the list of tools that we are using and that I will mention in this article. I will try to put a link next to any tool-specific term, but you can always refer to the tool’s home page for further details.

  • Hudson — this is a great CI server, and we are using it to build Hadoop, HBase, Zookeeper and…
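The actual recipes are in the repository linked above, but as a rough illustration of what the end result looks like from a client’s point of view, here is a minimal post-deployment smoke test sketch using the HBase client API of that era; the Zookeeper hostnames, the “smoketest” table and the “d” column family are placeholders, not names from our setup.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ClusterSmokeTest {
    public static void main(String[] args) throws Exception {
        // Point the client at the deployed Zookeeper quorum
        // ("zk1,zk2,zk3" is a placeholder; use your own hosts).
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3");

        // Write one cell and read it back from a pre-created table
        // (the "smoketest" table and "d" family are assumptions).
        HTable table = new HTable(conf, "smoketest");
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("q"), Bytes.toBytes("ok"));
        table.put(put);

        Result result = table.get(new Get(Bytes.toBytes("row1")));
        System.out.println(Bytes.toString(
            result.getValue(Bytes.toBytes("d"), Bytes.toBytes("q"))));
        table.close();
    }
}
```

A test like this only proves that the cluster is reachable and serving; the deployment recipes themselves handle building, pushing and configuring the daemons.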

Performance is one of the most interesting characteristics of a system’s behavior. It’s challenging to talk about because performance measurements need to be accurate and in-depth.

Our purpose is to share our reasons for doing performance testing, our methodology as well as our initial results, and their interpretation. Hopefully, this will come in handy for other people.

The key take-aways here are:

  • Performance testing helps us determine the cost of our system; it helps size the hardware appropriately, so we don’t introduce hardware bottlenecks or spend too much money on expensive equipment.
  • A black-box approach (only the actual test results: average response time) is not enough. You need to validate the results by doing an in-depth analysis. …
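To make the second point concrete, here is a small, self-contained sketch (the latency numbers are made up for illustration) of how an average response time alone can hide a problem that percentiles expose:

```java
import java.util.Arrays;

public class LatencyStats {
    public static void main(String[] args) {
        // Hypothetical per-request response times (ms) from a test run.
        long[] latenciesMs = {4, 5, 5, 6, 6, 7, 7, 8, 9, 250};

        // Average over all requests.
        long sum = 0;
        for (long l : latenciesMs) sum += l;
        double avg = (double) sum / latenciesMs.length;

        // Percentiles from the sorted values.
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        long p50 = sorted[(int) Math.ceil(0.50 * sorted.length) - 1];
        long p99 = sorted[(int) Math.ceil(0.99 * sorted.length) - 1];

        // avg ~ 30.7 ms looks acceptable, but p99 = 250 ms flags the outlier.
        System.out.printf("avg=%.1f ms  p50=%d ms  p99=%d ms%n", avg, p50, p99);
    }
}
```

Ten requests averaging about 31 ms sound healthy, yet one of them took 250 ms; only the percentile view, and then a look at what the system was doing at that moment, reveals the real behavior.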

The first part of this article is about our success with the technologies we have chosen. Here are some more arguments (by no means exhaustive :P) about why we think HBase is the best fit for our team. We are trying to explain our train of thought, so other people can at least ask the questions that we did, even if they don’t reach the same conclusion.

We usually develop against trunk code (for both Hadoop and HBase) using a mirror of the Apache Git repositories. We don’t confine ourselves to released versions only, because we implement fixes, and there are always new features we need or want to evaluate. We test a large variety of conditions and find a variety of problems — from HBase or HDFS corruption to data loss, etc. Usually we report them, fix them and move on. Our latest headache from working with unreleased versions was HDFS-909, which corrupted the NameNode “edits” file by losing a byte. We were comfortable enough with the system to manually fix the “edits” binary file in a hex editor so we could bring the cluster back online quickly, and then track down the actual cause by analyzing the code. …


Our team builds infrastructure services for many clients across Adobe. We have services ranging from commenting and tagging to structured data storage and processing. We need to make sure that data is safe and always available; the services have to work fast regardless of the data volume.

This article is about how we got started using HBase and where we are now. More in-depth reasoning can be found in the second part of the article.

If someone had asked me a couple of days ago why or how we chose HBase, I would have answered in a blink that it was about reliability, performance, costs, etc. (a bit brainwashed after answering “correctly” and “objectively” too many times). …
