Top Stories published by Iván’s blog in February of 2008

Distributed databases: BigTable, HBase and Hypertable

Since the publication of the Google paper about BigTable, people have started to make up their minds about distributed databases. BigTable is a distributed database where you can store large amounts of data. On the other hand, a lot of…


The Undercover Economist

A superb book by Tim Harford, “El economista camuflado” (“The Undercover Economist” in English).

I must admit that I found the first chapter a bit obvious, and I even stopped reading the book. But I picked it up again for the chapter “Why Poor Countries Are Poor” and I was…


Big Data Sets Querying and Analysis

Using SQL and databases to analyze and extract data from datasets is a common practice. Clauses like GROUP BY and ORDER BY and aggregate functions like COUNT, AVG, etc. are useful and flexible enough. Tasks such as generating statistics from log files or extract…


Coordination of services in a distributed system

ZooKeeper is a service to coordinate processes in a distributed system. As they say:

“Coordinating processes of a distributed system is challenging as there are often problems that arise when implementing synchronization…

Reading Hadoop SequenceFile from Pig

A trick to read a SequenceFile generated by Hadoop into Pig:


public class SequenceFileStorage implements LoadFunc {

 protected SequenceFileRecordReader reader;


 public SequenceFileStorage() {}


Open Source and Startups: the next step forward

It is clear that the open source platform LAMP was a revolution for startups. Open source projects like MySQL, Apache or PHP gave many startups the “vitamins” they needed to reach their objectives. Some of them could not have succeeded if they had had…


Spanish version of Properazzi

Today we launched the new version for Spain of the real estate portal Properazzi.

It includes a great many changes, among them a very clear and usable web interface as well as blazing search speed. I think it is a great improvement. I invite you to try it.

Properazzi…


Grid computing with Java

I have found the project named GridGain very interesting. GridGain is an open source project for grid computing using Java.

It seems very easy to develop distributed applications and run them on a cluster with this library/platform.

I will take a deeper look at this project in the…