Top Stories published by Iván’s blog in 2008
January
June
August
October
December

Paper: Detecting Near-Duplicates for Web Crawling

Three guys from Google have published the paper “Detecting Near-Duplicates for Web Crawling” at the 2007 WWW Conference, with a technique for detecting near-duplicates over a set of web pages.

They have developed a method aimed at performing…
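As a rough illustration of the kind of technique that paper describes (simhash fingerprints compared by Hamming distance), here is a minimal sketch. The class name `SimHashSketch`, the whitespace tokenizer, the unit token weights and the FNV-1a token hash are all assumptions made for this example, not details taken from the post.

```java
import java.nio.charset.StandardCharsets;

final class SimHashSketch {
    // 64-bit FNV-1a hash of a token (an illustrative choice of hash function).
    static long fnv1a64(String token) {
        long hash = 0xcbf29ce484222325L;
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;
        }
        return hash;
    }

    // Builds a 64-bit fingerprint: every token votes +1 or -1 on each bit position,
    // and the fingerprint keeps the bits whose vote total is positive.
    // Assumption: every token has weight 1 and tokens are split on whitespace.
    static long fingerprint(String text) {
        int[] votes = new int[64];
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            long h = fnv1a64(token);
            for (int bit = 0; bit < 64; bit++) {
                votes[bit] += ((h >>> bit) & 1L) == 1L ? 1 : -1;
            }
        }
        long fp = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (votes[bit] > 0) {
                fp |= 1L << bit;
            }
        }
        return fp;
    }

    // Number of bit positions in which two fingerprints differ.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        long a = fingerprint("the quick brown fox jumps over the lazy dog");
        long b = fingerprint("the quick brown fox jumped over the lazy dog");
        System.out.println("Hamming distance: " + hammingDistance(a, b));
    }
}
```

Two pages would then be treated as near-duplicates when the Hamming distance between their fingerprints is below a small threshold; the right threshold depends on the data.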


Distributed databases: BigTable, HBase and Hypertable

Since Google published its paper about BigTable, people have started to form their own opinions about distributed databases. BigTable is a distributed database where you can store large amounts of data. On the other hand, a lot of…


El economista camuflado

A magnificent book by Tim Harford: “El economista camuflado” (“The Undercover Economist” in English).

I have to admit that the first chapter struck me as a bit obvious, and I even stopped reading the book. But I picked it up again for the chapter “Why Poor Countries Are Poor” and I went…


ZooKeeper Video and Slides

ZooKeeper is a coordination service that can provide a lot of help when developing distributed systems (see my previous post about it). An introduction to ZooKeeper was recently published. You can see the video and some slides in PDF…


Big Data Sets Querying and Analysis

Using SQL and databases to analyze and extract data from datasets is common practice. Clauses like GROUP BY and ORDER BY, together with aggregation functions like COUNT, AVG, etc., are useful and flexible enough. Tasks such as generating statistics from log files or extract…
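To make the SQL operations mentioned above concrete, here is a small sketch of the same grouping and aggregation expressed over in-memory log records. The `Hit` record, its fields and the sample data are hypothetical, invented only for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class LogStats {
    // Hypothetical log record: requested path and response time in milliseconds.
    record Hit(String path, long millis) {}

    // Roughly: SELECT path, COUNT(*) FROM hits GROUP BY path ORDER BY path
    static Map<String, Long> countByPath(List<Hit> hits) {
        return hits.stream()
                .collect(Collectors.groupingBy(Hit::path, TreeMap::new, Collectors.counting()));
    }

    // Roughly: SELECT path, AVG(millis) FROM hits GROUP BY path ORDER BY path
    static Map<String, Double> avgMillisByPath(List<Hit> hits) {
        return hits.stream()
                .collect(Collectors.groupingBy(
                        Hit::path, TreeMap::new, Collectors.averagingLong(Hit::millis)));
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("/home", 120),
                new Hit("/search", 300),
                new Hit("/home", 80));
        System.out.println(countByPath(hits));
        System.out.println(avgMillisByPath(hits));
    }
}
```

The `TreeMap` keeps the keys sorted, which plays the role of ORDER BY on the grouping column. This only works while the data fits in memory on one machine.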


Properazzi.com is now Enormo.com

Properazzi is now Enormo.com!

As part of our constant evolution, we have changed the portal's name. It is now called Enormo.com, a name that fits the site's aspirations better.

Enormo is currently the property search engine with the most listings around the world…


Coordination of services in a distributed system

ZooKeeper is a service to coordinate processes in a distributed system. As they say:

“Coordinating processes of a distributed system is challenging as there are often problems that arise when implementing synchronization…

Reading Hadoop SequenceFile from Pig

A trick to read a SequenceFile generated by Hadoop into Pig:


public class SequenceFileStorage implements LoadFunc {

 protected SequenceFileRecordReader reader;


 public SequenceFileStorage() {}

These were the top 10 stories published by Iván’s blog in 2008. You can also dive into monthly archives for 2008 by using the calendar at the top of this page.