HDFS Client Configs for talking to HA Hadoop NameNodes

[Reposted from my Blogger blog]

One more simple thing that has relatively scarce documentation on the Internet.

As you might know, Hadoop NameNodes finally became highly available (HA) in Hadoop 2.0. The HDFS client configuration, which was already a little tedious, became more complicated.

Traditionally, there were two ways to configure an HDFS client (let's stick to Java):

  1. Copy over the entire Hadoop config directory with all the XML files and place it somewhere on your app's classpath, or construct a Hadoop Configuration object by manually adding those files.
  2. Simply provide the HDFS NameNode URI and let the client do the rest.
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration conf = new Configuration(false);
// conf.set("fs.default.name", "hdfs://localhost:8020"); // older key, deprecated now
conf.set("fs.defaultFS", "hdfs://localhost:8020");
FileSystem fs = FileSystem.get(conf);
```
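For option 1, a minimal sketch of loading a copied config directory by hand might look like this. The directory `/etc/hadoop/conf` and the names `XmlConfigClient` and `loadClientConf` are assumptions for illustration; `Configuration.addResource(Path)` is the standard way to add an XML file by its path.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class XmlConfigClient {

    // Builds a Configuration from a copied Hadoop config directory.
    // confDir is an assumed location, e.g. "/etc/hadoop/conf".
    static Configuration loadClientConf(String confDir) {
        // Loads the built-in defaults (core-default.xml) plus any core-site.xml on the classpath.
        Configuration conf = new Configuration();
        // Explicitly add the cluster's files; later resources override earlier ones.
        conf.addResource(new Path(confDir, "core-site.xml"));
        conf.addResource(new Path(confDir, "hdfs-site.xml"));
        return conf;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = loadClientConf(args.length > 0 ? args[0] : "/etc/hadoop/conf");
        try (FileSystem fs = FileSystem.get(conf)) {
            System.out.println("Default filesystem: " + fs.getUri());
        }
    }
}
```

This keeps every setting the cluster admins put in the XML files, which is the main reason to prefer option 1 over option 2.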

Most people prefer option 2, unless they need many more settings from the actual XML config files, at which point it makes sense to copy the entire directory over. Now, with the NameNodes being HA, which NameNode's URI do you use? The obvious answer is the active NameNode's RPC address. But then your client fails as soon as the active NameNode becomes standby or dies.

So, here's how you deal with this (a simple program that copies files between the local filesystem and HDFS).

Basically, you point fs.defaultFS at your nameservice and tell the client how it's configured (the backing NameNodes) and how to fail over between them.
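A minimal sketch of such a client, assuming a nameservice called `mycluster` backed by two NameNodes with IDs `nn1` and `nn2` on the hypothetical hosts `namenode1.example.com` and `namenode2.example.com`. The property names are the standard HDFS HA client keys; the class and method names (`HaHdfsCopy`, `haClientConf`) and the `put`/`get` argument convention are made up for this example and are not the program from the original post.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaHdfsCopy {

    // Builds a client Configuration for an HA nameservice.
    // namenodes maps a NameNode ID (e.g. "nn1") to its RPC address (host:port).
    static Configuration haClientConf(String nameservice, Map<String, String> namenodes) {
        Configuration conf = new Configuration(false);
        // Point the default filesystem at the logical nameservice, not at one NameNode host.
        conf.set("fs.defaultFS", "hdfs://" + nameservice);
        conf.set("dfs.nameservices", nameservice);
        // Tell the client which NameNodes back the nameservice...
        conf.set("dfs.ha.namenodes." + nameservice, String.join(",", namenodes.keySet()));
        for (Map.Entry<String, String> nn : namenodes.entrySet()) {
            conf.set("dfs.namenode.rpc-address." + nameservice + "." + nn.getKey(), nn.getValue());
        }
        // ...and how to fail over between them.
        conf.set("dfs.client.failover.proxy.provider." + nameservice,
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return conf;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3 || !(args[0].equals("put") || args[0].equals("get"))) {
            System.err.println("usage: HaHdfsCopy put|get <src> <dst>");
            System.exit(1);
        }
        Map<String, String> namenodes = new LinkedHashMap<>();
        namenodes.put("nn1", "namenode1.example.com:8020"); // hypothetical hosts
        namenodes.put("nn2", "namenode2.example.com:8020");
        Configuration conf = haClientConf("mycluster", namenodes);

        try (FileSystem fs = FileSystem.get(conf)) {
            Path src = new Path(args[1]);
            Path dst = new Path(args[2]);
            if (args[0].equals("put")) {
                fs.copyFromLocalFile(src, dst); // local filesystem -> HDFS
            } else {
                fs.copyToLocalFile(src, dst);   // HDFS -> local filesystem
            }
        }
    }
}
```

Run it with the Hadoop client jars on the classpath (for example via `hadoop classpath`), e.g. `put /tmp/local.txt /user/me/local.txt`. Because fs.defaultFS names the nameservice rather than a host, the client goes through ConfiguredFailoverProxyProvider, which tries the configured NameNodes and fails over to the other one when the one it is talking to is not active.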