Working with the Hadoop FileSystem API

Knoldus Inc.
Knoldus - Technical Insights
3 min readApr 16, 2017

Reading data from and writing data to the Hadoop Distributed File System (HDFS) can be done in a number of ways. In this post we will use the FileSystem API to create and write a file in HDFS, and then read a file from HDFS and write it back to the local file system.

To start with:

1) First, include the required dependencies (shown here for an sbt project):

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-common" % "2.8.0",
  "org.apache.hadoop" % "hadoop-hdfs" % "2.8.0"
)

2) The next step is to configure the filesystem:

/**
 * This method configures the file system.
 * @param coreSitePath path to core-site.xml in the Hadoop installation
 * @param hdfsSitePath path to hdfs-site.xml in the Hadoop installation
 * @return a configured Hadoop FileSystem instance, or null if configuration fails
 */
public FileSystem configureFilesystem(String coreSitePath, String hdfsSitePath) {
    FileSystem fileSystem = null;
    try {
        Configuration conf = new Configuration();
        Path hdfsCoreSitePath = new Path(coreSitePath);
        Path hdfsHDFSSitePath = new Path(hdfsSitePath);
        conf.addResource(hdfsCoreSitePath);
        conf.addResource(hdfsHDFSSitePath);
        fileSystem = FileSystem.get(conf);
        return fileSystem;
    } catch (Exception ex) {
        System.out.println("Error occurred while configuring FileSystem");
        ex.printStackTrace();
        return fileSystem;
    }
}
3) After configuring the filesystem, we are ready to read from or write to HDFS.

Let us start by writing something to HDFS from the local filesystem. For this we use the
“void copyFromLocalFile( Path src, Path dst )”
method of the FileSystem API.
/**
 * Copies a file from the local filesystem to HDFS.
 * @param fileSystem the Hadoop FileSystem instance
 * @param sourcePath the sample input file on the local filesystem to be written to HDFS
 * @param destinationPath the path on HDFS where the input file will be written
 * @return a status string indicating success or failure
 */
public String writeToHDFS(FileSystem fileSystem, String sourcePath, String destinationPath) {
    try {
        Path inputPath = new Path(sourcePath);
        Path outputPath = new Path(destinationPath);
        fileSystem.copyFromLocalFile(inputPath, outputPath);
        return Constants.SUCCESS;
    } catch (IOException ex) {
        System.out.println("Some exception occurred while writing file to HDFS");
        return Constants.FAILURE;
    }
}
Next, we can read from HDFS and store the file on our local filesystem. For this we use the
“void copyToLocalFile( Path src, Path dst )”
method of the FileSystem API.
/**
 * Copies a file from HDFS to the local filesystem.
 * @param fileSystem the Hadoop FileSystem instance
 * @param hdfsStorePath the path on HDFS where the sample input file is present
 * @param localSystemPath the location on the local filesystem to which the file will be copied
 * @return a status string indicating success or failure
 */
public String readFileFromHdfs(FileSystem fileSystem, String hdfsStorePath, String localSystemPath) {
    try {
        Path hdfsPath = new Path(hdfsStorePath);
        Path localPath = new Path(localSystemPath);
        fileSystem.copyToLocalFile(hdfsPath, localPath);
        return Constants.SUCCESS;
    } catch (IOException ex) {
        System.out.println("Some exception occurred while reading file from HDFS");
        return Constants.FAILURE;
    }
}
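A file on HDFS can also be read as a stream, without copying it to the local filesystem first, using FileSystem.open(), which returns an FSDataInputStream. A minimal sketch, again assuming a running cluster, a configured fileSystem instance, and an illustrative path:

```java
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Assumes fileSystem was obtained from configureFilesystem(...) above.
Path file = new Path("/user/demo/greeting.txt"); // illustrative path
try (FSDataInputStream in = fileSystem.open(file)) {
    // Copy the stream's contents to stdout; 4096 is the buffer size,
    // and false leaves System.out open after the copy.
    IOUtils.copyBytes(in, System.out, 4096, false);
}
```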
4) The final step is to close the filesystem once we are done reading from or writing to HDFS:

/**
 * Closes the FileSystem instance.
 * @param fileSystem the Hadoop FileSystem instance to close
 */
public void closeFileSystem(FileSystem fileSystem) {
    try {
        fileSystem.close();
    } catch (Exception ex) {
        System.out.println("Unable to close Hadoop filesystem : " + ex);
    }
}
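Putting the four steps together, a driver might look like the following sketch. The wrapper class name HdfsClient (holding the methods above) and the configuration and file paths are assumptions; adjust them to your installation.

```java
import org.apache.hadoop.fs.FileSystem;

public class Main {
    public static void main(String[] args) {
        // Hypothetical wrapper class containing the methods shown above.
        HdfsClient client = new HdfsClient();

        // Paths to the Hadoop config files; adjust to your installation.
        FileSystem fileSystem = client.configureFilesystem(
                "/usr/local/hadoop/etc/hadoop/core-site.xml",
                "/usr/local/hadoop/etc/hadoop/hdfs-site.xml");

        // Copy a local file into HDFS, then copy it back out.
        client.writeToHDFS(fileSystem, "/tmp/sample.txt", "/user/demo/sample.txt");
        client.readFileFromHdfs(fileSystem, "/user/demo/sample.txt", "/tmp/sample-copy.txt");

        client.closeFileSystem(fileSystem);
    }
}
```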
References:
1) https://hadoop.apache.org/docs/r2.7.1/api/index.html?org/apache/hadoop/fs/FileSystem.html
