File Search | Parallel processing | Implementation in Java

GANESH SHAH
3 min readApr 20, 2024

Let’s say you are given a task to search a file inside a folder. How will you do that ?? We can recursively traverse each folder until the last directory is reached or the file to be searched is found. Below is the sample structure of any folder.

Let’s try to write a sequential program to find a file named hosts in the Windows directory.


import java.io.File;

public class SerialFileSearch {

private static boolean fileFound = false;

public static void searchFiles(File file, String fileName) {

File[] files;

files=file.listFiles();

if ((files==null) || (files.length==0 ) || fileFound) {
return;
}

for (File content : files) {
if (content.isDirectory()) {
searchFiles(content,fileName);
} else {
if (content.getName().equals(fileName)) {
System.out.printf("Serial Search: Path: %s%n", content.getPath());
fileFound = true;
return;
}
}
}
}
}

What this program does ??

==> It iterates through each folder recursively until the file is found or the end of the folder is reached. The search takes too long to complete. The reason for such huge processing time is that only one thread in our case main thread is used to search for the file.

Can we think of any better way ?? Using parallel processing ?? The answer is yes.

  1. We can first iterate through the root directory and push all the internal directories to a blocking queue.
  2. Spawn multiple threads and pass a runnable which picks a directory from the blocking queue if available and then recursively searches for the desired file.
  3. This will distribute the workload from a single thread to multiple threads and will provide us with faster results.

Let's create a runnable to pass to multiple threads.



import java.io.File;
import java.util.ArrayList;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class FileSearchRunner implements Runnable {

private String fileToBeSearched;
private BlockingQueue<File> directoriesQueue = new LinkedBlockingQueue<>();

private volatile AtomicBoolean fileFound;
ArrayList<Thread> threads;


FileSearchRunner(BlockingQueue<File> directoriesQueue, String fileToBeSearched, AtomicBoolean fileFound,ArrayList<Thread> threads) {
this.fileToBeSearched = fileToBeSearched;
this.directoriesQueue = directoriesQueue;
this.fileFound = fileFound;
this.threads = threads;
}


@Override
public void run() {
while (!directoriesQueue.isEmpty() && !fileFound.get()) {
try{
File file = directoriesQueue.take();
performRecursiveSearch(file, fileToBeSearched);
}catch(InterruptedException e){
System.out.println("Interrupt signal received");
return;
}
}
}


private void performRecursiveSearch(File file, String fileToBeSearched) {

File[] files = file.listFiles();

if (files == null || files.length == 0)
return;

for (File eachFile : files) {

if(fileFound.get())
return;

if (eachFile.isDirectory()){
performRecursiveSearch(eachFile, fileToBeSearched);
}
else {
if (eachFile.getName().equals(fileToBeSearched)) {
System.out.println("File found");
System.out.println("File path is :" + eachFile.getAbsolutePath());
fileFound.set(true);
return;
}
}
}
}
}

Create a main program with a thread pool of 12 and pass the blocking queue of directories to each thread.



import java.io.File;
import java.util.ArrayList;
import java.util.Date;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;

public class FileSearchMain {


public static void main(String[] args) throws InterruptedException {


File files = new File("C:\\Windows\\");
String fileToBeSearched = "hosts";

long start1 = new Date().getTime();
SerialFileSearch.searchFiles(files,fileToBeSearched);
long end1 = new Date().getTime();
System.out.println("Total time taken by sequential program : " + (end1 - start1));

long start2 = new Date().getTime();
BlockingQueue<File> directories = new LinkedBlockingQueue<>();
AtomicBoolean fileFound = new AtomicBoolean(false);
ArrayList<Thread> threads = new ArrayList<>();

File[] allFilesInRootDirectory = files.listFiles();

for(File file : allFilesInRootDirectory){
if(file.isDirectory())
directories.add(file);
else if (file.getName().equals(fileToBeSearched))
System.out.println("file found in path : " + file.getName());
}

ExecutorService executor = Executors.newFixedThreadPool(12);


for(int i=0;i<12;i++){
executor.submit(new FileSearchRunner(directories,fileToBeSearched, fileFound, threads));
}

while(!fileFound.get() && !directories.isEmpty()){
Thread.sleep(100);
}

long end2 = new Date().getTime();
System.out.println("Total time taken by parallel program : " + (end2 - start2));

executor.shutdown();

}
}

Finally, execute the program and note the performance difference :

Note that using 12 threads helped us achieve 300X performance improvement [ 32844 / 109 = 301 (approx) ]

--

--

GANESH SHAH

Passionate Java developer with demonstrated expertise in creating robust Java APIs solving business use cases.