Streamlining Finance In Mobile Apps: Effortless IBAN Scanning with OCR in React Native

Using Native Solutions

Remzi Ogul Tum
ParamTech
5 min read · Mar 27, 2024


What is OCR?

Essentially, Optical Character Recognition (OCR) is a technology that recognizes text within an image and extracts it, enabling users to manipulate and interact with the text as if it were a standard digital document.

In our latest React Native app ‘Param’, we utilized OCR to streamline the process of scanning International Bank Account Numbers (IBANs). By implementing OCR, we eliminated the need for manual data entry, reducing the risk of errors and saving time. When a user scans an IBAN, the OCR technology recognizes the text and automatically pastes it into a designated TextInput field, ready for further processing or validation.
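Before trusting the scanned value, it helps to validate the candidate IBAN rather than pasting raw OCR output. As an illustration (the `extractIban` helper below is a hypothetical sketch, not the app's actual code), here is how a Turkish IBAN can be pulled out of noisy OCR text and checked against the standard ISO 13616 mod-97 checksum:

```typescript
// Hypothetical helper: extract a Turkish IBAN from raw OCR text and verify
// its ISO 13616 mod-97 checksum before trusting it.
// TR IBANs are 26 characters: "TR" + 2 check digits + 22 more digits.
function extractIban(ocrText: string): string | null {
  const match = ocrText.replace(/\s+/g, '').toUpperCase().match(/TR\d{24}/);
  if (!match) return null;
  const iban = match[0];
  // Move the first 4 chars to the end, map letters to numbers (A=10 ... Z=35),
  // then the resulting number mod 97 must equal 1.
  const rearranged = iban.slice(4) + iban.slice(0, 4);
  let remainder = 0;
  for (const ch of rearranged) {
    const piece = /[A-Z]/.test(ch) ? String(ch.charCodeAt(0) - 55) : ch;
    for (const digit of piece) {
      remainder = (remainder * 10 + Number(digit)) % 97;
    }
  }
  return remainder === 1 ? iban : null;
}
```

A digit-by-digit remainder avoids BigInt while staying exact, since each intermediate value stays well below `Number.MAX_SAFE_INTEGER`.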

Technologies Used for OCR

In our project, we explored two primary technologies for OCR: Google MLKit Text Recognition and Swift’s Vision Text Recognition. Each of these technologies offers unique advantages:

Google MLKit Text Recognition: This is a versatile SDK that provides advanced machine learning capabilities for mobile developers. It’s designed to recognize text in various languages and formats with high accuracy, making it a popular choice for applications requiring reliable text recognition.

Swift’s Vision Text Recognition: Part of Apple’s Vision framework, this technology is optimized for iOS devices. It leverages machine learning to detect and recognize text in images, offering seamless integration with other iOS features and a smooth user experience.

Comparison:

  • Accuracy: Google MLKit demonstrates a higher accuracy in text recognition across a diverse set of languages and fonts. Swift’s Vision Text Recognition, while also accurate, is optimized for performance on iOS devices, which can influence its precision in certain scenarios.
  • Performance: Swift’s Vision Text Recognition is highly optimized for iOS, offering fast processing times and efficient resource utilization. Google MLKit, being platform-agnostic, provides consistent performance across Android and iOS, and it has been extremely fast in our experiences.
  • Integration: For iOS applications, Swift’s Vision Text Recognition offers seamless integration with native development tools and frameworks, simplifying the implementation process by a great margin. Google MLKit, on the other hand, requires additional setup for integration but offers the flexibility to work across different platforms.
  • Language Support: Google MLKit stands out with its extensive language support, catering to a global audience. Swift’s Vision Text Recognition has a more limited language range, which may be a consideration depending on the application’s target user base.

And while having the best of both worlds sounds great, remember, in the realm of technology, “the best of both worlds” is usually just one software update away from becoming “the quest for the next bug fix.”

Now I want to get a bit more technical and explain how we integrated this functionality into the app, including an extra method for reading an IBAN from an image.

Setting Up Bridge Files for iOS

For the iOS implementation of our OCR functionality, we created a bridge between our native Swift code and React Native. Here’s a detailed look at how we set up the bridge files:

  • TextRecognition.swift: This Swift file is the heart of our OCR implementation. We defined a class TextRecognitionModule that extends RCTEventEmitter. Here's a snippet of the code:
@objc(TextRecognitionModule)
class TextRecognitionModule: RCTEventEmitter {
  @objc
  func startCamera(_ resolve: @escaping RCTPromiseResolveBlock, rejecter reject: @escaping RCTPromiseRejectBlock) {
    // Implementation for starting the camera and OCR process
  }

  @objc
  func readFromImage(_ imagePath: String, resolver resolve: @escaping RCTPromiseResolveBlock, rejecter reject: @escaping RCTPromiseRejectBlock) {
    // Implementation for reading text from an image
  }

  // Additional methods and properties
}
  • TextRecognition.m: This Objective-C file serves as a bridge between our Swift code and React Native. We used the RCT_EXTERN_MODULE and RCT_EXTERN_METHOD macros to expose our TextRecognitionModule class and its methods to JavaScript. Here's the code:
#import <React/RCTBridgeModule.h>
#import <React/RCTEventEmitter.h>

@interface RCT_EXTERN_MODULE(TextRecognitionModule, NSObject)

RCT_EXTERN_METHOD(startCamera:(RCTPromiseResolveBlock)resolve rejecter:(RCTPromiseRejectBlock)reject)
RCT_EXTERN_METHOD(readFromImage:(NSString *)imagePath resolver:(RCTPromiseResolveBlock)resolve rejecter:(RCTPromiseRejectBlock)reject)

@end

This setup allows our React Native code to call the startCamera and readFromImage methods, invoking the OCR process from JavaScript/TypeScript and handling the results in our React components.

Setting Up Bridge Files for Android

For the Android implementation of our OCR functionality, we created a bridge between our native Java code and React Native. Here’s a detailed look at how we set up the bridge files:

  • TextRecognitionModule: This Java class is the core of our OCR implementation on Android. It extends ReactContextBaseJavaModule and contains methods for starting the camera and reading text from an image using Google MLKit Text Recognition. Here's a snippet of the code:
public class TextRecognitionModule extends ReactContextBaseJavaModule {
  TextRecognitionModule(ReactApplicationContext context) {
    super(context);
  }

  @Override
  public String getName() {
    return "TextRecognitionModule";
  }

  @ReactMethod
  public void startCamera(Promise promise) {
    // Implementation for starting the camera and OCR process
  }

  @ReactMethod
  public void readFromImage(String url, Promise promise) {
    // Implementation for reading text from an image
  }

  // Additional methods and properties
}
  • TextRecognitionPackage: This class implements the ReactPackage interface and is responsible for registering our TextRecognitionModule with React Native. In the createNativeModules method, we add an instance of TextRecognitionModule to the list of native modules. Here's the code:
public class TextRecognitionPackage implements ReactPackage {
  @Override
  public List<NativeModule> createNativeModules(ReactApplicationContext reactContext) {
    return Arrays.<NativeModule>asList(new TextRecognitionModule(reactContext));
  }

  @Override
  public List<ViewManager> createViewManagers(ReactApplicationContext reactContext) {
    return Collections.emptyList();
  }
}

This setup allows our React Native JavaScript code to access the TextRecognitionModule and its methods.

Importing Functions and Using Them in TypeScript Code

To utilize our native OCR functionality in our React Native application, we need to import the functions from our bridge modules and define their interfaces for TypeScript. Here’s how we did it:

import { NativeModules } from 'react-native';

const { TextRecognitionModule } = NativeModules;

interface RecognizeImageInterface {
  readFromImage(url: string): Promise<string>;
  startCamera(): Promise<string>;
}

export default TextRecognitionModule as RecognizeImageInterface;
  1. Destructure TextRecognitionModule from NativeModules.
  2. Define an interface RecognizeImageInterface that specifies the types of the functions we're importing — in our case readFromImage and startCamera, both of which return a Promise<string>.
  3. Export TextRecognitionModule as RecognizeImageInterface. This lets us use the module in our TypeScript code with the correct types, ensuring type safety and a better developer experience.

With this setup, we can now easily call our native OCR functions from within our React Native components, leveraging the power of TypeScript for type checking and autocompletion.

Using OCR in Our Application

With the interface set up, we can now use the OCR functionality in our application.
We define several functions to handle the OCR process:

  • hasCameraRollPermissions: Checks and requests camera permissions.
  • processTextRecognition: Processes the OCR task and updates the state with the recognized IBAN.
  • handleImageSelection: Opens the image picker and processes the selected image.
  • handleCamera: Initiates the camera and processes the captured image.

A brief example of the code:

const processTextRecognition = async (recognitionTask: Promise<string>, source: string) => {
  if (source !== 'camera') {
    setIsLoading(true);
  }
  try {
    const response = await recognitionTask;
    inputRef.current?.handlePaste(response.replace('TR', ''));
  } catch (error) {
    handleError();
  } finally {
    setIsLoading(false);
  }
};

const handleImageSelection = async () => {
  const result = await ImagePicker.launchImageLibrary({
    mediaType: 'photo',
  });
  // error handling
  await processTextRecognition(TextRecognitionModule.readFromImage(result.assets[0].uri), 'galleryImage');
};

const handleCamera = async () => {
  const hasPermission = await hasCameraRollPermissions();
  // error handling
  await processTextRecognition(TextRecognitionModule.startCamera(), 'camera');
};

Ending Thoughts

In conclusion, integrating OCR for IBAN scanning in our React Native application has significantly improved the efficiency and accuracy of our users' financial transactions. By automating IBAN entry, we've reduced the risk of human error and streamlined the experience, making transactions quicker and more seamless. This improvement has been instrumental in boosting user satisfaction and trust in our app, ultimately contributing to the overall success and growth of our company in the competitive financial technology landscape.
