Streamlining Finance In Mobile Apps: Effortless IBAN Scanning with OCR in React Native
Using Native Solutions
What is OCR?
Essentially, Optical Character Recognition (OCR) is a technology that recognizes text within an image and extracts it, enabling users to manipulate and interact with the text as if it were a standard digital document.
In our latest React Native app ‘Param’, we utilized OCR to streamline the process of scanning International Bank Account Numbers (IBANs). By implementing OCR, we eliminated the need for manual data entry, reducing the risk of errors and saving time. When a user scans an IBAN, the OCR technology recognizes the text and automatically pastes it into a designated TextInput field, ready for further processing or validation.
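To give a concrete picture of the JavaScript side of this flow: raw OCR output usually contains spaces, line breaks, and surrounding text, so the recognized block has to be reduced to an IBAN candidate before it can be pasted into the field. The helper below is a hypothetical sketch (not our production code) targeting Turkish IBANs, which are "TR" followed by 24 digits:

```typescript
// Hypothetical helper: reduce raw OCR output to a Turkish IBAN candidate.
// Other countries use different lengths, so the regex would need adjusting.
function extractIban(ocrText: string): string | null {
  // OCR output often contains spaces and line breaks inside the IBAN,
  // plus unrelated surrounding text, so compact everything first.
  const compact = ocrText.toUpperCase().replace(/\s/g, '');
  const match = compact.match(/TR\d{24}/);
  return match ? match[0] : null;
}
```

With this kind of normalization in place, whatever the recognizer returns can be safely checked before it ever reaches the TextInput.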
Technologies Used for OCR
In our project, we explored two primary technologies for OCR: Google MLKit Text Recognition and Swift’s Vision Text Recognition. Each of these technologies offers unique advantages:
Google MLKit Text Recognition: This is a versatile SDK that provides advanced machine learning capabilities for mobile developers. It’s designed to recognize text in various languages and formats with high accuracy, making it a popular choice for applications requiring reliable text recognition.
Swift’s Vision Text Recognition: Part of Apple’s Vision framework, this technology is optimized for iOS devices. It leverages machine learning to detect and recognize text in images, offering seamless integration with other iOS features and a smooth user experience.
Comparison:
- Accuracy: Google MLKit demonstrates higher accuracy in text recognition across a diverse set of languages and fonts. Swift’s Vision Text Recognition, while also accurate, is optimized for performance on iOS devices, which can influence its precision in certain scenarios.
- Performance: Swift’s Vision Text Recognition is highly optimized for iOS, offering fast processing times and efficient resource utilization. Google MLKit, being platform-agnostic, provides consistent performance across Android and iOS, and it has been extremely fast in our experience.
- Integration: For iOS applications, Swift’s Vision Text Recognition offers seamless integration with native development tools and frameworks, simplifying the implementation process by a great margin. Google MLKit, on the other hand, requires additional setup for integration but offers the flexibility to work across different platforms.
- Language Support: Google MLKit stands out with its extensive language support, catering to a global audience. Swift’s Vision Text Recognition has a more limited language range, which may be a consideration depending on the application’s target user base.
And while having the best of both worlds sounds great, remember, in the realm of technology, “the best of both worlds” is usually just one software update away from becoming “the quest for the next bug fix.”
Now I want to get a bit more technical and explain how we integrated this functionality into the app, including an extra method for reading IBANs from images.
Setting Up Bridge Files for iOS
For the iOS implementation of our OCR functionality, we created a bridge between our native Swift code and React Native. Here’s a detailed look at how we set up the bridge files:
- TextRecognition.swift: This Swift file is the heart of our OCR implementation. We defined a class TextRecognitionModule that extends RCTEventEmitter. Here's a snippet of the code:
@objc(TextRecognitionModule)
class TextRecognitionModule: RCTEventEmitter {
  @objc
  func startCamera(_ resolve: @escaping RCTPromiseResolveBlock, rejecter reject: @escaping RCTPromiseRejectBlock) {
    // Implementation for starting the camera and OCR process
  }

  @objc
  func readFromImage(_ imagePath: String, resolver resolve: @escaping RCTPromiseResolveBlock, rejecter reject: @escaping RCTPromiseRejectBlock) {
    // Implementation for reading text from an image
  }

  // Additional methods and properties
}
- TextRecognition.m: This Objective-C file serves as a bridge between our Swift code and React Native. We used the RCT_EXTERN_MODULE and RCT_EXTERN_METHOD macros to expose our TextRecognitionModule class and its methods to JavaScript. Here's the code:
#import <React/RCTBridgeModule.h>

@interface RCT_EXTERN_MODULE(TextRecognitionModule, NSObject)

RCT_EXTERN_METHOD(startCamera:(RCTPromiseResolveBlock)resolve rejecter:(RCTPromiseRejectBlock)reject)

RCT_EXTERN_METHOD(readFromImage:(NSString *)imagePath resolver:(RCTPromiseResolveBlock)resolve rejecter:(RCTPromiseRejectBlock)reject)

@end
This setup allows our React Native code to call the startCamera and readFromImage methods, invoking the OCR process from our JavaScript/TypeScript code and handling the results in our React components.
Setting Up Bridge Files for Android
For the Android implementation of our OCR functionality, we created a bridge between our native Java code and React Native. Here’s a detailed look at how we set up the bridge files:
- TextRecognitionModule: This Java class is the core of our OCR implementation on Android. It extends ReactContextBaseJavaModule and contains methods for starting the camera and reading text from an image using Google MLKit Text Recognition. Here's a snippet of the code:
public class TextRecognitionModule extends ReactContextBaseJavaModule {
  TextRecognitionModule(ReactApplicationContext context) {
    super(context);
  }

  @Override
  public String getName() {
    // The name under which the module is exposed to JavaScript
    return "TextRecognitionModule";
  }

  @ReactMethod
  public void startCamera(Promise promise) {
    // Implementation for starting the camera and OCR process
  }

  @ReactMethod
  public void readFromImage(String url, Promise promise) {
    // Implementation for reading text from an image
  }

  // Additional methods and properties
}
- TextRecognitionPackage: This class implements the ReactPackage interface and is responsible for registering our TextRecognitionModule with React Native. In the createNativeModules method, we add an instance of TextRecognitionModule to the list of native modules. Here's the code:
public class TextRecognitionPackage implements ReactPackage {
  @Override
  public List<NativeModule> createNativeModules(ReactApplicationContext reactContext) {
    return Arrays.<NativeModule>asList(new TextRecognitionModule(reactContext));
  }

  @Override
  public List<ViewManager> createViewManagers(ReactApplicationContext reactContext) {
    // This package exposes no custom views
    return Collections.emptyList();
  }
}
This setup allows our React Native JavaScript code to access the TextRecognitionModule and its methods.
Importing Functions and Using Them in TypeScript Code
To utilize our native OCR functionality in our React Native application, we need to import the functions from our bridge modules and define their interfaces for TypeScript. Here’s how we did it:
import { NativeModules } from 'react-native';

const { TextRecognitionModule } = NativeModules;

interface RecognizeImageInterface {
  readFromImage(url: string): Promise<string>;
  startCamera(): Promise<string>;
}

export default TextRecognitionModule as RecognizeImageInterface;
- Destructure TextRecognitionModule from NativeModules.
- Define an interface RecognizeImageInterface that specifies the types of the functions we're importing. In our case, we have two functions: readFromImage and startCamera, both of which return a Promise<string>.
- Export the module: finally, we export TextRecognitionModule as RecognizeImageInterface. This allows us to use the module in our TypeScript code with the correct types, ensuring type safety and a better developer experience.
With this setup, we can now easily call our native OCR functions from within our React Native components, leveraging the power of TypeScript for type checking and autocompletion.
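To show the calling pattern in isolation, here is a self-contained sketch in which a stub object stands in for the native module. In the app, the real TextRecognitionModule from NativeModules is used; the stub and its placeholder IBAN are purely illustrative:

```typescript
interface RecognizeImageInterface {
  readFromImage(url: string): Promise<string>;
  startCamera(): Promise<string>;
}

// Stub standing in for the native bridge so the pattern can run anywhere;
// the returned text is a made-up placeholder, not a real account number.
const ocrStub: RecognizeImageInterface = {
  readFromImage: async (_url: string) => 'TR12 3456 7890 1234 5678 9012 34',
  startCamera: async () => 'TR12 3456 7890 1234 5678 9012 34',
};

// The same await-based shape our components use against the real module:
// kick off the native task, then normalize the recognized text.
async function scanFromGallery(module: RecognizeImageInterface, uri: string): Promise<string> {
  const text = await module.readFromImage(uri);
  return text.replace(/\s/g, '');
}
```

Because the function is typed against the interface rather than the concrete module, it works identically with the stub in tests and with the native module in the app.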
Using OCR in Our Application
With the interface set up, we can now use the OCR functionality in our application.
We define several functions to handle the OCR process:
- hasCameraRollPermissions: Checks and requests camera permissions.
- processTextRecognition: Processes the OCR task and updates the state with the recognized IBAN.
- handleImageSelection: Opens the image picker and processes the selected image.
- handleCamera: Initiates the camera and processes the captured image.
A brief example of the code:
const processTextRecognition = async (recognitionTask: Promise<string>, source: string) => {
  if (source !== 'camera') {
    setIsLoading(true);
  }
  try {
    const response = await recognitionTask;
    inputRef.current?.handlePaste(response.replace('TR', ''));
  } catch (error) {
    handleError();
  } finally {
    setIsLoading(false);
  }
};

const handleImageSelection = async () => {
  const result = await ImagePicker.launchImageLibrary({
    mediaType: 'photo'
  });
  // error handling
  await processTextRecognition(TextRecognitionModule.readFromImage(result.assets[0].uri), 'galleryImage');
};

const handleCamera = async () => {
  const hasPermission = await hasCameraRollPermissions();
  // error handling
  await processTextRecognition(TextRecognitionModule.startCamera(), 'camera');
};
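One refinement worth considering (not shown in our snippets above) is validating the recognized IBAN before accepting it, since OCR can occasionally misread a character. The standard ISO 13616 mod-97 check catches most single-character misreads; the sketch below is illustrative rather than part of our module's API:

```typescript
// Hypothetical validation step: the ISO 13616 mod-97 check, which every
// well-formed IBAN must pass. A single misread digit almost always fails it.
function isValidIban(iban: string): boolean {
  const compact = iban.replace(/\s/g, '').toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]+$/.test(compact)) return false;
  // Move the country code and check digits to the end, then map letters
  // to numbers (A=10 ... Z=35), as the standard prescribes.
  const rearranged = compact.slice(4) + compact.slice(0, 4);
  const numeric = rearranged.replace(/[A-Z]/g, (c) => String(c.charCodeAt(0) - 55));
  // Compute mod 97 digit by digit to avoid overflowing Number.
  let remainder = 0;
  for (const digit of numeric) {
    remainder = (remainder * 10 + Number(digit)) % 97;
  }
  return remainder === 1;
}
```

Running such a check right after the OCR result comes back lets the app silently retry or prompt the user instead of pasting a corrupted IBAN into the field.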
Ending Thoughts
In conclusion, the integration of OCR technology for IBAN scanning in our React Native application has significantly enhanced the efficiency and accuracy of our users’ financial transactions. By automating IBAN entry, we’ve not only reduced the risk of human error but also streamlined the user experience, making it quicker and easier for users to complete transactions. This improvement has been instrumental in boosting user satisfaction and trust in our app, ultimately contributing to the overall success and growth of our company in the competitive financial technology landscape.