Making a simple Image Captioning app w/ Monaca, Capacitor, ReactJS, and Hugging Face

Alvaro Saldanha
Published in The Web Tub · Mar 15, 2024 · 13 min read

In this blog post, the process of creating a simple image captioning app is explained. The goal is to give a clear explanation of how a pre-trained model can easily be used in a mobile app via an API. There are countless potential use cases for cloud-based models; here, we focus on building an efficient interface with Capacitor, Monaca, ReactJS, and Hugging Face that can be extended and adapted to many other applications and user needs.

Index:

  1. App Setup
  2. In-depth App Structure
  3. Capacitor Camera Plugin
  4. Hugging Face API
  5. SQLite Offline Storage
  6. Caption History Screen
  7. Settings

Technologies Used:

  • ReactJS: Basis for app development, with TypeScript.
  • MaterialUI: for styling and animations.
  • Capacitor: for building the app.
  • @capacitor-community/sqlite: for storage of image/caption history.
  • @capacitor/camera: for taking pictures using device’s camera or selecting images from gallery.
  • HuggingFace serverless API: to send a selected image as payload to a pre-trained image captioning model API, and receive a caption as a reply.
  • Monaca (optional): as an environment setup alternative.

App Structure:

The app consists of two main screens: the menu and the settings. The menu lets the user caption images either by taking a picture with the camera or by selecting an existing image from the device's gallery; a third option lets them browse and manage the device's previous caption history. The settings screen lets the user choose which Hugging Face model to use, as well as toggle dark mode.
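For orientation, the components referenced throughout this post roughly map onto the source layout below. This is inferred from the imports shown later in the post; the exact file names and folders are assumptions, not a prescription.

src/
  main.tsx               // entry point, web SQLite setup (Section 5)
  App.tsx                // top-level container: camera handling, API call, rendering logic
  APIHelpers.ts          // fetchImageCaption (Section 4)
  useSQLiteDB.ts         // SQLite connection hook (Section 5)
  components/
    Menu.tsx             // main menu and bottom navigation
    Settings.tsx         // model selection and dark mode toggle (Section 7)
    CaptionedImage.tsx   // shows the selected image with its caption
    CaptionHistory.tsx   // previous captions screen (Section 6)
    Loading.tsx          // spinner shown while the API is called
    DarkModeContext.tsx  // dark mode context provider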

Main Menu Screen
1. App Setup

To set up the app, yarn and Monaca will be used. Other prerequisites are Node (if you do not have it already), plus Android Studio and Xcode, where the apps will later be tested. First create a Monaca account (https://ja.monaca.io/), then follow these steps:

# Installing Monaca
npm i -g monaca
# Login to Monaca
monaca login
# Install yarn if needed
npm install yarn -g
# Create Monaca App, choose Capacitor template
monaca create SampleApp
# Capacitor setup
yarn add @capacitor/core @capacitor/ios @capacitor/android
yarn build
yarn cap add ios
yarn cap add android
yarn cap sync

And to open your desired target platform (Android Studio or Xcode):

yarn cap open PLATFORM

where PLATFORM is either android or ios. Alternatively, build your application on the Monaca cloud.

To run in the browser, run:

yarn dev

or

monaca preview

Additionally, there are two commands:

monaca:trapeze-ios or trapeze run config.yaml -y --ios-project ios/App

and

monaca:trapeze-android or trapeze run config.yaml -y --android-project android

that will later allow the Android manifest and iOS project to be updated with the permissions required to access internal-storage images through the Capacitor Camera plugin.
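If these aliases are not already defined in your project, they can be added as package.json scripts. A minimal sketch (assuming the Trapeze CLI is installed, for example via the @trapezedev/configure dev dependency):

"scripts": {
  "monaca:trapeze-ios": "trapeze run config.yaml -y --ios-project ios/App",
  "monaca:trapeze-android": "trapeze run config.yaml -y --android-project android"
}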

2. In-depth App Structure

The App container contains a Menu container, providing a grid for the buttons and the navigation bar. Additionally, it’s responsible for handling the camera plugin and consequently invoking the API. Due to the simple structure of the App, booleans are used to select which components get rendered, instead of routing. The general code is as follows:

import React, { useState } from 'react';
import { Modal, Box, Typography } from '@mui/material';
import { Camera, CameraResultType, CameraSource } from '@capacitor/camera';
import CaptionedImage from '@/components/CaptionedImage';
import Loading from '@/components/Loading';
import Menu from '@/components/Menu';
import fetchImageCaption from '@/APIHelpers'
import { decode } from "base64-arraybuffer";
import useSQLiteDB from '@/useSQLiteDB';
import CaptionHistory from '@/components/CaptionHistory';
import { useDarkMode } from '@/components/DarkModeContext'


const App: React.FC = () => {

interface ImageCaption {
image: string;
caption: string;
}

const [selectedImageCaption, setSelectedImageCaption] = useState<ImageCaption>({ caption: '', image: '' } as ImageCaption);
const [model, setModel] = useState<string>('https://api-inference.huggingface.co/models/Salesforce/blip-image-captioning-large');
const [isCaptioning, setIsCaptioning] = useState(false);
const [checkingHistory, setCheckingHistory] = useState(false);
const [error, setError] = useState(false);
const { performSQLAction } = useSQLiteDB();
const { darkMode } = useDarkMode();

const handleImageCaptioning = async (selectedImage) => {
...
};

const handleImageCapture = async () => {
...

};

const handleImageUpload = async () => {
...
};

const handleCheckCaptions = () => {
setCheckingHistory(true);
};

const handleClose = async () => {
await setSelectedImageCaption({ caption: '', image: '' } as ImageCaption);
await setIsCaptioning(false);
await setCheckingHistory(false);
await setError(false);
};

return (
<div className={darkMode ? 'app-container dark' : 'app-container light'}>
{!isCaptioning && !checkingHistory ?
<Menu handleImageCaptioning={handleImageCapture} handleImageUpload={handleImageUpload} handleCheckCaptions={handleCheckCaptions} setModel={setModel} model={model} />
:
(!isCaptioning && checkingHistory) ? <CaptionHistory onClose={handleClose} performSQLAction={performSQLAction} /> : null
}
{isCaptioning && selectedImageCaption.caption === '' && <Loading />}
{isCaptioning && selectedImageCaption.caption !== '' && <CaptionedImage onClose={handleClose} imageCaption={selectedImageCaption} />}
...
</div>
);
};

export default App;

Menu container:

import React, { useState, useEffect } from 'react';
import { Button, Grid, Typography, BottomNavigation, BottomNavigationAction, Paper } from '@mui/material';
import PhotoCameraIcon from '@mui/icons-material/PhotoCamera';
import PhotoLibraryIcon from '@mui/icons-material/PhotoLibrary';
import HistoryIcon from '@mui/icons-material/History';
import SettingsIcon from '@mui/icons-material/Settings';
import HomeIcon from '@mui/icons-material/Home';
import Settings from './Settings';

const Menu: React.FC = ({ handleImageCaptioning, handleImageUpload, handleCheckCaptions, setModel, model }) => {

const [value, setValue] = useState('Home');

return (<div>
{value === 'Home' && <Grid
container
spacing={2}
direction="column"
justifyContent="center"
alignItems="center"
style={{ minHeight: '100vh' }}
>
<Grid item>
<Typography variant="h4" align="center" gutterBottom>
Welcome :)
</Typography>
</Grid>
<Grid item>
<Button
variant="contained"
color="primary"
startIcon={<PhotoCameraIcon />}
onClick={handleImageCaptioning}
>
Caption with Camera
</Button>
</Grid>
<Grid item>
<Button
variant="contained"
color="primary"
startIcon={<PhotoLibraryIcon />}
onClick={handleImageUpload}
>
Caption from Gallery
</Button>
</Grid>
<Grid item>
<Button
variant="contained"
color="secondary"
startIcon={<HistoryIcon />}
onClick={handleCheckCaptions}
>
Check Previous Captions
</Button>
</Grid>
</Grid>}
{value === 'Settings' && <Settings setModel={setModel} model={model} /> }
<Paper sx={{ position: 'fixed', bottom: 0, left: 0, right: 0 }} elevation={3}>
<BottomNavigation
showLabels
value={value}
onChange={(event, newValue) => {
setValue(newValue);
}}
>
<BottomNavigationAction label='Home' value='Home' icon={<HomeIcon/>}/>
<BottomNavigationAction label='Settings' value='Settings' icon={<SettingsIcon/>}/>
</BottomNavigation>
</Paper>
</div>
);
};

export default Menu;

The specific functions for image selecting/picture taking and respective captioning are explained ahead. However, in the App container, when an image is captioned, it is then displayed in the CaptionedImage component:

Captioned Image Example
import React from 'react';
import { Typography, Button } from '@mui/material';
import CloseIcon from '@mui/icons-material/Close';

const CaptionedImage = ({ imageCaption, onClose }) => {
return (
<div style={{ height: '100vh', display: 'flex', flexDirection: 'column', justifyContent: 'center', alignItems: 'center' }}>
<Button style={{ position: 'absolute', top: 16, left: 4, fontSize: '6rem', color: '#1976D2' }} onClick={onClose}>
<CloseIcon />
</Button>
<img
src={imageCaption.image}
alt="Captioned Image"
style={{ maxWidth: '100%', maxHeight: '70vh', borderRadius: '8px' }}
/>
<Typography variant="subtitle1" align="center" style={{ marginTop: '16px', marginBottom: '16px', color: '#616161' }}>
{imageCaption.caption}
</Typography>
</div>
);
};

export default CaptionedImage;

3. Capacitor Camera Plugin

The official documentation for Capacitor’s camera plugin is available here: https://capacitorjs.com/docs/apis/camera. First, install the plugin:

yarn add @capacitor/camera
yarn cap sync

Then, to configure camera and gallery access permissions on both Android and iOS, your config.yaml should look like this:

platforms:
  android:
    versionName: 1.0.0
    manifest:
      - file: AndroidManifest.xml
        target: manifest/application/activity
        attrs:
          'android:screenOrientation': unspecified
      - file: AndroidManifest.xml
        target: manifest
        inject: |
          <uses-permission android:name="android.permission.READ_MEDIA_IMAGES"/>
      - file: AndroidManifest.xml
        target: manifest
        inject: |
          <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"/>
      - file: AndroidManifest.xml
        target: manifest
        inject: |
          <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
  ios:
    targets:
      App:
        version: 1.0.0
        xcconfig:
          - file: App/Config.xcconfig
            set:
              TARGETED_DEVICE_FAMILY: '1,2'
        plist:
          - replace: true
            entries:
              - UISupportedInterfaceOrientations:
                  - UIInterfaceOrientationPortrait
                  - UIInterfaceOrientationLandscapeLeft
                  - UIInterfaceOrientationLandscapeRight
              - NSCameraUsageDescription: 'Take photos'
              - NSPhotoLibraryAddUsageDescription: 'Add photos'
              - NSPhotoLibraryUsageDescription: 'Access photos'

Running:

trapeze run config.yaml -y --android-project android

and

trapeze run config.yaml -y --ios-project ios/App

will add these permissions to the built mobile projects. The plugin is then used in the App container in the following way, depending on the source of the image to be captioned (camera or gallery):

import { Camera, CameraResultType, CameraSource } from '@capacitor/camera';

...

import fetchImageCaption from '@/APIHelpers'
import { decode } from "base64-arraybuffer";

...

const handleImageCapture = async () => {
const image = await Camera.getPhoto({
quality: 90,
allowEditing: false,
resultType: CameraResultType.Base64,
source: CameraSource.Camera
})
const blob = new Blob([new Uint8Array(decode(image.base64String))], {
type: `image/${image.format}`,
});
const result = await handleImageCaptioning(blob);
...
};

const handleImageUpload = async () => {
const image = await Camera.getPhoto({
quality: 90,
allowEditing: false,
resultType: CameraResultType.Base64,
source: CameraSource.Photos
})
const blob = new Blob([new Uint8Array(decode(image.base64String))], {
type: `image/${image.format}`,
});
const result = await handleImageCaptioning(blob);
...
};

The image has to be converted to a Blob, which is sent to the API as the payload. handleImageCaptioning is as follows:

    const handleImageCaptioning = async (selectedImage) => {
setIsCaptioning(true);
try {
const result = await fetchImageCaption(selectedImage,model);
return result;
} catch (error) {
console.error('Error captioning image:', error);
setError(true);
}
};

isCaptioning is set to true in order to display a loading screen while the API is called:

Loading Screen
import React from 'react';
import { CircularProgress, Grid } from '@mui/material';
import Box from '@mui/material/Box';

const Loading: React.FC = () => {
return (
<Grid
container
spacing={2}
direction="column"
justifyContent="center"
alignItems="center"
style={{ minHeight: '100vh' }}
>
<Box sx={{ display: 'block' }}>
<CircularProgress color="secondary" />
</Box>
<br></br>
<h2>Loading...</h2>
</Grid>
);
};

export default Loading;

4. Hugging Face API

The main model chosen for image captioning is Salesforce's BLIP (Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation), available here: https://huggingface.co/Salesforce/blip-image-captioning-large. The model receives an image as input, encodes it with a vision transformer (ViT) based backbone, and outputs a textual caption for the selected image. Hugging Face provides many other image captioning models through its APIs, so the developer is free to choose from here: https://huggingface.co/models?pipeline_tag=image-to-text&sort=trending. In the settings menu, the user can also switch between models, as further explained in Section 7.

The API needs to receive the image data as the payload (not the image URI) and returns the inferred caption. The code for calling the API is simple, as follows:

const fetchImageCaption = async (imageData, model) => {
try {
const response = await fetch(
model,
{
headers: { Authorization: `Bearer ${API_TOKEN}` },
method: "POST",
body: imageData,
}
);
if (response.status !== 200) {
throw new Error("Unavailable API or incorrect request.");
}
const result = await response.json();
return result;
} catch (error) {
console.error('Error fetching image caption:', error);
throw error;
}
};
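The snippet above assumes an API_TOKEN constant is in scope; access tokens can be created in your Hugging Face account settings (https://huggingface.co/settings/tokens). Below is a minimal sketch of one way to provide the token and of how the response is consumed. It assumes a Vite-based build and a hypothetical VITE_HF_API_TOKEN environment variable; neither is part of the original code.

// Sketch only: supply the token however fits your setup.
// VITE_HF_API_TOKEN is a hypothetical Vite env variable.
const API_TOKEN = import.meta.env.VITE_HF_API_TOKEN;

// The image-to-text models used here reply with JSON such as
// [{ "generated_text": "a dog lying on a couch" }],
// which is why App.tsx later reads result[0]['generated_text']:
export async function captionBlob(imageBlob: Blob, model: string): Promise<string> {
  const result = await fetchImageCaption(imageBlob, model); // defined above in APIHelpers
  return result[0]?.generated_text ?? '';
}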

The model parameter refers to the model API URL, as set in the Settings container. If the API is unavailable or the request is incorrectly formed, an error is thrown, which is then caught in the App container, as such:

  <Modal
open={error}
onClose={handleClose}
aria-labelledby="modal-modal-title"
aria-describedby="modal-modal-description"
style={{ display: 'flex', alignItems: 'center', justifyContent: 'center' }}
>
<Box sx={{ backgroundColor: 'white', padding: '20px', borderRadius: '8px' }}>
<Typography variant="subtitle1" align="center" style={{ marginTop: '16px', color: '#616161' }}>
Error captioning image, the API might not be available. Please try again later or select another model.
</Typography>
</Box>
</Modal>

5. SQLite Offline Storage

The app stores the user's previous captions. To implement this, I chose the “@capacitor-community/sqlite” community plugin; there are other options (see the official documentation on storage: https://capacitorjs.com/docs/guides/storage). Additionally, refer to this page for more information on Capacitor plugins: https://capacitorjs.com/docs/plugins

First, install the plugin:

yarn add @capacitor-community/sqlite
yarn cap sync

Then, add this to the plugins section of your Capacitor configuration (shown here in capacitor.config.ts syntax; quote the keys if you use capacitor.config.json):

plugins: {
  CapacitorSQLite: {
    iosDatabaseLocation: 'Library/CapacitorDatabase',
    iosIsEncryption: true,
    iosKeychainPrefix: 'angular-sqlite-app-starter',
    iosBiometric: {
      biometricAuth: false,
      biometricTitle: "Biometric login for capacitor sqlite"
    },
    androidIsEncryption: true,
    androidBiometric: {
      biometricAuth: false,
      biometricTitle: "Biometric login for capacitor sqlite",
      biometricSubTitle: "Log in using your biometric"
    },
    electronIsEncryption: true,
    electronWindowsLocation: "C:\\ProgramData\\CapacitorDatabases",
    electronMacLocation: "/Volumes/Development_Lacie/Development/Databases",
    electronLinuxLocation: "Databases"
  }
}

Manually copy the file sql-wasm.wasm from node_modules/sql.js/dist/sql-wasm.wasm to the www/assets folder, for example with the command below.
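A copy command along these lines should work (the paths assume the default project layout; adjust if your assets live elsewhere):

cp node_modules/sql.js/dist/sql-wasm.wasm www/assets/

Finally, set up your Main component so that the database also works on the web: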

import React from "react";
import ReactDOM from "react-dom/client";
import App from "@/App.tsx";
import { Capacitor } from "@capacitor/core";
import { CapacitorSQLite, SQLiteConnection } from "@capacitor-community/sqlite";
import { JeepSqlite } from "jeep-sqlite/dist/components/jeep-sqlite";
import { DarkModeProvider } from '@/components/DarkModeContext';

window.addEventListener("DOMContentLoaded", async () => {
try {
const platform = Capacitor.getPlatform();
if (platform === "web") {
const sqlite = new SQLiteConnection(CapacitorSQLite);
customElements.define("jeep-sqlite", JeepSqlite);
const jeepSqliteEl = document.createElement("jeep-sqlite");
document.body.appendChild(jeepSqliteEl);
await customElements.whenDefined("jeep-sqlite");
await sqlite.initWebStore();
}
const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(
<DarkModeProvider>
<App/>
</DarkModeProvider>
);
} catch (e) {
console.log(e);
}
});

For more detail on how this plugin is set up in the app (database and table initialization and access), please check this repository: https://github.com/aaronksaunders/ionic7-react-sqlite/tree/main or this previous post that goes more in-depth about a similar implementation: https://medium.com/the-web-tub/making-a-simple-japanese-character-learning-app-w-capacitor-reactjs-and-ionic-af4dcc73bff6.

This is the specific component to initialize and access the database:

import { useEffect, useRef, useState } from "react";
import { SQLiteConnection, SQLiteDBConnection, CapacitorSQLite } from "@capacitor-community/sqlite";

const useSQLiteDB = () => {
const db = useRef<SQLiteDBConnection>();
const sqlite = useRef<SQLiteConnection>();
const [initialized, setInitialized] = useState(false);

useEffect(() => {
const initializeDB = async () => {
if (sqlite.current) return;

sqlite.current = new SQLiteConnection(CapacitorSQLite);
const ret = await sqlite.current.checkConnectionsConsistency();
const isConn = (await sqlite.current.isConnection("db_vite", false))
.result;

if (ret.result && isConn) {
db.current = await sqlite.current.retrieveConnection("db_vite", false);
} else {
db.current = await sqlite.current.createConnection(
"db_vite",
false,
"no-encryption",
1,
false
);
}
};

initializeDB().then(() => {
initializeTables().then(() => setInitialized(true));
});
}, []);

const initializeTables = async () => {
await performSQLAction(async (db: SQLiteDBConnection | null) => {
const queryCreateTable = `
CREATE TABLE IF NOT EXISTS pastCaptions (
caption TEXT NOT NULL,
image TEXT NOT NULL,
format TEXT NOT NULL
);`;
await db?.execute(queryCreateTable);
});
};

const performSQLAction = async (
action,
cleanup
) => {
try {
await db.current?.open();
await action(db.current);
} catch (error) {
console.log((error));
} finally {
try {

(await db.current?.isDBOpen())?.result && (await db.current?.close());
cleanup && (await cleanup());
} catch { }
}
};

return { performSQLAction, initialized};
};

export default useSQLiteDB;

The performSQLAction function is called from App.tsx once a caption has been produced, as follows:

    const handleImageUpload = async () => {
const image = await Camera.getPhoto({
quality: 90,
allowEditing: false,
resultType: CameraResultType.Base64,
source: CameraSource.Photos
})
const blob = new Blob([new Uint8Array(decode(image.base64String))], {
type: `image/${image.format}`,
});
const result = await handleImageCaptioning(blob);
await performSQLAction(async (db) => {
await db?.query(`INSERT INTO pastCaptions(caption, image, format) VALUES('${result[0]['generated_text']}', '${image.base64String}','${image.format}');`);
}, null);
await setSelectedImageCaption({ caption: result[0]['generated_text'], image: URL.createObjectURL(blob) });
};
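One caveat about the INSERT above: the caption text is interpolated directly into the SQL string, so a caption containing a single quote would break the statement. As a hedged alternative (same table and data, assuming db is the SQLiteDBConnection handed over by performSQLAction), the plugin's run method also accepts bound values:

await performSQLAction(async (db) => {
  // Bound parameters (?) avoid manual escaping of the caption text.
  await db?.run(
    `INSERT INTO pastCaptions(caption, image, format) VALUES (?, ?, ?);`,
    [result[0]['generated_text'], image.base64String, image.format]
  );
}, null);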

Additionally, the database is utilized in the caption history screen, as detailed in the next section.

6. Caption History Screen

When the user presses the “Check Previous Captions” button, they are greeted by the CaptionHistory container:

Simple Caption History Interface
import React, { useEffect, useState } from 'react';

...

const CaptionHistory = ({ performSQLAction, onClose }) => {

const [data, setData] = useState([]);
const [selectedImage, setSelectedImage] = useState(null);
const [confirm, setConfirm] = useState(false);
const [isLoading, setIsLoading] = useState(false);

const handleImageClick = (image) => {
setSelectedImage(image);
};

const handleCloseModal = () => {
setSelectedImage(null);
};

const deleteCaptions = async () => {
...
};

const loadData = async () => {
...
}

useEffect(() => {
loadData();
}, []);

return (
<div style={{ height: '100vh', overflowY: 'scroll', width:'100%' }}>
<Box sx={{ flexGrow: 1 }} className="toolbar">
<AppBar position="static" sx={{ borderBottom: 1, borderColor: 'divider' }}>
<Toolbar sx={{ justifyContent: 'space-between' }}>
<IconButton
size="large"
edge="start"
color="inherit"
aria-label="menu"
onClick={onClose}
>
<CloseIcon />
</IconButton>
<IconButton
size="large"
edge="end"
color="inherit"
aria-label="open drawer"
onClick={() => setConfirm(true)}
>
<DeleteIcon />
</IconButton>
</Toolbar>
</AppBar>
</Box>
{isLoading ? <Loading /> : <div>
{data.length === 0 ? <h2 style={{ paddingTop: '10px' }} >No previous captions...</h2> : <div>
<h2 style={{ paddingTop: '10px' }}>Caption History:</h2>
<Grid container spacing={2} justifyContent="center" alignItems="flex-start" style={{ paddingTop: '10px', paddingLeft: '10px', paddingRight: '10px' }}>
{data.map((item, index) => (
...
))}
</Grid></div>}
...
</div>}
</div>
);
};

export default CaptionHistory;

When the component is rendered, the previous captions are loaded from the database, so a loading screen is shown in the meantime:

    const loadData = async () => {
setIsLoading(true);
await performSQLAction(async (db: SQLiteDBConnection | null) => {
const jsonData = JSON.parse(JSON.stringify(await db?.query(`SELECT * FROM pastCaptions`)));
const parsedData = jsonData.values.map(item => ({
...item,
image: URL.createObjectURL(new Blob([new Uint8Array(decode(item.image))], {
type: `image/${item.format}`,
}))
}));
setData(parsedData.reverse())
});
setIsLoading(false);
}

The toolbar provides a button to delete all previous captions:

    const deleteCaptions = async () => {
const deleteData = async () => {
await performSQLAction(async (db: SQLiteDBConnection | null) => {
await db?.query(`DELETE FROM pastCaptions`);
setData([]);
});
}
setConfirm(false);
await deleteData();
await loadData();
};

An alert is additionally shown to the user to confirm the deletion:

<Dialog
open={confirm}
onClose={() => null}
aria-labelledby="alert-dialog-title"
aria-describedby="alert-dialog-description"
>
<DialogTitle id="alert-dialog-title">
{"Deleting Caption History..."}
</DialogTitle>
<DialogContent>
<DialogContentText id="alert-dialog-description">
Are you sure?
</DialogContentText>
</DialogContent>
<DialogActions>
<Button onClick={() => setConfirm(false)}>Cancel</Button>
<Button onClick={deleteCaptions} autoFocus>Confirm</Button>
</DialogActions>
</Dialog>

If an image in the grid is clicked on, its caption is shown in a modal:

Caption History Example
<Modal
open={Boolean(selectedImage)}
onClose={handleCloseModal}
aria-labelledby="modal-modal-title"
aria-describedby="modal-modal-description"
style={{ display: 'flex', alignItems: 'center', justifyContent: 'center' }}
>
<Box sx={{ backgroundColor: 'white', padding: '20px', borderRadius: '8px' }}>
{selectedImage && (
<>
<img
src={selectedImage.image}
alt={`Selected Image`}
style={{ maxWidth: '100%', maxHeight: '70vh', borderRadius: '40px' }}
/>
<Typography variant="subtitle1" align="center" style={{ marginTop: '16px', color: '#616161' }}>
{selectedImage.caption}
</Typography>
</>
)}
</Box>
</Modal>

7. Settings

The settings screen allows the user to toggle dark mode, using a Context Provider, and to choose which model to use for captioning, with the model and setModel state from App.tsx passed as props:

import React from 'react';
import { useDarkMode } from './DarkModeContext';
import { FormGroup, FormControlLabel, Grid, Switch, FormControl, InputLabel, NativeSelect } from '@mui/material';
import { SelectChangeEvent } from '@mui/material/Select';

const Settings: React.FC = ({setModel,model}) => {

const { darkMode, toggleDarkMode } = useDarkMode();

const handleChangeModel = (event: SelectChangeEvent) => {
setModel(event.target.value);
};

return (
<Grid
container
spacing={2}
direction="column"
justifyContent="center"
alignItems="center"
style={{ minHeight: '100vh' }}
>
<FormGroup>
<FormControlLabel control={<Switch color="secondary" checked={darkMode} onChange={toggleDarkMode} />} label="Dark Mode" />
<br></br>
<FormControl fullWidth>
<InputLabel variant="standard" htmlFor="uncontrolled-native" className={darkMode ? 'dark' : 'light'}>
Model
</InputLabel>
<NativeSelect className={darkMode ? 'dark' : 'light'}
defaultValue={model}
inputProps={{
name: 'model',
id: 'uncontrolled-native',
}}
onChange={handleChangeModel}
>
<option value={"https://api-inference.huggingface.co/models/Salesforce/blip-image-captioning-large"} className={darkMode ? 'dark' : 'light'} >'Salesforce BLIP - Large'</option>
<option value={"https://api-inference.huggingface.co/models/nlpconnect/vit-gpt2-image-captioning"} className={darkMode ? 'dark' : 'light'} >'NLPCONNECT Vit-gpt2-image-captioning'</option>
<option value={"https://api-inference.huggingface.co/models/microsoft/git-base"} className={darkMode ? 'dark' : 'light'} >'GIT (GenerativeImage2Text), base-sized'</option>
</NativeSelect>
</FormControl>
</FormGroup>
</Grid>
);
};

export default Settings;

DarkModeContext.tsx:

import React, { createContext, useContext, useState } from 'react';

const DarkModeContext = createContext();

export const DarkModeProvider = ({ children }) => {
const [darkMode, setDarkMode] = useState(false);

const toggleDarkMode = () => {
setDarkMode((prevMode) => !prevMode);
};

return (
<DarkModeContext.Provider value={{ darkMode, toggleDarkMode }}>
{children}
</DarkModeContext.Provider>
);
};

export const useDarkMode = () => {
const context = useContext(DarkModeContext);
if (!context) {
throw new Error('useDarkMode must be used within a DarkModeProvider');
}
return context;
};

As previously mentioned, there are plenty of other models to choose from in Hugging Face's API. The chosen three are a small sample of the wide range of models available on the platform.

Simple Settings Screen

Conclusion


In this blog post, the process of creating a simple image captioning app with Monaca, ReactJS, Hugging Face, and Capacitor was presented. The application can be adapted and extended to many other tasks available through Hugging Face's API, showcasing how powerful the platform can be when combined with Capacitor.

Thank you for reading :)! Hopefully some of the information in this blog post will be useful in your own development. To access the full implementation, visit this repository: https://github.com/AlvaroAsial/IMGCaptioningApp.
