How to Automate Midjourney Image Generation with Puppeteer

Web scraping Discord

7 min read · May 16, 2023

a girl is hacking — This image was generated by Midjourney, powered by my prompt skills.

This is the second article in the series. The first one, comparing different AI image/art generators, can be found here.

The Art of AI: Using ChatGPT and Midjourney to Generate Inspiring Visuals


Midjourney prohibits automation in its terms of service, so proceed carefully and avoid abuse.

The article assumes you have a Midjourney subscription, but the code can be adapted to the free trial as well.

There are different ways to approach automating Midjourney image generation.

Here is an article by Michael King describing how to achieve this with Python’s pyautogui. It can also run in headless mode, either by giving pyautogui a path to a virtual display or by using a dummy display plug, which can be purchased for a few dollars on Amazon.

That is a suitable setup for running pyautogui on a Raspberry Pi. However, the article above assumes that you are using your own computer, possibly leaving the script running overnight, and doesn’t explain how to use pyautogui in headless mode.

We will try a different approach.

We will be using Puppeteer, a Node.js library that allows you to control Chromium or Chrome through the DevTools Protocol. This library can help you automate web testing, crawl websites, generate screenshots and PDFs, and perform other tasks. If you are not familiar with Puppeteer, you can learn its basics by referring to this guide.

We start off by creating a new project and importing Puppeteer. We also need to define the environment variables that we will need.

import puppeteer from "puppeteer";

const password = process.env.DISCORD_PASSWORD;
const email = process.env.DISCORD_EMAIL;
const server_name = process.env.DISCORD_SERVER_NAME;

You can replace the environment variables with hardcoded values or inject them in your own way. The script needs your Discord credentials and the name of your server. You also need to set up a dedicated server for Midjourney, which will serve as your channel for communicating with Midjourney’s bot.
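If one of the variables is missing, the script would otherwise fail only later, when it tries to type `undefined` into the login form. Below is a minimal sketch of an early check. The helper name `requireEnv` is hypothetical and not part of the original script.

```javascript
// Hypothetical helper: fail fast if any required variable is missing.
// `env` defaults to process.env but can be replaced, for example in tests.
const requireEnv = (names, env = process.env) => {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  // Return only the requested variables as a plain object.
  return Object.fromEntries(names.map((name) => [name, env[name]]));
};

// Possible usage with the three variables the script expects:
// const { DISCORD_EMAIL, DISCORD_PASSWORD, DISCORD_SERVER_NAME } = requireEnv([
//   "DISCORD_EMAIL",
//   "DISCORD_PASSWORD",
//   "DISCORD_SERVER_NAME",
// ]);
```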

export default async (prompt) => {
  const browser = await puppeteer.launch({
    headless: true,
    ignoreHTTPSErrors: true,
    slowMo: 0,
    args: [
      "--disable-gpu",
      "--disable-dev-shm-usage",
      "--disable-setuid-sandbox",
      "--no-first-run",
      "--no-sandbox",
      "--no-zygote",
    ],
  });
  const page = await browser.newPage();
  await page.setUserAgent(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"
  );

You can replace the user agent with whatever you like. Notice that headless is set to true. While testing locally, I would suggest setting it to false so you can watch what the browser is doing.
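One way to switch between the two modes without editing the code is to read the flag from the environment. This is only a sketch: the variable name `HEADLESS` and the helper `resolveHeadless` are assumptions of mine, not something Puppeteer defines.

```javascript
// Hypothetical convention: HEADLESS=false opens a visible browser window,
// anything else (or no value at all) keeps the browser headless.
const resolveHeadless = (env = process.env) =>
  String(env.HEADLESS ?? "true").toLowerCase() !== "false";

// Possible usage inside the launch call shown above:
// const browser = await puppeteer.launch({ headless: resolveHeadless(), /* ... */ });
```

Locally you would then start the script with `HEADLESS=false`, and leave the variable unset on the server.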

Let’s define a wait function. It lets us pause while the page’s JavaScript finishes initializing: even though an element might have appeared in the DOM tree, it still might not be responsive.

const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
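Fixed pauses are simple, but they are either longer than necessary or occasionally too short. As an alternative, here is a sketch of a polling helper that returns as soon as a condition holds. The name `waitUntil` and its options are hypothetical; for conditions evaluated inside the page, Puppeteer's own `page.waitForFunction` serves a similar purpose.

```javascript
// Repeated here so the sketch is self-contained.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call `check` every `interval` ms until it returns
// a truthy value, or throw once `timeout` ms have passed.
const waitUntil = async (check, { timeout = 5000, interval = 100 } = {}) => {
  const deadline = Date.now() + timeout;
  while (Date.now() < deadline) {
    const result = await check();
    if (result) return result;
    await wait(interval);
  }
  throw new Error(`Condition not met within ${timeout} ms`);
};

// Possible usage: wait until the email input is present and enabled.
// await waitUntil(() => page.$('input[name="email"]:not([disabled])'));
```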

Next, we will attempt to log in.

await page.goto("https://discord.com/login");
  // Set screen size
  await page.setViewport({ width: 1080, height: 1024 });
  await page.waitForSelector('input[name="email"]');
  //wait for js
  await wait(1000);
  await page.type('input[name="email"]', email);
  await wait(1000);
  await page.type('input[name="password"]', password);
  await wait(2000);

If you’re running the script on your personal computer or a nearby machine, you won’t encounter any issues. However, if you’re using a remote server with a new IP address, Discord’s anti-bot protection mechanism will kick in and require users to solve hCaptcha challenges like puzzles and checkboxes.

In this tutorial, we will rely on third-party services to solve hCaptchas since currently, there are no algorithms that can consistently solve them. While some AI projects perform well, such as hcaptcha-challenger, they typically specialize in one type of captcha. For our purposes, we will be using 2captcha. However, there are other services available, so it’s worth doing some research.

I spent £3, which yields about 1,000 solutions. We give the service the page URL and a site-key, which is a unique UUID found somewhere in the HTML. Once provided with these inputs, 2captcha returns the solution hash.

import captcha from "2captcha";
const solver = new captcha.Solver(process.env.CAPTCHA_API_KEY);

export default async (siteKey, url) => {
  const obj = await solver.hcaptcha(siteKey, url);
  const { data } = obj;
  return data;
};
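Since the solution comes from a remote service, a request can occasionally fail or time out. A small retry wrapper is one way to handle that. The sketch below is generic: `withRetry` is a hypothetical helper and not part of the 2captcha package, and keep in mind that each extra attempt may be billed.

```javascript
// Hypothetical helper: run `task` up to `attempts` times,
// pausing `delayMs` milliseconds after each failure.
const withRetry = async (task, { attempts = 3, delayMs = 1000 } = {}) => {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
};

// Possible usage, assuming `solve` is the default export defined above:
// const token = await withRetry(() => solve(siteKey, url), { attempts: 2 });
```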

But that’s only half of the job. Then we have to find a callback function somewhere in the Discord website, and manually call it, providing the hash as an argument. That’s the tricky part. The code below does it.

try {
    await page.click('button[type="submit"]');
    let foundElement = await page.waitForSelector(
      `iframe[data-hcaptcha-widget-id], div[data-dnd-name="${server_name}"]`
    );
    //let's determine what we found
    const tagName = await foundElement.evaluate((el) => el.tagName);
    if (tagName === "IFRAME") {
      const srcString = await foundElement.evaluate((el) => el.src);
      const siteKey = srcString.split("sitekey=")[1].split("&")[0];
      const data = await solver(siteKey, "https://discord.com/login");
      await page.evaluate((token) => {
        const node =
          document.querySelector("iframe").parentElement.parentElement;
        const properties = Object.getOwnPropertyDescriptors(node);
        const keys = Object.keys(properties);
        const reactProp = keys[1];
        document
          .querySelector("iframe")
          .parentElement.parentElement[reactProp].children.props.onVerify(
            token
          );
      }, data);
      foundElement = await page.waitForSelector(
        `div[data-dnd-name="${server_name}"]`
      );
    }

Let me elaborate on the code above. We press submit and wait for one of two things. Either Discord loads, in which case a div with your server name appears somewhere on the left, or hCaptcha is triggered, in which case we expect to see an iframe.

await page.click('button[type="submit"]');
    let foundElement = await page.waitForSelector(
      `iframe[data-hcaptcha-widget-id], div[data-dnd-name="${server_name}"]`
    );

Once we have the element, whichever loads first, we need to figure out which one it is by its tag name.

const tagName = await foundElement.evaluate((el) => el.tagName);

If it’s an iframe, we extract the site-key, which is passed as a parameter in the iframe’s src URL.

const srcString = await foundElement.evaluate((el) => el.src);
const siteKey = srcString.split("sitekey=")[1].split("&")[0];
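The string split works as long as the parameter is present, but it throws a fairly cryptic error when it is not. A slightly more defensive sketch can use the built-in `URL` and `URLSearchParams` classes. The helper name `extractSiteKey` is hypothetical, and the URL below is made up for illustration; I am assuming the parameter may appear either in the query string or in the fragment after `#`.

```javascript
// Read the `sitekey` parameter from an iframe URL.
// Checks the query string first, then the fragment after "#".
const extractSiteKey = (src) => {
  const url = new URL(src);
  const fromQuery = url.searchParams.get("sitekey");
  if (fromQuery) return fromQuery;
  const fromHash = new URLSearchParams(url.hash.slice(1)).get("sitekey");
  if (fromHash) return fromHash;
  throw new Error("No sitekey parameter found in iframe src");
};

// Made-up URL and key, for illustration only.
const exampleSrc =
  "https://captcha.example/frame.html#frame=checkbox&sitekey=00000000-0000-0000-0000-000000000000&theme=dark";
console.log(extractSiteKey(exampleSrc)); // 00000000-0000-0000-0000-000000000000
```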

We call our solver from above and wait for a human, somewhere far away, to solve it for us and get back the hash.

const data = await solver(siteKey, "https://discord.com/login");

Now we have to find the callback and call it. A bit of digging in the source code gives us its name: onVerify.

We also know that Discord is built with React and that onVerify, along with other arguments, is passed as a prop. In principle, finding any prop by name is not that hard: we walk through every single node and check for it.

//__reactProps + random string
// will be different every time we reload the page
let react_p = "__reactProps$ksy66ebrux";
const search = (el) => {
  if (!el) return;
  if (el[react_p]?.children?.props) {
    if (Object.keys(el[react_p].children.props).find((el) => el === "onVerify"))
      console.log("Got it!", el.className);
  }
  if (el.children) {
    if (HTMLCollection.prototype.isPrototypeOf(el.children)) {
      for (let i = 0; i < el.children.length; i++) {
        if (el.children[i]) search(el.children[i]);
      }
    } else {
      search(el.children);
    }
  }
};
search(document);

We reach the props that React attaches to a DOM node through react_p, and the child component’s props through .children.props. Once we have the class name, we can target the element directly instead. Back to the Puppeteer script:

// tagName is uppercase for HTML elements, so compare against "IFRAME"
if (tagName === "IFRAME") {
      const srcString = await foundElement.evaluate((el) => el.src);
      const siteKey = srcString.split("sitekey=")[1].split("&")[0];
      const data = await solver(siteKey, "https://discord.com/login");
      await page.evaluate((token) => {
        const node =
          document.querySelector("iframe").parentElement.parentElement;
        const properties = Object.getOwnPropertyDescriptors(node);
        const keys = Object.keys(properties);
        const reactProp = keys[1];
        document
          .querySelector("iframe")
          .parentElement.parentElement[reactProp].children.props.onVerify(
            token
          );
      }, data);
      foundElement = await page.waitForSelector(
        `div[data-dnd-name="${server_name}"]`
      );
    }
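The `keys[1]` lookup above depends on the order in which React attached its properties to the node. A sketch of a lookup by prefix is less sensitive to that. It relies on the `__reactProps` naming seen in the search snippet earlier; the helper name `findReactPropsKey` and the fake node below are made up for illustration.

```javascript
// Find the key whose name starts with "__reactProps$" (the suffix is random).
const findReactPropsKey = (node) =>
  Object.keys(node).find((key) => key.startsWith("__reactProps$"));

// A plain object standing in for a DOM node; keys and suffix are invented.
const fakeNode = {
  "__reactFiber$abc123": {},
  "__reactProps$abc123": {
    children: { props: { onVerify: (token) => `verified:${token}` } },
  },
};

const propsKey = findReactPropsKey(fakeNode);
console.log(propsKey); // __reactProps$abc123
console.log(fakeNode[propsKey].children.props.onVerify("token")); // verified:token
```

Inside `page.evaluate`, the same function could be applied to `document.querySelector("iframe").parentElement.parentElement` in place of the fake node.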

Once we beat hCaptcha, the rest is easy.

//wait for js
    await wait(1000);
    await foundElement.click();

    await page.waitForSelector("form");
    await page.type("form", `/imagine`);
    await wait(1000);
    await page.type("form", ` `);
    await wait(1000);
    await page.type("form", `${prompt}`);
    await page.keyboard.press("Enter");
    // wait for js to update html tree
    await wait(3000);
    // wait for the grid of four images to load
    await page.waitForSelector('ol li:last-of-type img[alt="🔄"]', {
      timeout: 1000 * 60 * 4,
    });
    const button = await page.waitForSelector("ol li:last-of-type button");
    //wait for javascript, buttons are not responsive at first
    await wait(3000);
    await button.click();
    //wait for upscale, upscaling the first image
    await page.waitForSelector('ol li:last-of-type img[alt="❤️"]', {
      timeout: 1000 * 60 * 4,
    });
    await wait(1000);
    const image = await page.$('ol li:last-of-type img[alt="Image"]');
    const imageSrc = await image.evaluate((img) => img.src);
    await browser.close();

The code above also upscales the first image.
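The script ends with the image URL in `imageSrc`. If you also want the file itself, a sketch along these lines could save it to disk. It assumes Node.js 18 or newer, where `fetch` is available globally; the helper names `fileNameFromUrl` and `downloadImage` are hypothetical.

```javascript
// Derive a local file name from the URL path, ignoring the query string.
const fileNameFromUrl = (imageUrl) => {
  const { pathname } = new URL(imageUrl);
  return decodeURIComponent(pathname.split("/").pop()) || "image.png";
};

// Download the image and write it into `dir`. Assumes Node.js 18+ (global fetch).
const downloadImage = async (imageUrl, dir = ".") => {
  const { writeFile } = await import("node:fs/promises");
  const { join } = await import("node:path");
  const response = await fetch(imageUrl);
  if (!response.ok) {
    throw new Error(`Download failed: HTTP ${response.status}`);
  }
  const target = join(dir, fileNameFromUrl(imageUrl));
  await writeFile(target, Buffer.from(await response.arrayBuffer()));
  return target;
};

// Possible usage after the script above has produced imageSrc:
// const savedPath = await downloadImage(imageSrc, "./images");
```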

Waiting times for selectors can be significant, up to 5 minutes or even more. If you plan to deploy to the cloud and use the script actively, it makes sense to split it into three parts: submitting the prompt, upscaling, and extracting the URL. That way you can call each script separately and avoid being charged for the waiting time in between. This is your decision to make.
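To make the three-part split more concrete, here is a rough sketch of how a job could be advanced one stage per invocation. Everything in it is an assumption for illustration: the stage names, the `advance` function, and the idea that the job object is stored somewhere (a database or a queue) between calls.

```javascript
// Hypothetical stage order for one image job.
const STAGE_ORDER = ["prompt", "upscale", "extract", "done"];

// Run the handler for the job's current stage and move it to the next one.
// `handlers` maps a stage name to an async function returning extra fields.
const advance = async (job, handlers) => {
  if (job.stage === "done") return job;
  const handler = handlers[job.stage];
  if (!handler) throw new Error(`Unknown stage: ${job.stage}`);
  const patch = await handler(job);
  const next = STAGE_ORDER[STAGE_ORDER.indexOf(job.stage) + 1];
  return { ...job, ...patch, stage: next };
};

// Possible usage: each handler would open the browser, do one short step,
// and exit, so nothing stays running while Midjourney renders.
// job = await advance(job, {
//   prompt: submitPrompt,     // types /imagine and presses Enter
//   upscale: clickUpscale,    // clicks the first upscale button
//   extract: readImageUrl,    // returns { imageUrl }
// });
```

A scheduler or queue would call `advance` again a few minutes later, once the next stage is likely to be ready.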

The full source code is right here.