EdgeCloud: Unveiling Proprietary Sketch-to-3D Generative AI Model

Theta Labs · Published in Theta Network · May 16, 2024

The Theta team is excited to share more details about the Sketch-to-3D generative AI model demoed by our CTO Jieyi during the May 15, 2024 live AMA. You can view a recording here. The underlying AI technology is truly groundbreaking and incredibly complex; this proprietary AI model pipeline has been under development at Theta for the better part of the last year. We’re thrilled to announce that it is now deployed in production on EdgeCloud:

https://www.ThetaEdgeCloud.com/dashboard/ai/service/model-explorer

A Challenging Generative AI Problem

Sketch-to-3D generation is a highly challenging generative AI task: it involves interpreting the lines and shapes of a hand-drawn sketch and accurately converting them into a 3D digital object. This technology can be used in fields ranging from film, animation, and gaming to architecture and industrial design, allowing artists and designers to quickly prototype ideas and visualize their concepts in a fully interactive 3D space.

Translating a simple, often rough, 2D sketch into a detailed and accurate 3D model involves numerous technical hurdles: it is difficult to generate a 3D model with high spatial resolution that preserves the subtle details and shapes envisioned by the artist in the 2D sketch.

We tackle this task by dividing it into two stages: first converting the sketch into a “2.5D” image, which is the projection of a 3D object onto a 2D plane, and then transforming that projected image into a 3D model. This two-stage approach is another example of the “model-pipeline” concept Theta Labs and Google Cloud jointly presented at the Google Cloud Next conference last year.

Stage 1 — Sketch to 2.5D Image Generation

Figure 1. Theta’s two-stage “model-pipeline” for sketch-to-3D generation.

The first stage, generating a 2.5D image from a sketch, employs the powerful combination of Stable Diffusion and ControlNet. ControlNet is a neural network architecture designed to provide additional control and precision during the generation process, particularly in the context of image synthesis with diffusion models. In essence, ControlNet enhances these models by allowing them to incorporate extra conditions or constraints that guide the output more effectively. This is particularly useful in applications like converting sketches to 2.5D images, where maintaining the integrity and details of the original sketch is crucial. For Stable Diffusion, we selected the retrained ReV Animated model due to its outstanding capability in producing highly detailed 2.5D images. This approach establishes a robust foundation for the challenging task of 3D modeling.

Figure 2. ControlNet schematics as proposed in the original paper from Stanford University.
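To make the idea concrete, here is a minimal, illustrative sketch of what a Stage 1 pipeline could look like using the open-source diffusers library with a scribble-conditioned ControlNet. Theta’s production pipeline is proprietary, so the model checkpoints, prompts, and parameters below are assumptions chosen for demonstration only (in practice, a ReV Animated checkpoint would replace the base Stable Diffusion model shown here).

```python
# Illustrative Stage-1 sketch (hand-drawn sketch -> 2.5D image) using the
# open-source diffusers library. This is NOT Theta's proprietary pipeline;
# model IDs and parameters are placeholders for demonstration.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# A scribble-conditioned ControlNet paired with a Stable Diffusion base model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # in practice, a ReV Animated checkpoint
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

sketch = load_image("sketch.png")  # the hand-drawn input sketch

# The sketch is passed as the ControlNet condition, constraining the diffusion
# output to follow the original lines while the prompt guides style and detail.
image_25d = pipe(
    prompt="a detailed render of the sketched object, studio lighting",
    image=sketch,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image_25d.save("object_2_5d.png")
```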

Stage 2 — 2.5D Image to 3D Model Conversion

Figure 3. The LRM model, which forms the basis of TripoSR, as proposed in the original paper from Adobe.

The second stage, converting the 2.5D image into a 3D model, begins with advanced object extraction algorithms that isolate the primary object from the 2.5D image, effectively reducing background noise and interference. This step requires precision and attention to detail to ensure the main object is accurately isolated. We then leverage TripoSR, a model developed by Stability AI that excels at rendering detailed and precise 3D representations, to construct the 3D model. This combination of cutting-edge techniques creates a complex yet highly effective workflow, transforming a simple sketch into a detailed 3D model with remarkable efficiency and precision.
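For readers who want to experiment with a similar workflow, the open-source building blocks below (rembg for object extraction and the public TripoSR package) approximate this second stage. This is only a rough sketch under those assumptions; the exact extraction algorithms used in our EdgeCloud pipeline are not public, and the TripoSR API signature may differ slightly between package versions.

```python
# Illustrative Stage-2 sketch (2.5D image -> 3D mesh) using open-source tools.
# This is NOT Theta's proprietary workflow; file names and parameters are placeholders.
import torch
from PIL import Image
from rembg import remove        # simple background removal / object extraction
from tsr.system import TSR      # TripoSR: https://github.com/VAST-AI-Research/TripoSR

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Isolate the primary object by stripping the background from the 2.5D image.
image_25d = Image.open("object_2_5d.png")
foreground = remove(image_25d).convert("RGB")

# 2. Reconstruct a 3D mesh from the single isolated view with TripoSR.
model = TSR.from_pretrained(
    "stabilityai/TripoSR", config_name="config.yaml", weight_name="model.ckpt"
)
model.to(device)
scene_codes = model([foreground], device=device)

# Extract a mesh (argument names and order may vary across TripoSR versions).
meshes = model.extract_mesh(scene_codes, True, resolution=256)
meshes[0].export("object.obj")  # can be imported into Blender or other 3D tools
```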

Model Deployed in Production at Scale

Theta’s proprietary Sketch-to-3D model is now available on the Theta EdgeCloud dashboard:

https://www.ThetaEdgeCloud.com/dashboard/ai/service/model-explorer

Figure 4. Launching our proprietary Sketch-to-3D model from the Theta EdgeCloud dashboard (under AI model explorer).

From here, you can launch the model with just a few clicks. Once the model is successfully deployed to EdgeCloud, you will obtain an inference endpoint that looks something like this: https://sketchtoxxxxx.tec-s1.onthetaedgecloud.com/. Clicking on the endpoint opens a WebUI, which allows you to either upload a sketch image or draw directly on a drawing board. After uploading the sketch, you can optionally provide a text prompt describing the 3D object you would like to generate. Next, click the “Generate 2D” button; you can repeat this step a few times until it produces a 2.5D image you are satisfied with. Then, click the “Generate 3D” button to turn it into a 3D model! You can zoom and rotate the generated 3D model in the WebUI. In addition, you can download the model and import it into other 3D modeling tools. Note that the first generation might take about 2–3 minutes since the model parameters need to be loaded, but subsequent generations should be faster.

Figure 5. The Gradio WebUI and API endpoint documentation of our Sketch-to-3D model running in Theta EdgeCloud.

The model also provides a set of APIs that allows developers to perform sketch-to-3D generation programmatically, or to integrate this capability into their custom applications. To view the API endpoints and documentation, simply click the “Use via API” button at the bottom of the WebUI, or append `?view=api` to the inference endpoint, for example: https://sketchtoxxxxx.tec-s1.onthetaedgecloud.com/?view=api
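As an example of programmatic access, the snippet below uses the gradio_client package to call such an endpoint, since the WebUI is built with Gradio. The endpoint URL, route names, and arguments are placeholders and assumptions; consult the “Use via API” page of your own deployment for the exact routes and signatures.

```python
# Hypothetical programmatic call to a Sketch-to-3D endpoint via gradio_client.
# The URL and api_name values below are placeholders; check your deployment's
# "Use via API" page for the real routes and argument lists.
from gradio_client import Client, handle_file

client = Client("https://sketchtoxxxxx.tec-s1.onthetaedgecloud.com/")

# Step 1: sketch -> 2.5D image (route name is an assumption)
image_25d = client.predict(
    handle_file("sketch.png"),
    "a cute robot toy",          # optional text prompt
    api_name="/generate_2d",
)

# Step 2: 2.5D image -> downloadable 3D model file (route name is an assumption)
model_3d = client.predict(
    handle_file(image_25d),
    api_name="/generate_3d",
)
print("3D model downloaded to:", model_3d)
```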

We will be adding Sketch-to-3D to the AI Showcase in the near future, so please be on the lookout for it. In the meantime, have fun with Theta’s proprietary 3D GenAI!


Creators of the Theta Network and EdgeCloud AI — see www.ThetaLabs.org for more info!