Generative AI in Fashion Imagery: A Current Assessment and Future Outlook
The following Analysis offers a comprehensive and pragmatic assessment of the current state of Generative AI (Gen AI) in Fashion Imagery Production.
There is evident enthusiasm surrounding AI’s potential, yet also uncertainty about its actual capabilities. The aim here is to clearly delineate what is achievable with AI today, the expected progress in the near term, and the projected evolution of the industry.
Leveraging insights from my experience with leading Fashion Brands and my network of top-level industry professionals, this Analysis combines real-world applications with an understanding of Digital Innovation to provide a grounded perspective on AI’s transformative role in Fashion Marketing.
Contents
· ANALYSIS
∘ 01. PHOTOREALISM
∘ 02. AESTHETICS
∘ 03. NARRATIVE
∘ 04. SETTING
∘ 05. CASTING
· CHALLENGES AND LIMITATIONS
∘ 01. GARMENT REPLICATION
∘ 02. CONSISTENCY
∘ 03. CREATIVE CONTROL
∘ 04. PHOTOREALISM
∘ 05. DIVERSITY AND UNIQUENESS
PRELIMINARY INSIGHTS
01. Change of Paradigm
The first key insight is the undeniable paradigm shift AI is bringing to the Fashion Advertising Industry.
Gen AI is going to radically revolutionize the way we conceive, produce and distribute content, potentially rendering traditional methods and those resistant to change obsolete.
Specifically, AI’s emergence is fueled by five key business needs:
Innovation in Content
A drive to create new, innovative content forms.
Demand for Personalized Content
An increasing need for content that is both personalized and informed by data analytics.
Efficiency Gains
A significant reduction in production times and costs.
Output Control and Risk Management
Enhanced control over creative outputs and risk mitigation.
Environmental Sustainability
The opportunity to diminish environmental impact.
02. Significant Challenges
While quality is rapidly improving, AI-generated content still faces major challenges and limitations.
As we’ll see later in the analysis, there are still significant steps required to fully realize AI’s transformative potential in Fashion Advertising.
03. Rapid Evolution
The third insight is the astonishing speed at which AI technology is evolving.
Developments are occurring so rapidly that what seems implausible today could become reality in a matter of weeks, unlocking unprecedented possibilities.⁰¹
FRAMEWORK
GOAL
The fundamental question I seek to answer is:
“Where do we currently stand on the spectrum that extends from Fashion Imagery wholly produced by Traditional Photography to a future where a significant portion of the images are created by Gen AI visuals?”
FOCUS
High-Level Insights
This analysis emphasizes high-level insights rather than intricate technical details.⁰² It is addressed primarily to Fashion Marketers and Professionals, providing accessible technical explanation alongside industry perspective to clarify our present position and future trajectory.
High-Quality Fashion Photography
Nowadays, High-Quality Fashion Photography typically aims for a realistic appearance, even when realism isn’t the primary goal. In contrast, certain types of commercial Advertising Photography display an overly polished, over-retouched, over-posed and over-symmetrical style. In their quest for perfection, these images can paradoxically end up appearing artificial or ‘fake’, resembling AI-generated photography more than some of the more sophisticated AI-generated works do. This hyper-polished aesthetic has become a subconscious standard against which many judge the ‘perfection’ of AI-generated photographs.
My analysis in this piece is not centered on these types of images, but rather on high-quality Fashion Photography and its nuances.⁰³
Photorealism
What has truly fascinated me from the beginning is the challenge of achieving extreme photorealism.⁰⁴ While traditional photography will undoubtedly retain its place, capturing real moments and real people, Gen AI is opening new doors for all other forms of visual representation. Recognizing this transformative role in the industry, I have always focused on understanding how to achieve imagery that could potentially rival Traditional Photography.
Midjourney
Among AI tools, Midjourney V.6 represents the benchmark for photographic quality. As quality is paramount for Fashion, most of my analysis centers on Midjourney, with some notable exceptions.⁰⁵
ANALYSIS
My approach to directing photoshoots is methodical, attending to each creative dimension while allowing room for the artist’s spontaneity and those unscripted moments that bring authenticity to the shoot. The same methodical approach guides me in creating prompts for AI-generated images, reflecting the preparation process typical of a traditional photoshoot’s creative deck. I will adopt a similar structure here by examining AI’s advantages and limitations in each of the following dimensions:
01. Photorealism
02. Aesthetics
Framing, Camera Focus, Camera Angles, Styling, Cameras & Film, Light, Hair & Makeup
03. Narrative
Action & Posing
04. Setting
Location & Studio
05. Casting
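As a rough illustration of this structure, a prompt can be assembled one creative dimension at a time, much like the pages of a creative deck. The sketch below is hypothetical: the `ShotBrief` class, its field names and the example wording are assumptions made for illustration and are not part of Midjourney or any other tool. The result is simply a plain-text prompt to paste into a text-to-image tool.

```python
from dataclasses import dataclass


@dataclass
class ShotBrief:
    """Hypothetical creative brief: one field per dimension of the shoot."""
    subject: str           # Casting: who is in the picture
    styling: str = ""      # Aesthetics: clothing and accessories
    action: str = ""       # Narrative: action and posing
    setting: str = ""      # Setting: location or studio
    framing: str = ""      # Aesthetics: framing and camera angle
    light: str = ""        # Aesthetics: lighting description
    camera_film: str = ""  # Aesthetics: camera, film and print qualities

    def to_prompt(self) -> str:
        """Join the non-empty dimensions into one comma-separated prompt."""
        parts = [
            self.subject, self.styling, self.action, self.setting,
            self.framing, self.light, self.camera_film,
        ]
        return ", ".join(p.strip() for p in parts if p.strip())


# Example wording is illustrative only, not a tested recipe.
brief = ShotBrief(
    subject="medium shot portrait of a woman in her thirties",
    styling="oversized wool coat",
    light="soft overcast daylight",
    camera_film="shot on 35mm film, subtle grain",
)
print(brief.to_prompt())
```

Keeping each dimension in its own field makes it easier to change one element, such as the light, between iterations while leaving the rest of the brief untouched.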
01. PHOTOREALISM
With the advent of Midjourney V.6, the distinction between AI-generated and traditional photography is becoming remarkably subtle. For many viewers, images produced by AI are virtually indistinguishable from those captured through conventional photography, especially when crafted with skillful prompts.
The integration of additional tools like Magnific.ai, which radically enhances details and textures, further blurs this line.
AI-generated Fashion Photography has already achieved a satisfying level of photorealism, and more advanced versions of these software tools, as well as new ones, are on the horizon.
02. AESTHETICS
Framing
Framing exhibits a phenomenon known as “Regression to the Mean.” This term implies that AI tends to generate imagery that aligns with the most common or typical patterns in its training set, limiting exploration of less conventional or novel compositions.⁰⁶ As a result, subjects are at first frequently centered, leading to a graphic and overly designed feel, and compositions need a few iterations before they show diversity, innovation, or edginess.
In terms of effectiveness, medium shots and close-ups excel, often appearing natural and nearly flawless. Portraits, particularly when the subject is not engaged in specific actions, are among the best use cases, showcasing impressive realism.
Full-length shots present more challenges, often requiring multiple iterations for a credible outcome. At first, body proportions can be skewed, typically resulting in overly slim figures with elongated limbs that require good retouching work.
While the results for single subjects are often impressive, improvements are needed for couple and group shots.
Faces tend to look more authentic when closer to the camera, posing a challenge in group settings. In couple shots, a phenomenon I call ‘The Twin Effect’ frequently occurs, where subjects appear overly similar, almost as if they share portions of the same DNA.
Camera Focus
Even though Photography Professionals might notice subtle oddities, the focus generally meets the expectations of the average viewer.
Camera Angles
Fashion Photography is about nuances, but slight adjustments to angles and perspective remain challenging in text-to-image tools. The ‘regression to the mean’ tendency in fact applies to camera angles too, often making them appear overly dramatic. However, despite the significant room for improvement, using carefully selected reference images and a few iterations can guide the AI towards achieving better perspectives.
Styling
Styling is an area where Midjourney truly excels, creating very interesting clothing and looks that could also inspire traditional photo shoots.
Sometimes, accessories and jewelry may appear unconventional at first, requiring a few iterations to achieve less avant-garde looks.
Cameras & Film
I like incorporating camera, film, and printing qualities in the prompts to achieve a more authentic and human touch. This mirrors my approach in actual photography, where I love using film whenever possible. It helps bridge the gap between digital and analog, infusing the images with a sense of realism, and the results are extremely satisfying.
However, engaging in a coloring and retouching phase in Photoshop remains crucial: it helps to refine the outputs, adding depth, texture, and a professional finish to the images.
Light
Light effects in AI-generated imagery face similar challenges to camera angles. At times, they can appear overly dramatic with simple prompts (e.g., ‘Golden Hour’, ‘Camera Flash’). To address this, a strategic approach involves using more descriptive prompts and reference images. This helps the AI understand the nuances of lighting that give fashion photography its depth and mood.
Hair & Makeup
Midjourney V.6 has brought considerable improvement in Hair rendition. When paired with Magnific.ai, there is a notable enhancement in the details and realistic textures of both skin and hair.
Controlling the specifics of makeup in the output remains challenging, but it often yields surprisingly creative results. However, more progress is needed to create beauty shots for beauty brands that closely resemble real makeup.
03. NARRATIVE
Action & Posing
Creating AI-generated imagery that involves storytelling requires significant effort to depict extremely realistic specific actions. Achieving generic actions like ‘walking on the street’ is feasible, but capturing the ‘entropy of the moment’ found in traditional documentary-inspired fashion photography is more challenging. AI currently struggles with these nuanced ‘micromoments’ and spontaneous expressions.
However, a significant part of fashion photography, especially in-studio, is focused on posing. In this area, AI-generated imagery excels. Sometimes, the poses might be overly commercial, but as tools like Stable Diffusion evolve, we expect to gain more control over posing with advancements in technologies such as ControlNet.
04. SETTING
Location & Studio
The overall quality of both location and studio settings in the imagery is quite impressive. Classic studio renditions, such as canvas backdrops, are well-executed. However, in street-based locations, there are a few minor issues, such as occasionally odd positioning or shapes of cars, or the depiction of background characters. These inconsistencies can often be resolved with additional iterations and post-production.
05. CASTING
The models generated by Midjourney are stunning and showcase its excellence, much like in styling. Using detailed prompts can result in the creation of incredibly beautiful and diverse subjects. However, a significant challenge is maintaining the consistency of a character. Once the perfect subject is found, replicating them across different images while preserving their unique features is not easy. We will delve deeper into this topic in the next chapter.
CHALLENGES AND LIMITATIONS
As we’ve observed, the quality of AI-generated imagery has dramatically improved over the past year. Generally speaking, the technology is now largely primed for commercial use, particularly with the aid of skilled post-production and retouching.
However, there are still challenges that need to be addressed. Ranked in order of significance, from the most to the least critical, these issues are:
01. Garment Replication
Achieving accurate and detailed representation of clothing.
02. Consistency
Maintaining character, outfit and scene consistency across various images.
03. Creative Control
Allowing for more nuanced and precise artistic direction.
04. Photorealism Quality
Enhancing the realism to match traditional photography.
05. Diversity and Uniqueness
Ensuring a wide range of diverse and unique outputs.
01. GARMENT REPLICATION
The paramount challenge in the realm of Generative AI tools is the ability to dress subjects in virtual garments that replicate the look and feel of real clothes with exactness. This is undeniably the game-changer that, once resolved, will fundamentally transform the Fashion Advertising Industry. Current renditions based on images injected into the software differ slightly from the original garments. Images that present ‘real’ clothes are typically products of retouching, starting from actual photoshoots, and yield mixed results.
That said, in recent weeks we have seen a surge in efforts to address this challenge. Notably, there have been developments in tools and academic papers that suggest a growing focus in this area. Specifically:
- There are Stable Diffusion / ComfyUI nodes that can dress the subjects starting from still life images of the clothes.
- There are two papers: TryOnDiffusion, from the University of Washington and Google Research, and Outfit Anyone, from the Institute for Intelligent Computing of Alibaba Group.
Despite these advancements, we are still in the early stages. Issues like quality and effective clothes layering remain unresolved. Yet, the rapid pace of technological innovation in this field makes it evident that significant improvements are on the horizon for 2024.
Once this critical breakthrough is achieved, it will revolutionize the production of lookbooks, catalogues, and e-commerce imagery. This advancement promises to reduce production costs, enable customization, facilitate micro-targeting, and streamline processes within the fashion industry’s value chain.
02. CONSISTENCY
The second crucial breakthrough needed is achieving consistency in the subject, their outfit, and the scene. Once a subject and their look are established, it must be possible to generate multiple images with consistent facial and bodily features, identical clothing, and the same setting.
Current solutions, predominantly based on Stable Diffusion, involve face-swapping and deepfake technology, but these tools haven’t yet reached the quality necessary for serious, high-quality fashion imagery.
When fully developed, this feature will not only aid in creating cohesive narratives but also empower creatives and casting directors to develop and maintain specific characters throughout a narrative, significantly enhancing the imagery’s overall impact and emotional resonance.
03. CREATIVE CONTROL
In Generative AI image tools, another key objective is to enhance creative control across all dimensions.
DALL·E 3’s Natural Language Processing (NLP) currently outperforms Midjourney and Stable Diffusion in prompt adherence, but there are still challenges in independently adjusting variables in text-to-image models. For instance, refining lighting in a prompt might unintentionally alter colors or even hair details. Advancing NLP for more precise control is crucial.
Additionally, the integration of enhanced prompt adherence with intuitive interfaces that enable extra control is vital. This combination is key for fine-tuned adjustments, like precise camera angle tweaks, essential in capturing the subtleties of high-quality fashion photography. This integration of technology and usability would meet the high standards of Fashion Photography and fully address the diverse needs of fashion Marketing.
Among leading platforms, Stable Diffusion is advancing through open-source tools that enable more control. Midjourney delivers superior output quality but faces strategic decisions around maintaining exclusivity rather than opening up access to its API.
04. PHOTOREALISM
This is the area where we’ve seen the most significant advancements over the past year. Yet, it’s evident that there are still specific details requiring fine-tuning, particularly in relation to general human anatomy.
A critical focus must be on avoiding the ‘Uncanny Valley’ effect⁰⁷, where at times the almost-but-not-totally-realistic subjects can still create a sense of unease. Future updates are expected to achieve increasingly lifelike representations. This will not only enhance the realism of the images, but also ensure they resonate more emotionally with viewers.
05. DIVERSITY AND UNIQUENESS
Last but not least, as the technology advances, a key area for improvement is enhancing the diversity and uniqueness of AI-generated imagery. Currently, images sometimes appear to have a uniform aesthetic. Future advancements should focus on developing algorithms capable of generating a wider and more subtle spectrum of styles.
TAKEAWAYS
The ultimate takeaway from this analysis is that as of today, AI-generated images can be effectively utilized in Fashion Advertising in two primary ways:
Brand Values Content
Here, the primary objective is to convey the brand’s values and ethos, rather than focusing on specific clothing items. AI-generated images can create visually compelling narratives that resonate with the brand’s identity, allowing for a more abstract and conceptual approach to advertising. This method is particularly useful for campaigns that aim to strengthen brand recognition and emotional connection with the audience, rather than direct product promotion.
Social Media Content
AI imagery can be integrated with traditional photography and skillful retouching to produce Social Media content for innovative brands.
In both approaches, the key is to strategically leverage AI’s strengths, namely its ability to generate novel, eye-catching visuals quickly, while aligning with the brand’s creative vision and marketing goals. As AI technology continues to evolve, its role in Fashion Advertising is likely to expand, offering more possibilities for creative expression and audience engagement.
WHAT’S NEXT
Having evaluated the current landscape of Gen AI in Fashion Advertising, let’s now explore the anticipated advancements in the near term.
01. E-commerce, Catalogues, Lookbooks
02. Blending AI with traditional Photography
03. Integration of 3D Design and Gen AI
04. Emergence of AI Models and Influencers
05. Digital Presence and Image Licensing
06. Stable Diffusion Take Over
07. Copyright and Ethical Considerations
01. E-COMMERCE, CATALOGUES AND LOOKBOOKS
In the coming months, we are going to see Gen AI expanding its reach from Branding and Social Media to E-commerce, Catalogues and Lookbooks.
The integration of traditional photography with AI tools will make it possible to outfit diverse AI models in a variety of locations and settings, thereby streamlining the production process.
02. BLENDING AI WITH TRADITIONAL PHOTOGRAPHY
Mixing traditional photography and AI Generated Imagery will open up new aesthetic possibilities. I see this as an emerging trend that offers significant uncharted potential.
In the meantime I also see opportunities for existing industry professionals — from Designers to Photographers, Casting Directors, Stylists — working with AI Artists. Their expertise will be instrumental in shaping the evolving aesthetics of the field.
03. INTEGRATION OF 3D DESIGN AND GEN AI
Gen AI tools will be increasingly used in conjunction with fashion design software like Clo3D.
The transformation of 3D designs into high-quality images using AI tools will enable designers to visualize and present their products in realistic contexts easily. These AI-enhanced images can then be incorporated into catalogues and lookbooks, serving as a potent asset for sales and marketing departments.
This advancement not only simplifies the journey from design to display but also paves the way for immersive AR and VR experiences, enhancing the presentation of collections and offering a more dynamic, emotional and engaging experience for the audience.
04. EMERGENCE OF AI MODELS AND INFLUENCERS
Addressing character consistency in AI modeling will not only improve digital representations and narratives, but also ignite a new industry as AI models will emerge more and more as influential figures. As they become more prominent, audiences may start recognizing and forming emotional connections with them, similar to human influencers.
Model agencies and brands are likely to strategically develop the aesthetic features of AI models, using data insights to create personas that best appeal to their target audience. Moreover, new tools will start delivering different models to each user, aesthetically tailored to that user’s specific data.
05. DIGITAL PRESENCE AND IMAGE LICENSING
Additionally, the evolving landscape of modeling and photography will lead to a significant shift for real models as they will no longer need to be physically present at photoshoots. Instead, models can license the use of their image, essentially selling image rights.
06. STABLE DIFFUSION TAKE OVER
Stable Diffusion is on a trajectory of continuous improvement and will soon rival Midjourney (MJ) in terms of photorealism quality, offering enhanced control and versatility.
As we look to the future, there is also a strong expectation for the development of more sophisticated tools equipped with user-friendly interfaces. These advanced tools will provide more control over various photographic elements, including camera settings, film types, lens choices, subjects, lighting, posing, and wardrobe.
UX progress in AI technology is not just about technological advancement; it’s about democratizing the medium and unlocking new realms of artistic possibilities, making sophisticated photography and imagery more accessible and adaptable to a wide range of creative needs.
07. COPYRIGHT AND ETHICAL CONSIDERATIONS
The escalating utilization of AI in fashion imagery will spark extensive discussions on copyright and ethical issues.
These concerns encompass the legalities of image rights used in training data sets, the ethical ramifications of crafting hyper-realistic models, and the replication of garments. While these are complex legal matters that will be fought in court, my experience suggests that regulating such technologies becomes challenging once they are widely accessible (in liberal countries).
HOW THE INDUSTRY WILL LOOK
While traditional photography continues to play a pivotal role, AI-generated content will claim an increasingly larger share of the market. This trend brings us towards a new paradigm where every creative element of content will be finely tuned, quickly and cost-effectively. Coupled with the use of AI’s predictive analytics for improved, data-informed creativity, this development will steer content creation towards being more precisely targeted, thereby greatly enhancing ROI.
However, a potential challenge arises when creativity is predominantly led by research and data. When competitors have access to similar data, relying solely on it can diminish a brand’s competitive edge, leading to a landscape where too many Brands end up producing variations of similar content.
To stand out in this dynamic environment, it’s crucial for Marketing Managers, Creative Agencies, and Artists to adopt forward-thinking strategies:
MARKETING MANAGERS
- Hire experts and consultants to integrate new technology and knowledge into your workflows, streamlining the value chain.
- Organize workshops to bring knowledge in-house, educating employees to take advantage of new tools and technologies.
- Consider incorporating AI in the production of Brand Value Content, Social Media Content, E-commerce, Catalogues, and Lookbooks, especially if you manage brands that value innovation and target younger demographics.
CREATIVE AGENCIES
- Lead the change by hiring AI professionals of different types: Data Scientists, Marketers and AI Creatives. Being left behind is not an option in this rapidly evolving landscape.
- Build expertise in and require your employees to utilize cutting-edge Generative AI tools. Organize workshops to educate your employees.
- Capitalize on the predictive power of AI analytics.
- Infuse this technology with Human Creativity and Insight to create content that transcends the limitations of purely data-driven approaches.
ARTISTS
Traditional Photography Artists
- Top-level professional Photographers, Filmmakers, Stylists, Casting Directors, HMU, and Producers will continue to work at a consistent pace. However, there will be a gradual shift towards content focused on real people, narratives, and documentary-style approaches.
- Emerging Creatives might consider collaborating with AI Creatives on innovative projects that blend traditional expertise with AI technology. The work will look more and more like a creative consultancy on aesthetics and trends.
Digital Art Directors, Artists, 3D Artists, Retouchers, Designers
- Stay updated with new tools and technologies as they emerge.
- Get creative with blending different software tools and techniques: it’s in this mix that the best outputs will be realized.
- Develop a holistic approach, as emerging roles will require a mix of technical skills and sensitivity to the zeitgeist.
All Images in this article are AI Generated by Me with Midjourney 6.0
I am Federico Donelli, International Creative Director.
I founded Ontologie, a pioneering Studio enhancing Human Creative Excellence with AI Technology.
I’d love to hear from you: you can write to me or follow me on Instagram, LinkedIn or Twitter.
Notes
01 — The industry is innovating at such a pace that if you read this piece in a few months, some of the content might already be obsolete, particularly at the technological level of analysis. However, the general framework, particularly in terms of industry insights and trajectory, will remain relevant.
02 — Advanced prompt designers and others might find some explanations in this analysis somewhat oversimplified. The intention here is not to serve as a comprehensive guide or to delve deeply into technical details.
03 — It goes without saying that although this analysis does not primarily focus on Commercial Photography, most of the explained concepts still apply. The main difference lies in the fact that Commercial Photography serves different use cases and is often ready for immediate use. For instance, a significant portion of the imagery created for entities like a hamburger chain can already be effectively produced using AI.
04 — I acknowledge that limiting the scope of Fashion Photography to photorealism may seem reductive, especially given AI’s potential to broaden the field’s creative horizons. However, I believe that the primary drivers of industry innovation will not be the edgiest visual experiments. Moreover, even when these visual experiments incorporate photorealism, they create an emotional connection that enhances their impact.
05 — There are numerous Generative AI tools available, such as Adobe Firefly, DALL·E 3, Leonardo AI, Generative AI by Getty Images, Canva AI, and others. Additionally, there are AI enhancers like Topaz AI, Gigapixel, etc. I am confident that a few of these tools, or perhaps new ones, will have breakthroughs and emerge as valid alternatives to Midjourney in the coming year. However, as of now, Midjourney is unparalleled in terms of quality of results and continues to set the bar high.
06 — ‘Regression to the mean’ in the context of text-to-image generative AI tools refers to the tools’ propensity to produce images that align with the most common or average characteristics found in their training data. This phenomenon is a result of several key factors:
- Training Data: AI tools are trained on extensive datasets with a wide range of images, often containing prevalent themes, styles, subjects, and compositions.
- Average Output: The AI relies on its training data to respond to prompts, typically generating images that mirror the average of its training. This often results in common and typical visual representations.
- Creativity and Diversity Limitation: The outputs may lack diversity and creativity, particularly if the training data is skewed towards conventional and popular styles. Novel representations might be underrepresented.
- User Implications: Users seeking unique or unconventional images may need to provide detailed prompts and may have to undergo several iterations or use reference images to direct the AI towards more unique outputs.
- Ongoing Development: To mitigate this limitation, ongoing efforts focus on diversifying training datasets and refining algorithms, enhancing the AI’s capability to produce more varied and creative outputs.
In essence, ‘regression to the mean’ highlights a tendency in text-to-image AI tools to default towards generating more average or typical images, influenced by the common characteristics of their training datasets, potentially constraining the diversity and uniqueness of the outputs.
07 — The “Uncanny Valley” refers to a phenomenon where digital representations of humans or humanoid characters become eerily realistic, yet still have subtle imperfections that make them seem unnatural or unsettling. This occurs in AI-generated images when the appearance of characters is very close to real humans but not perfect, causing a sense of discomfort or eeriness in viewers. This effect can be a significant challenge in creating realistic human-like images with AI, as the closer the images get to real human likeness, the more pronounced the unsettling feelings can become if they are not absolutely lifelike.