
AI image generators often give racist and sexist results: can they be fixed?

A conceptual illustration featuring a collage of faces.

Illustration by Ada Zielińska

In 2022, Pratyusha Ria Kalluri, a graduate student in artificial intelligence (AI) at Stanford University in California, found something alarming in image-generating AI programs. When she prompted a popular tool for ‘a photo of an American man and his house’, it generated an image of a pale-skinned person in front of a large, colonial-style home. When she asked for ‘a photo of an African man and his fancy house’, it produced an image of a dark-skinned person in front of a simple mud house — despite the word ‘fancy’.

After some digging, Kalluri and her colleagues found that images generated by the popular tools Stable Diffusion, released by the firm Stability AI, and DALL·E, from OpenAI, overwhelmingly resorted to common stereotypes, such as associating the word ‘Africa’ with poverty, or ‘poor’ with dark skin tones. The tools they studied even amplified some biases. For example, in images generated from prompts asking for photos of people with certain jobs, the tools portrayed almost all housekeepers as people of colour and all flight attendants as women, and in proportions that are much greater than the demographic reality (see ‘Amplified stereotypes’)1. Other researchers have found similar biases across the board: text-to-image generative AI models often produce images that include biased and stereotypical traits related to gender, skin colour, occupations, nationalities and more.

Amplified stereotypes. Chart showing the difference between self-identification of people working in different professions and AI model output.

Source: Ref. 1

Perhaps this is unsurprising, given that society is full of such stereotypes. Studies have shown that images used by media outlets2, global health organizations3 and Internet databases such as Wikipedia4 often have biased representations of gender and race. AI models are being trained on online pictures that are not only biased but that also sometimes contain illegal or problematic imagery, such as photographs of child abuse or non-consensual nudity. These pictures shape what the AI creates: in some cases, the images created by image generators are even less diverse than the results of a Google image search, says Kalluri. “I think lots of people should find that very striking and concerning.”

This problem matters, researchers say, because the increasing use of AI to generate images will further exacerbate stereotypes. Although some users are generating AI images for fun, others are using them to populate websites or medical pamphlets. Critics say that this issue should be tackled now, before AI becomes entrenched. Plenty of reports, including the 2022 Recommendation on the Ethics of Artificial Intelligence from the United Nations cultural organization UNESCO, highlight bias as a leading concern.

Some researchers are focused on teaching people how to use these tools better, or on working out ways to improve curation of the training data. But the field is rife with difficulty, including uncertainty about what the ‘right’ outcome should be. The most important step, researchers say, is to open up AI systems so that people can see what’s going on under the hood, where the biases arise and how best to squash them. “We need to push for open sourcing. If a lot of the data sets are not open source, we don’t even know what problems exist,” says Abeba Birhane, a cognitive scientist at the Mozilla Foundation in Dublin.

Make me a picture

Image generators first appeared in 2015, when researchers built alignDRAW, an AI model that could generate blurry images based on text input5. It was trained on a data set containing around 83,000 images with captions. Today, a swathe of image generators of varying abilities are trained on data sets containing billions of images. Most tools are proprietary, and the details of which images are fed into these systems are often kept under wraps, along with exactly how they work.

An AI-generated image showing a Black man in a long tunic, with a disconnected leg, standing in front of a small mud hut with a grass roof.

This image, generated from a prompt for “an African man and his fancy house”, shows some of the typical associations between ‘African’ and ‘poverty’ in many generated images. Credit: P. Kalluri et al., generated using Stable Diffusion XL

In general, these generators learn to connect attributes such as colour, shape or style to various descriptors. When a user enters a prompt, the generator builds new visual depictions on the basis of attributes that are close to those words. The results can be both surprisingly realistic and, often, strangely flawed (hands sometimes have six fingers, for example).
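
To make the mechanics concrete, here is a minimal sketch of how such a generator is typically invoked, using the open-source Hugging Face diffusers library; the model checkpoint and settings are illustrative choices, not the specific tools examined in the study.

```python
# Minimal text-to-image sketch with Hugging Face diffusers; checkpoint and
# settings are illustrative, and a CUDA-capable GPU is assumed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The model maps the prompt's words to learned visual attributes. Anything the
# prompt leaves unspecified (skin tone, gender, setting) is filled in from
# patterns in the training data, which is where bias creeps in.
image = pipe("a photo of a doctor", num_inference_steps=30).images[0]
image.save("doctor.png")
```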

The captions on these training images — written by humans or automatically generated, either when they are first uploaded to the Internet or when data sets are put together — are crucial to this process. But this information is often incomplete, selective and thus biased itself. A yellow banana, for example, would probably be labelled simply as ‘a banana’, but a description for a pink banana would be likely to include the colour. “The same thing happens with skin colour. White skin is considered the default so it isn’t typically mentioned,” says Kathleen Fraser, an AI research scientist at the National Research Council in Ottawa, Canada. “So the AI models learn, incorrectly in this case, that when we use the phrase ‘skin colour’ in our prompts, we want dark skin colours.”

The difficulty with these AI systems is that they can’t just leave out ambiguous or problematic details in their generated images. “If you ask for a doctor, they can’t leave out the skin tone,” says Kalluri. And if a user asks for a picture of a kind person, the AI system has to visualize that somehow. “How they fill in the blanks leaves a lot of room for bias to creep in,” she says. This is a problem that is unique to image generation — by contrast, an AI text generator could create a language-based description of a doctor without ever mentioning gender or race, for instance; and for a language translator, the input text would be sufficient.

Do it yourself

One commonly proposed approach to generating diverse images is to write better prompts. For instance, a 2022 study found that adding the phrase “if all individuals can be [X], irrespective of gender” to a prompt helps to reduce gender bias in the images produced6.
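
Mechanically, the intervention is just string manipulation on the prompt before generation; the sketch below paraphrases the study's template, and the surrounding generation setup is assumed rather than taken from the paper.

```python
# Sketch of the prompt-level intervention from the 2022 study (ref. 6); the
# clause wording follows the template quoted above, the rest is illustrative.
def add_counter_bias_clause(prompt: str, role: str, attribute: str = "gender") -> str:
    """Append the 'irrespective of ...' clause to a base prompt."""
    return f"{prompt}, if all individuals can be {role} irrespective of {attribute}"

base = "a photo of a firefighter"
for p in (base, add_counter_bias_clause(base, "firefighters")):
    print(p)  # generate a batch of images from each and compare demographics
```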

But this doesn’t always work as intended. A 2023 study by Fraser and her colleagues found that such interventions sometimes exacerbated biases7. Adding the phrase “if all individuals can be felons irrespective of skin colour”, for example, shifted the results from mostly dark-skinned people to all dark-skinned people. Even explicit counter-prompts can have unintended effects: adding the word ‘white’ to a prompt for ‘a poor person’, for example, sometimes resulted in images in which commonly associated features of whiteness, such as blue eyes, were added to dark-skinned faces.

An AI-generated image in a photo-realistic style showing a white man in a white doctor’s coat sitting beside three Black children.

In a Lancet study of global health images, the prompt “Black African doctor is helping poor and sick white children, photojournalism” produced this image, which reproduced the ‘white saviour’ trope the researchers were explicitly trying to counteract. Credit: A. Alenichev et al., generated using Midjourney

Another common fix is for users to direct results by feeding in a handful of images that are more similar to what they’re looking for. The generative AI program Midjourney, for instance, allows users to add image URLs in the prompt. “But it really feels like every time institutions do this they are really playing whack-a-mole,” says Kalluri. “They are responding to one very specific kind of image that people want to have produced and not really confronting the underlying problem.”

These solutions also unfairly put the onus on the users, says Kalluri, especially those who are under-represented in the data sets. Furthermore, plenty of users might not be thinking about bias, and are unlikely to pay to run multiple queries to get more-diverse imagery. “If you don’t see any diversity in the generated images, there’s no financial incentive to run it again,” says Fraser.

Some companies say they add something to their algorithms to help counteract bias without user intervention: OpenAI, for example, says that DALL·E 2 uses a “new technique” to create more diversity from prompts that do not specify race or gender. But it’s unclear how such systems work and they, too, could have unintended impacts. In early February, Google released an image generator that had been tuned to avoid some typical image-generator pitfalls. A media frenzy ensued when user prompts requesting a picture of a ‘1943 German soldier’ created images of Black and Asian Nazis — a diverse but historically inaccurate result. Google acknowledged the mistake and temporarily stopped its generator from creating images of people.

Data clean-up

Alongside such efforts lie attempts to improve curation of training data sets, which is time-consuming and expensive for those containing billions of images. That means companies resort to automated filtering mechanisms to remove unwanted data.

However, automated filtering based on keywords doesn’t catch everything. Researchers including Birhane have found, for example, that benign keywords such as ‘daughter’ and ‘nun’ have been used to tag sexually explicit images in some cases, and that images of schoolgirls are sometimes tagged with terms searched for by sexual predators8. And filtering, too, can have unintended effects. For example, automated attempts to clean large, text-based data sets have removed a disproportionate amount of content created by and for individuals from minority groups9. And OpenAI discovered that its broad filters for sexual and violent imagery in DALL·E 2 had the unintended effect of creating a bias against the generation of images of women, because women were disproportionately represented in those images.
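
A toy version of such a filter shows why it is blunt; the blocklist terms and captions below are invented for illustration.

```python
# Naive keyword filter over image captions; terms are placeholders. It misses
# harmful images hidden behind benign tags and drops benign images that merely
# mention a blocked word.
BLOCKLIST = {"nsfw", "explicit"}

def keep(caption: str) -> bool:
    return not (set(caption.lower().split()) & BLOCKLIST)

captions = [
    "a schoolgirl walking to class",    # benign tag, yet abused by predatory tagging
    "explicit warning sign on a door",  # benign image dropped for one word
]
print([c for c in captions if keep(c)])
```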

The best curation “requires human involvement”, says Birhane. But that’s slow and expensive, and looking at many such images takes a deep emotional toll, as she well knows. “Sometimes it just gets too much.”

Independent evaluations of the curation process are impeded by the fact that these data sets are often proprietary. To help overcome this problem, LAION, a non-profit organization in Hamburg, Germany, has created publicly available machine-learning models and data sets that link to images and their captions, in an attempt to replicate what goes on behind the closed doors of AI companies. “What they are doing by putting together the LAION data sets is giving us a glimpse into what data sets inside big corporations and companies like OpenAI look like,” says Birhane. Although intended for research use, these data sets have been used to train models such as Stable Diffusion.

Researchers have learnt from interrogating LAION data that bigger isn’t always better. AI researchers often assume that the bigger the training data set, the more likely that biases will disappear, says Birhane. “People often claim that scale cancels out noise,” she says. “In fact, the good and the bad don’t balance out.” In a 2023 study, Birhane and her team compared the data set LAION-400M, which has 400 million image links, with LAION-2B-en, which has 2 billion, and found that hate content in the captions increased by around 12% in the larger data set10, probably because more low-quality data had slipped through.
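
The shape of such an audit can be sketched in a few lines; the classifier below is a placeholder, and the study's actual sampling and tooling are more involved.

```python
# Toy scale audit: estimate the rate of hateful captions in samples from two
# datasets. `is_hateful` stands in for a real hate-speech classifier.
from typing import Callable, Iterable

def hate_rate(captions: Iterable[str], is_hateful: Callable[[str], bool]) -> float:
    captions = list(captions)
    return sum(map(is_hateful, captions)) / len(captions)

# rate_400m = hate_rate(sample_of_laion_400m, classifier)
# rate_2b = hate_rate(sample_of_laion_2b_en, classifier)
# The 2023 study reports roughly a 12% increase in the larger data set.
```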

An investigation by another group found that the LAION-5B data set contained child sexual abuse material. Following this, LAION took down the data sets. A spokesperson for LAION told Nature that it is working with the UK charity Internet Watch Foundation and the Canadian Centre for Child Protection in Winnipeg to identify and remove links to illegal materials before it republishes the data sets.

Open or shut

If LAION is bearing the brunt of some bad press, that’s perhaps because it’s one of the few open data sources. “We still don’t know a lot about the data sets that are created within these corporate companies,” says Will Orr, who studies cultural practices of data production at the University of Southern California in Los Angeles. “They say that it’s to do with this being proprietary knowledge, but it’s also a way to distance themselves from accountability.”

In response to Nature’s questions about which measures are in place to remove harmful or biased content from DALL·E’s training data set, OpenAI pointed to publicly available reports that outline its work to reduce gender and racial bias, without providing exact details on how that’s accomplished. Stability AI and Midjourney did not respond to Nature’s e-mails.

Orr interviewed some data set creators from technology companies, universities and non-profit organizations, including LAION, to understand their motivations and the constraints. “Some of these creators had feelings that they were not able to present all the limitations of the data sets,” he says, because that might be perceived as critical weaknesses that undermine the value of their work.

Specialists say that the field still lacks standardized practices for documenting data sets, which would help to make them more open to scrutiny and investigation. “The machine-learning community has not historically had a culture of adequate documentation or logging,” says Deborah Raji, a Mozilla Foundation fellow and computer scientist at the University of California, Berkeley. In 2018, AI ethics researcher Timnit Gebru — a strong proponent of responsible AI and co-founder of the community group Black in AI — and her team released a datasheet to standardize the documentation process for machine-learning data sets11. The datasheet has more than 50 questions to guide documentation about the content, collection process, filtering, intended uses and more.
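
To give a flavour of the exercise, the sketch below lists the datasheet's section headings, which follow the paper, each paired with one of its guiding questions; the full datasheet asks more than 50.

```python
# Skeleton of a "Datasheets for Datasets" record (Gebru et al.); section names
# follow the paper, and each question is one example from that section.
datasheet = {
    "Motivation": "For what purpose was the dataset created?",
    "Composition": "What do the instances represent?",
    "Collection process": "How was the data acquired, and was consent obtained?",
    "Preprocessing": "Was any cleaning or filtering applied?",
    "Uses": "What tasks is the dataset suited, and unsuited, for?",
    "Distribution": "How is it shared, and under what licence?",
    "Maintenance": "Who maintains it, and how are errata handled?",
}
for section, question in datasheet.items():
    print(f"[{section}] {question}")
```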

The datasheet “was a really critical intervention”, says Raji. Although many academics are increasingly adopting such documentation practices, there’s no incentive for companies to be open about their data sets. Only regulations can mandate this, says Birhane.

One example is the European Union’s AI Act, which was endorsed by the European Parliament on 13 March. Once it becomes law, it will require that developers of high-risk AI systems provide technical documentation, including datasheets describing the training data and techniques, as well as details about the expected output quality and potential discriminatory impacts, among other information. But which models will come under the high-risk classification remains unclear. The act will be the first comprehensive regulation for AI technology and will shape how other countries think about AI laws.

Specialists such as Birhane, Fraser and others think that explicit and well-informed regulations will push companies to be more cognizant of how they build and release AI tools. “A lot of the policy focus for image-generation work has been oriented around minimizing misinformation, misrepresentation and fraud through the use of these images, and there has been very little, if any, focus on bias, functionality or performance,” says Raji.

Even with a focus on bias, however, there’s still the question of what the ideal output of AI should be, researchers say — a social question with no simple answer. “There is not necessarily agreement on what the so-called right answer should look like,” says Fraser. Do we want our AI systems to reflect reality, even if the reality is unfair? Or should it represent characteristics such as gender and race in an even-handed, 50:50 way? “Someone has to decide what that distribution should be,” she says.


Midjourney just changed the generative image game and showed me how comics, film, and TV might never be the same

Midjourney, the generative AI platform that you can currently use on Discord, just introduced the concept of reusable characters, and I am blown away.

It’s a simple idea: Instead of using prompts to create countless generative image variations, you create and reuse a central character to illustrate all your themes, live out your wildest fantasies, and maybe tell a story.
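
At the time of writing, the feature works through a pair of prompt parameters: --cref points at a character-reference image and --cw controls how strictly to match it. The sketch below just assembles such prompts; the URL and weight are placeholders.

```python
# Hypothetical reusable-character prompts for Midjourney on Discord; --cref and
# --cw are the character-reference parameters documented at the time of writing.
character_url = "https://example.com/heroine.png"  # placeholder hosted image

for scene in ["boarding a night train", "sketching in a rainy cafe"]:
    print(f"/imagine prompt: {scene}, cinematic lighting "
          f"--cref {character_url} --cw 90 --v 6")
```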


Ideogram AI image generator results performance comparison

The digital art world is buzzing with excitement over the latest breakthrough in artificial intelligence: the Ideogram AI Image Generator, also known as Ideogram 1.0, released yesterday. This advanced AI art generator is reshaping the landscape of AI-driven artistry, offering artists and creators a new way to bring their visions to life. With its state-of-the-art text rendering capabilities, Ideogram 1.0 is a powerful ally for anyone looking to produce images that are not just realistic, but also full of artistic flair.

Ideogram 1.0 is making waves by outperforming other AI image generators on the market. It has surpassed models like Mixel AI, Sunno AI’s V3 Alpha, Stable Diffusion 3, Midjourney V6, and DALL-E 3, especially when it comes to incorporating text into images. This means that the images it produces have fewer mistakes and clearer visuals. It’s as if Ideogram 1.0 can read your mind, translating your ideas into stunningly accurate visual representations.

What sets Ideogram 1.0 apart is its dual strength in creating images that are both photo-realistic and artistically engaging. Whether you’re aiming for a picture that could pass for a professional photograph or an artwork that looks like it was made by hand, Ideogram 1.0 can do it all. Its advanced algorithms are designed to understand and execute complex instructions, ensuring that the final product matches your creative vision.

Ideogram AI art creation demo

The tool’s versatility is further highlighted by its ability to support different image sizes and shapes, making it perfect for various platforms and purposes. The “Magic Prompt” feature takes this a step further by optimizing your input to produce even better images. It’s like having an AI assistant that knows exactly how to turn your ideas into captivating visuals.

Tests comparing Ideogram 1.0 to its competitors have shown that it excels in understanding instructions and creating images that are detailed and contextually accurate, even with complicated prompts. It also has fewer restrictions on content, which means you can push the boundaries of your creativity.

Ease of access is a key aspect of Ideogram 1.0, with a free plan that offers a generous number of images and affordable paid plans for those who need more. This makes the technology available to both hobbyists and professionals without putting a dent in their wallets. Moreover, Ideogram 1.0 gives you full ownership of the images you create, so you can use your work however you see fit.

Exploring the Digital Art Revolution

The Ideogram AI Image Generator is a standout tool in the realm of AI-generated art. Its sophisticated text rendering, ability to produce both realistic and artistic images, and skill in handling complex prompts make it a leader in the field. The range of image sizes, the “Magic Prompt” feature, and its top-notch performance in tests further solidify its position at the top. With pricing that makes it accessible to all and the guarantee of owning your creations, Ideogram 1.0 is empowering creators to explore the full potential of their imagination with the help of cutting-edge technology. As AI continues to advance, Ideogram 1.0 is a clear example of how technology is expanding the possibilities of human creativity.


How to use ChatGPT Vision AI correctly for image analysis

The ability to quickly analyze and interpret visual information is more important than ever. ChatGPT Vision is a cutting-edge feature that can transform the way we approach complex challenges, both online and in the physical world. By harnessing the power of this AI tool, you can significantly improve your problem-solving skills and streamline your workflow. Let’s dive into how you can make the most of ChatGPT Vision, customize it to meet your needs, and even use it to create visual aids such as diagrams and flowcharts. Additionally, for those eager to expand their knowledge, we’ll touch on a specialized course designed to enhance your expertise in ChatGPT.

OpenAI’s Vision is a standout multimodal AI that allows you to upload images for detailed analysis, providing you with in-depth feedback. This can be incredibly useful when you’re dealing with a problem that’s difficult to describe in words. For instance, if you have a technical diagram or a photo of a malfunctioning piece of equipment, Vision can examine the visual information and offer a thorough assessment.

One of the areas where OpenAI’s Vision excels is in diagnosing issues. Whether you’re trying to figure out what’s wrong with a gadget or pinpoint problems on a website, an image can often convey what words cannot. Vision will analyze the picture and provide you with suggestions for solutions or steps to address the issue.

ChatGPT Vision AI user guide

To start using ChatGPT Vision, you simply need to access the ChatGPT interface and look for the image analysis option. Once you upload your image, ChatGPT will begin processing it and share its insights. The system may ask you questions to refine its understanding, which helps ensure that the analysis is accurate and beneficial to you.
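
The same image analysis is also available programmatically. Here is a minimal sketch against OpenAI's Python API; the model name reflects the vision-capable model at the time of writing, and the image URL is a placeholder.

```python
# Minimal image-analysis request via the OpenAI Python SDK; requires the
# OPENAI_API_KEY environment variable, and the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # vision-capable model at the time of writing
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What looks wrong with the wiring in this photo?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/fusebox.jpg"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```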

Moreover, OpenAI’s Vision AI is not a one-size-fits-all tool; you can tailor it to your specific needs. By crafting custom prompts, you can guide ChatGPT to focus on particular aspects of an image or problem, resulting in targeted and practical advice.

But ChatGPT Vision isn’t just about analysis—it’s also a powerful ally in visualizing ideas. When paired with compatible applications, it can help you turn rough sketches or notes into polished, professional-looking diagrams and flowcharts. This is particularly useful when you need to clearly outline complex processes or systems.

Understanding ChatGPT Vision and Its Impact on Visual Analysis

ChatGPT Vision is a multifaceted tool that can greatly enhance your analytical and problem-solving abilities. By learning how to activate it, customizing its output to your needs, and taking advantage of its ability to create visuals, you can unlock a new level of image analysis and feedback. And if you’re interested in broadening your skills, the ChatGPT Mastery course is an excellent next step. By exploring the full range of ChatGPT Vision’s capabilities, you can refine your approach to challenges and achieve more efficient and effective results.


Stable Diffusion 3 AI image generator launched by Stability AI

Stability AI has unveiled its latest creation, Stable Diffusion 3, an artificial intelligence image generator that has taken a significant leap forward in the field. This new AI art generator, currently in early preview and not yet widely available, is capturing the attention of tech enthusiasts and creative minds alike with its enhanced ability to interpret prompts and produce images of remarkable quality. Unlike its predecessors and current rivals, DALL-E 3 and Midjourney v6, Stable Diffusion 3 is not just another step in AI development; it represents a substantial advancement in how machines understand and create visual content.

The Stable Diffusion 3 suite of AI models currently ranges from 800M to 8B parameters and combines diffusion transformer architecture with flow matching. One of the most impressive features of Stable Diffusion 3 is its refined prompt understanding. Users will notice that the AI is now more adept at grasping the nuances of language, accurately incorporating text into images with correct spelling and context. This means that the images generated are not only visually stunning but also make sense in relation to the prompts given. This level of comprehension is a testament to the strides made in AI’s ability to interpret human language and translate it into coherent visual representations.
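
Because the model is still in early preview, public access details may change; the sketch below assumes Stable Diffusion 3 will eventually load through the same diffusers interface as its predecessors, so the checkpoint name is an assumption rather than a confirmed API.

```python
# Speculative sketch: loading SD3 via the generic diffusers loader, assuming it
# follows earlier Stable Diffusion releases. Checkpoint name is an assumption.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3",  # assumed; not yet a public checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    'a storefront sign that reads "OPEN 24 HOURS", photograph',  # stresses text rendering
    num_inference_steps=28,
).images[0]
image.save("sd3_sign.png")
```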

Stable Diffusion 3

What sets Stable Diffusion 3 apart even further is its commitment to community-driven progress. By releasing the platform as open-source, Stability AI has essentially handed the keys to the public, allowing anyone with interest and skill to contribute to the evolution of this technology. This approach democratizes the development process, inviting input from developers, artists, and AI enthusiasts worldwide. The collective effort can lead to rapid improvements and innovations, making Stable Diffusion 3 a product of its community as much as its creators. Stability AI explains:

“We believe in safe, responsible AI practices. This means we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors. Safety starts when we begin training our model and continues throughout the testing, evaluation, and deployment. In preparation for this early preview, we’ve introduced numerous safeguards. By continually collaborating with researchers, experts, and our community, we expect to innovate further with integrity as we approach the model’s public release.

“Our commitment to ensuring generative AI is open, safe, and universally accessible remains steadfast. With Stable Diffusion 3, we strive to offer adaptable solutions that enable individuals, developers, and enterprises to unleash their creativity, aligning with our mission to activate humanity’s potential.”

At the core of Stable Diffusion 3 is its diffusion transformer architecture. This sophisticated framework enables the AI to scale efficiently and handle a variety of inputs, including the remarkable ability to transform sounds into images. This opens up a world of possibilities for both creative and practical applications, pushing the boundaries of what AI image generation can achieve. The diffusion transformer architecture is a testament to the ingenuity behind Stable Diffusion 3, showcasing the potential for AI to venture into previously uncharted territories.

The ethos behind Stable Diffusion 3 is to empower and inspire. By making advanced AI technology more accessible, Stability AI is removing barriers that have traditionally limited who can experiment with and benefit from AI-generated art and applications. This tool is designed to encourage a wave of creativity, enabling users to push the limits of what can be created with AI assistance. Whether for artistic expression, business use, or personal projects, Stable Diffusion 3 is poised to be a catalyst for innovation.

The launch of Stable Diffusion 3 from Stability AI marks a significant moment in the evolution of AI image generation. Its superior prompt understanding and image quality, combined with an open-source philosophy, position it at the forefront of the industry. As the community eagerly anticipates the detailed technical report, there is a sense of excitement about the potential of Stable Diffusion 3 to shape the future of AI. With its focus on broadening access and fostering creativity, Stability AI’s latest offering is set to be a key player in the ongoing development of artificial intelligence.

Image Credit: Stability AI


Stability AI introduces new Stable Cascade AI image generator

Stability AI has today launched its latest open-source AI image generator in the form of Stable Cascade. The new AI artwork creator represents a significant leap forward in the ability to create realistic images and text, outpacing previous models such as Stable Diffusion and its larger counterpart, Stable Diffusion XL. What sets Stable Cascade apart is not just its performance but also its efficiency, which is crucial in the fast-paced realm of AI.

Würstchen architecture

The secret behind Stable Cascade’s impressive capabilities lies in its Würstchen architecture. This design choice effectively shrinks the size of the latent space, which is a technical term for the abstract representation of data within the model. By doing so, Stable Cascade can operate faster, reducing the time it takes to generate images, and also cut down on the costs associated with training the AI. Despite these efficiencies, the quality of the images produced remains high. In fact, the model boasts a compression factor of 42, a significant jump from the factor of 8 seen in Stable Diffusion, which is a testament to its enhanced speed and efficiency.

Stage A, Stage B and Stage C

Stable Cascade consists of three models: Stage A, Stage B and Stage C, which form a cascade for generating images, hence the name “Stable Cascade”. Stages A and B are used to compress images, much as the VAE does in Stable Diffusion; as mentioned before, this setup achieves a far higher compression of images. Stage C is responsible for generating the small 24 x 24 latents given a text prompt. Note that Stage A is a VAE, while Stages B and C are diffusion models.
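
A compression factor of 42 means a 1,024 x 1,024 image shrinks to roughly 24 x 24 latents (1,024 / 42 ≈ 24), which is why Stage C has so little to generate. Here is a minimal sketch of the hand-off using the Stable Cascade pipelines that ship with Hugging Face diffusers; the checkpoint names follow Stability AI's public releases, and a capable GPU is assumed.

```python
# Two-call sketch of the cascade: the prior (Stage C) produces tiny latents from
# the prompt, then the decoder (Stages B and A) reconstructs the full image.
import torch
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
).to("cuda")
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.float16
).to("cuda")

prompt = "an astronaut riding a horse, oil painting"
prior_out = prior(prompt=prompt, num_inference_steps=20)  # Stage C: tiny latents
image = decoder(
    image_embeddings=prior_out.image_embeddings.to(torch.float16),  # hand off to Stage B
    prompt=prompt,
    num_inference_steps=10,
).images[0]
image.save("cascade.png")
```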

Stable Cascade open source AI image generator

One of the most exciting aspects of Stable Cascade is its open-source nature. The code for this AI image generator is freely available on GitHub, along with helpful scripts for training and using the model. This openness invites a community of developers and AI aficionados to contribute to the model’s development, potentially leading to even more advancements. However, it’s important to note that those looking to use Stable Cascade for commercial purposes will need to navigate licensing requirements.

For this release, Stability AI is offering two checkpoints for Stage C, two for Stage B and one for Stage A. Stage C comes in 1 billion and 3.6 billion parameter versions, and its development team highly recommends using the 3.6 billion version, as most of the work went into its fine-tuning.

The two versions for Stage B amount to 700 million and 1.5 billion parameters. Both achieve great results, however the 1.5 billion excels at reconstructing small and fine details. Therefore, you will achieve the best results if you use the larger variant of each. Lastly, Stage A contains 20 million parameters and is fixed due to its small size.

Stable Cascade doesn’t just stop at its core technology; it offers a suite of extensions that can be used to fine-tune its performance. These include a ControlNet, an IP-Adapter, and an LCM, among others. These tools give users the ability to tailor the model to their specific needs, whether that’s adjusting the style of the generated images or integrating the model with other software.

When compared to other AI models in the market, such as DALL·E 3 and Midjourney, Stable Cascade stands out. Its unique combination of features and capabilities positions it as a strong contender in the AI image generation field. This is not just about the technology itself but also about how accessible it is. Stability AI has made Stable Cascade available through various platforms, including the Hugging Face library and the Pinokio app, which means that a wide range of users, from hobbyists to professionals, can explore and leverage the advanced features of this model.

Commercial Availability

Looking ahead, Stability AI has plans to offer a commercial use license for Stable Cascade. This move will open up new opportunities for businesses and creative professionals to utilize the model’s capabilities for their projects. But before that happens, the company is committed to a thorough period of testing and refinement to ensure the tool meets the high standards required for commercial applications.

The community’s role in the development of Stable Cascade cannot be overstated. Users are not just passive recipients of this technology; they are actively engaged in creating custom content and exploring the model’s possibilities. This collaborative environment is vital for innovation, as it allows for a sharing of ideas and techniques that can push the boundaries of what AI can achieve. Stability AI explains a little more about Stable Cascade’s achievements so far:

“Moreover, Stable Cascade achieves impressive results, both visually and evaluation wise. According to our evaluation, Stable Cascade performs best in both prompt alignment and aesthetic quality in almost all comparisons. The above picture shows the results from a human evaluation using a mix of parti-prompts and aesthetic prompts. Specifically, Stable Cascade (30 inference steps) was compared against Playground v2 (50 inference steps), SDXL (50 inference steps), SDXL Turbo (1 inference step) and Würstchen v2 (30 inference steps).”

Stability AI’s Stable Cascade is a notable addition to the AI image generation landscape. With its efficient architecture, open-source accessibility, and extensive customization options, it offers a powerful tool for those looking to create realistic images and text. As the community continues to grow and contribute to the model’s evolution, the potential uses for Stable Cascade seem boundless. The excitement surrounding this new AI image generator is a clear indication that the field of artificial intelligence is not just growing—it’s thriving, with innovations that continue to surprise and inspire.


Apple releases MGIE open-source AI image editor

In the realm of digital image editing, Apple’s recent unveiling of the Multimodal Large Language Model-Guided Image Editing (MGIE) system marks a significant milestone. This cutting-edge AI tool leverages the capabilities of large language models to interpret and execute complex, instruction-based image modifications, offering users an unprecedented level of control and flexibility. MGIE’s innovative approach combines the power of text and visual inputs to facilitate Photoshop-style adjustments, global photo enhancements, and precise local edits with remarkable efficiency.

Apple MGIE

The development of MGIE embodies Apple’s commitment to pushing the boundaries of technology and creativity, providing a platform that not only simplifies sophisticated editing tasks but also encourages collaboration and innovation within the open-source community. By integrating multimodal learning techniques, MGIE significantly improves upon previous image editing systems, enabling more expressive and accurate interpretations of user instructions, and providing open-source competition to the likes of Midjourney and OpenAI’s DALL·E 3.

Open source image editor

In recent years, the intersection of artificial intelligence and creative tools has led to revolutionary advances in how we interact with digital media. Apple’s introduction of the MGIE system stands as a testament to this ongoing transformation, setting a new standard for AI-powered creativity.

MGIE (MLLM-Guided Image Editing) is an open-source AI model developed in collaboration with University of California researchers. This model, highlighted for its ability to perform intricate image manipulations based on natural language instructions, leverages multimodal large language models (MLLMs) to accurately interpret user requests. MGIE enables a wide range of edits, from global photo enhancements like adjusting brightness and contrast to local modifications and Photoshop-style alterations such as cropping, resizing, and adding filters.

iOS 18

Its capability to understand and execute commands like making a pizza look healthier or altering the focus in a photo showcases its advanced common sense reasoning and pixel-level manipulation skills. MGIE’s development, shared at the International Conference on Learning Representations (ICLR) 2024 and available on GitHub, signifies a major leap forward in AI research for Apple, following closely on the heels of other significant AI projects and the anticipation of generative AI features in iOS 18.

Apple MGIE AI image editor

MGIE represents a bridge between advanced AI capabilities and user-friendly image editing, enabling a plethora of modifications ranging from global photo enhancements like brightness, contrast, and sharpness adjustments to more focused local edits that can alter the shape, size, color, or texture of specific image elements. Furthermore, it excels in Photoshop-style operations, including cropping, resizing, rotating, and applying various filters, offering users an unprecedented level of control over their digital environments.

Multimodal Large Language Model-Guided Image Editing

One of the most remarkable aspects of MGIE is its common-sense reasoning ability, which allows it to perform tasks such as adding vegetable toppings to a pizza to make it appear healthier or enhancing a photo’s contrast to simulate additional light. This level of intuitive operation paves the way for more creative and personalized image editing, pushing the boundaries of what can be achieved with AI technology.

The collaboration with the University of California and the presentation of MGIE at the International Conference on Learning Representations (ICLR) 2024 mark a milestone in Apple’s AI research endeavors. Available on GitHub, MGIE invites further exploration and development, providing access to its code, data, and pre-trained models to the broader scientific and creative communities.

AI image generation and manipulation research

This initiative is part of Apple’s broader commitment to AI research, as evidenced by its recent achievements in deploying large language models on iPhones and other devices with limited memory. The development of an “Apple GPT” rival and the “Ajax” framework for large language models underscore the company’s dedication to advancing AI technology. Furthermore, the anticipation of generative AI features in iOS 18, including an enhanced version of Siri with ChatGPT-like functionality, signals a significant shift in how AI will integrate into everyday devices, potentially marking the “biggest” software update in iPhone history according to industry analysts.

MGIE is not just a tool but a harbinger of the future of digital creativity, blending the lines between technological innovation and artistic expression. Its development and open-source release underscore Apple’s vision of a world where technology serves not only to enhance productivity but also to foster creativity and personal expression through intuitive, accessible, and powerful tools. As MGIE evolves, it is set to redefine the landscape of image editing, making advanced AI-driven image manipulation accessible to a wider audience and encouraging a new era of digital artistry.


Midjourney image prompting vs style reference

If you would like to improve your Midjourney AI art creations, you might be interested in learning more about the differences between Midjourney’s image prompting and style reference. Each method takes its own distinct approach to generating visual art, catering to different creative needs and objectives. These tools, integral to the Midjourney platform, offer users subtle control over the creative process, enabling the transformation of ideas into visual representations with precision and flair for any application.

When you dive into the world of Midjourney, you’re met with a suite of tools that can transform your creative ideas into stunning visual art. Understanding how to use these tools effectively is crucial for anyone looking to craft images that truly capture their vision. Let’s start with image prompting.

Think of it as a way to direct the outcome of your visual creation with a high level of specificity. You begin with a base image, which sets the stage for what’s to come. Then, you add text prompts, much like adding pieces to a puzzle, to fill in the details. The final image is a blend of the original picture and the new elements you’ve introduced. This method is perfect when you want to maintain the core aspects of your starting image while adding distinct touches.

Midjourney Image Prompting vs Style Reference

For instance, if you have a photo of a cat and you prompt Midjourney with “wearing a superhero cape,” the software will generate an image of that very cat, now sporting a cape. The influence of the base image is unmistakable, making image prompting ideal for projects where you want to keep the essence of the original image intact.

On the other hand, style reference is like mixing a unique cocktail. You’re not looking to replicate the base image but rather to capture its style, tone, or mood. The images produced through this method will have a stylistic connection to the reference but won’t be direct copies. If you provide a picture of a starry night, for example, and ask for a “landscape infused with the night’s mystique,” Midjourney will create a new landscape that embodies the atmospheric qualities of the starry night without replicating its exact appearance. This approach is best when you’re aiming to evoke a certain style or feeling rather than replicate precise details.

Midjourney also offers a way to fine-tune the balance between your reference image and the generated artwork. This is done through the --sw (style weight) parameter, which can be adjusted from 0 to 1,000. A higher value means the reference image will have a stronger influence on the outcome, while a lower value gives more weight to the textual prompts. This allows for a high degree of control over how much your final image resembles the reference.
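
As an illustration, here is how the two approaches differ at the prompt level; the reference URL is a placeholder. An image prompt leads the prompt text, while a style reference is attached with --sref and weighted with --sw.

```python
# Illustrative Midjourney prompts contrasting image prompting with style
# reference; the URL is a placeholder, and --sw runs from 0 to 1,000.
ref = "https://example.com/starry-night.png"

image_prompt = f"/imagine prompt: {ref} a cat wearing a superhero cape --v 6"
style_prompt = f"/imagine prompt: a mountain landscape at dusk --sref {ref} --sw 600 --v 6"
print(image_prompt)
print(style_prompt)
```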

To put these concepts into practice, consider the task of creating an image of a woman with emerald earrings. Using image prompting, you can ensure that the earrings are depicted just as you envision them. Alternatively, if you’re inspired by the lushness of a forest, style reference can help you channel that greenery into your artwork, resulting in a piece that captures the forest’s essence without directly copying its exact look.

Midjourney Image Prompting

Image prompting in Midjourney allows users to start with a base image and then direct the outcome of their creation with high specificity through text prompts. This method is akin to guiding the artistic process step by step, maintaining the essence of the original image while incorporating new elements or alterations as specified by the user. It’s especially useful for projects where the original image’s core aspects are to be preserved, but with added distinct touches.

Key Characteristics:

  • Precision in Details: Image prompting is perfect for adding specific elements to an existing image, such as dressing a cat in a superhero cape. The final image blends the original picture with the new, prompted features.
  • Maintaining Original Essence: The base image heavily influences the outcome, making this method ideal for projects requiring fidelity to the original’s visual identity.

Midjourney Style Reference

Style reference, on the other hand, is about capturing the essence, style, tone, or mood of a reference image rather than its exact visual details. This approach is more about evoking a certain aesthetic or feeling in the artwork, creating images that have a stylistic connection to the reference but are not direct replicas. It’s best suited for projects aiming to convey a general atmosphere or theme inspired by the reference image.

Key Characteristics:

  • Creative Freedom: Offers more leeway in interpretation, focusing on the mood, style, or tone rather than precise replication of the reference image.
  • Thematic Consistency: Ideal for projects that require the artwork to embody the atmosphere or essence of the reference without duplicating it.

Points to remember:

  • Control Over Outcome: Image prompting offers a higher level of control over specific details within the artwork, while style reference provides a broader control over the artwork’s overall aesthetic.
  • Creative Intent: The choice between the two depends on whether the goal is to replicate specific elements of an image (image prompting) or to create something that conveys a general style or feeling (style reference).
  • Parameter Adjustment: Midjourney allows users to fine-tune the influence of the reference image through parameters (e.g., --sw), which is crucial in balancing between the base image and textual prompts in image prompting or the desired style in style reference.

The choice between image prompting and style reference ultimately hinges on what you’re trying to achieve with your art. Do you need to replicate specific elements, or are you looking to create something that conveys a general aesthetic? Armed with this knowledge, you can navigate Midjourney’s features to produce artwork that aligns with your creative goals, whether that means a faithful recreation of your initial idea or a more interpretive piece of art.

As you explore the possibilities of Midjourney, remember that these tools are at your disposal to guide the creative process. By mastering image prompting and style reference, you can bring a new level of sophistication and intention to your visual projects. Whether you’re a seasoned artist or a newcomer to digital creation, these features can help you turn your imaginative concepts into compelling visual narratives. So go ahead, experiment with these tools, and watch as your ideas take shape in ways that are as unique as your own creative journey.


Google Imagen 2 AI art generator Image FX user interface first look

Google has unveiled a new tool that is changing the way we create images. This tool, called Image FX, is part of their AI Test Kitchen and it’s powered by artificial intelligence. It’s designed to make images that look so real, they could be mistaken for professional photography. The best part? It’s easy to use, even if you’re not an experienced AI artist.

Google has created the AI Test Kitchen to provide users with a place where they can experience and give feedback on some of Google’s latest AI technologies. As the name implies, everything within the AI Test Kitchen is a work in progress and meant for early feedback. When you use Image FX, you start by typing in a description of what you want the picture to look like. The AI then takes your words and turns them into an image. It’s like having a conversation with a machine that can paint. And to make things even better, the system will offer suggestions to improve your description, making it easier for you to get the perfect image.

Google has made available three AI tools: ImageFX, MusicFX, and TextFX. With these tools, you can use text to turn an idea into images, music, and text. Keep in mind that this technology has its own set of challenges, since the responses can be inaccurate or inappropriate. Google says it has added multiple layers of protection to minimize these risks, but they haven’t been eliminated. Currently, the AI Test Kitchen is available only in English.

Imagen 2 Image FX user interface explored

One of the most impressive things about Image FX is its ability to make pictures of famous characters or to create photos that look so real, it’s hard to tell they weren’t taken with a camera. This shows just how far Google has come in the field of AI and making images from scratch.

Right now, you can’t change much about how the images are made. There’s only one setting you can adjust, called the ‘seed’, which changes how unique your image is. This means that while you’ll get consistent results, you won’t have a lot of control over how different each image is. The people who use Image FX are really important to its success. When they share the images they’ve made and give feedback, they help Google make the tool even better. It’s this kind of teamwork that shows how much Image FX could improve in the future.

When you interact with the tools, Google collects your conversations, tool outputs, related product usage information, and your feedback. This data is stored in a manner that’s not linked to your Google account. Google uses this data to provide, improve, and develop its products, services and machine-learning technologies, including enterprise products such as Google Cloud. The company says it uses feedback to increase the effectiveness of its models’ safety policies and to help minimize bias in its models more generally, and it asks users not to include any personal information about themselves or others in their interactions.

Google Imagen 2 AI art generator

If you want to try out Image FX, you can go to the AI Test Kitchen website. But keep in mind, it might not be available everywhere, so you’ll need to check if you can access it where you live. Google’s Image FX is a big step forward in making images with the help of AI. It’s all about creating high-quality, realistic pictures in a fun and interactive way. Even though it’s still being developed and has some limits, the future looks bright. With help from users, Image FX will keep getting better and better.


Midjourney 6 Magnific photo realistic AI image upscaler

In the fast-paced world of digital imaging, professionals and enthusiasts alike are constantly on the lookout for tools that can enhance their work with precision and ease. Enter the Magnific AI image upscaler, a sophisticated piece of software that promises to elevate the quality of your images with its advanced upscaling capabilities and a keen focus on photorealism. When combined with Midjourney 6, the upscaler can help you create some real masterpieces at any scale.

At the heart of this tool lies the 16x upscale feature, a remarkable function that allows users to enlarge their images significantly while maintaining an impressive level of detail. This is particularly beneficial for those looking to produce high-quality prints or utilize their images in digital formats where clarity is paramount. The Magnific AI doesn’t just stop at upscaling; it also introduces workflow enhancements that save time and effort, enabling you to focus more on the creative aspects of your projects.

The software is designed with customization in mind, offering presets and prompt weights that let you tailor the upscaling process to your specific needs. Whether you’re aiming to preserve the original essence of your image or inject a new level of creativity, Magnific AI provides the tools to do so. Additionally, it includes visual comparison features that allow you to see the impact of your adjustments in real-time, ensuring that you can achieve the level of photorealism you desire.

AI image upscaler

Image upscaling is a process used to increase the resolution of an image. This involves enlarging the pixel dimensions of an image while striving to maintain as much of the original quality and detail as possible. The challenge in upscaling lies in adding new pixels while ensuring they blend seamlessly with the existing ones to create a higher-resolution image that appears natural and undistorted.

When it comes to standing out in the competitive field of AI upscalers, Magnific AI distinguishes itself with its unique offerings and attractive pricing. Ease of use is another benefit: thanks to Midjourney V6 Alpha’s natural-language understanding, communicating with the AI and expressing your creative intentions has never been simpler. The results speak for themselves, with upscaled images that boast detail accuracy and realism that can rival high-resolution originals.

Efficiency is also a key feature of the Magnific AI, which operates with improved performance, making it an excellent choice for remastering old or low-resolution photos. The ability to add details and breathe new life into these images can be a game-changer for those looking to preserve memories or revitalize vintage photography.

What is image upscaling

There are two primary methods of image upscaling: traditional and AI-based. Traditional methods, such as nearest-neighbor or bicubic interpolation, work by calculating new pixel values based on the surrounding pixels. These methods are straightforward but often result in images that are blurry or have visible artifacts, especially when the upscaling factor is large.

AI-based methods, on the other hand, use machine learning models that have been trained on large datasets of images. These models learn complex patterns and textures, enabling them to predict and generate high-resolution details more effectively than traditional methods. This approach often results in images that are sharper and more detailed, with fewer artifacts. AI-based upscaling is particularly effective for large upscaling factors and can be tailored to specific types of images, such as faces, landscapes, or text.
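
The traditional side is easy to demonstrate with the Pillow library; the file names are placeholders. A fixed resampling kernel can only redistribute the pixels that already exist, which is why the result softens, whereas a learned model can synthesize plausible new detail.

```python
# Classical 4x upscaling with Pillow; compare the blocky nearest-neighbour
# result with the smoother but softer bicubic one. File names are placeholders.
from PIL import Image

img = Image.open("photo.jpg")
w, h = img.size

img.resize((w * 4, h * 4), Image.Resampling.NEAREST).save("photo_nearest_4x.png")
img.resize((w * 4, h * 4), Image.Resampling.BICUBIC).save("photo_bicubic_4x.png")
```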

In practical applications, image upscaling is used in various fields, including digital photography, film restoration, medical imaging, and video game enhancement. It allows for older or lower-quality content to be repurposed for modern, high-resolution displays. However, the effectiveness of upscaling depends on the original image quality, the upscaling method used, and the extent of the enlargement.

Magnific AI

To get the most out of the Magnific AI upscaler, it’s important to use the settings for creativity, resemblance, and fractality judiciously. Experimentation can lead to stunning results, but it’s also crucial to be aware of any biases and to consider reusing upscaled images to save time and resources.

When evaluating the value of an AI upscaler, pricing is always a consideration. Magnific AI’s combination of features and performance offers a strong competitive edge, but it’s still important to compare it with other options on the market, such as Topaz Gigapixel, to ensure that you’re making a cost-effective decision.

The Magnific AI image upscaler for Midjourney V6 Alpha is more than just a tool; it’s a partner in your creative journey. With its advanced features, commitment to efficiency, and dedication to producing photorealistic results, it stands as an indispensable asset for anyone looking to take their digital images to the next level. Whether you’re a seasoned professional or a passionate hobbyist, the Magnific AI is poised to help you achieve a new standard of image quality that can truly enhance your work.
