Google’s Whisk: A New AI Tool That Understands Image Prompts

Dec 19, 2024 Reading time : 2 min

Google has introduced yet another AI tool to its growing collection. Whisk is an image generator from Google Labs that lets users take an existing image as a starting point. Rather than recreating the picture with new details, it captures only the essence of the original, which makes it better suited to brainstorming and quick visualisations than to editing the source image itself.

Google’s Whisk to create images from image prompt

The company has introduced Whisk as a new type of creative tool. The initial interface is minimalist, with options for style and subject. Users can choose from three predefined styles: enamel pin, sticker and plushie.

Google appears to have chosen these specific styles because they suit the rough, outline-level outputs the tool excels at in its current iteration. Users who want more control can open an advanced editor by tapping ‘Start from scratch’ on the main screen. In this mode, they can supply text or a source image for each of three categories: scene, subject and style.

Users can also add extra text for final adjustments. However, the results from the advanced editor do not closely match what was requested.

Google has acknowledged that Whisk captures only a few key characteristics of the source image, and the company cautions that the generated subject might have a different weight, skin tone, height or hairstyle. Whisk uses the Gemini language model to write a detailed caption of the uploaded image, and that caption is then processed by the Imagen 3 image generator.
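The caption-then-generate flow described here can be illustrated with a rough Python sketch. It is not Google's code and does not call any real Gemini or Imagen API: `caption_image`, `build_prompt`, `generate_image` and the `Inputs` class are hypothetical stand-ins, and the way the three categories are joined into one prompt is an assumption made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Inputs:
    """The three categories from the advanced editor, plus optional extra text."""
    subject: str
    scene: str
    style: str
    extra_text: Optional[str] = None


def caption_image(source: str) -> str:
    """Hypothetical stand-in for the captioning step (Gemini in the article)."""
    # A real captioner would describe the picture; this stub only tags the input.
    return f"a detailed description of {source}"


def build_prompt(inputs: Inputs) -> str:
    """Assumed step: merge the three captions and any extra text into one prompt."""
    parts = [
        f"Subject: {caption_image(inputs.subject)}",
        f"Scene: {caption_image(inputs.scene)}",
        f"Style: {caption_image(inputs.style)}",
    ]
    if inputs.extra_text:
        parts.append(f"Adjustments: {inputs.extra_text}")
    return "\n".join(parts)


def generate_image(prompt: str) -> str:
    """Hypothetical stand-in for the image generator (Imagen 3 in the article)."""
    # The generator receives only text, never the original pixels.
    return f"<image generated from {len(prompt)} characters of text>"


if __name__ == "__main__":
    prompt = build_prompt(Inputs("dog.png", "beach.png", "plushie", "add a red hat"))
    print(prompt)
    print(generate_image(prompt))
```

If the real pipeline resembles this, the generator only ever sees a text description, which would be consistent with Google's warning that details such as height or hairstyle may not carry over from the source image.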

As of now, Whisk is accessible only to people in the US, who can try it on the project’s Google Labs site.

Posted by

Suchita Gupta

Tech Journalist