Build an AI Stylist with Amazon Bedrock
In this article, we explore how generative AI can transform the fashion industry by building an AI stylist using Amazon Bedrock. This AI stylist allows users to get personalized style suggestions based on their gender and the occasion they’re attending. Before diving into the code, let’s first understand what Amazon Bedrock is and how it simplifies the process of building generative AI solutions.
The video tutorial for this blog is available here.
What is Amazon Bedrock?
Amazon Bedrock is a fully managed AWS service that simplifies the integration of generative AI models into applications. It provides access to foundation models from various providers through a unified API, making it easy to incorporate state-of-the-art AI models without dealing with the complexities of model hosting and scaling. Bedrock supports a range of tasks, including text generation, image generation, and more.
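In practice, applications talk to that unified API through the AWS SDK. As a minimal setup sketch in Python with boto3 (assuming your AWS credentials are configured and Bedrock model access is enabled in your region; us-east-1 here is just an example):

import boto3

# The Bedrock runtime client is the entry point for every invoke_model call below
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')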
For this AI stylist, we use two foundation models available through Amazon Bedrock:
Anthropic Claude-v2:1: A large language model (LLM) specializing in text-based tasks, such as generating conversational recommendations.
Stability Stable Diffusion XL-v1: A model designed for text-to-image tasks, which excels at generating high-quality images from text descriptions.
How the AI Stylist Works
The AI stylist uses a combination of text and image generation models to create personalized fashion recommendations.
Here’s how each model contributes:
Step 1: Text-Based Recommendations with Anthropic Claude-v2:1
The first part of the AI stylist involves generating a text-based outfit suggestion using the Anthropic Claude-v2:1 model. This LLM is ideal for handling conversational inputs and generating detailed responses.
Input Prompt: The AI receives a prompt asking for outfit suggestions based on the user’s gender and the occasion they are attending. For example:
“Human: I am attending a wedding, and I am female. What outfit would you suggest?”
Model Call: This prompt is sent to the Claude model, which generates a detailed response. The response might suggest specific clothing items, such as a “floral dress” or “comfortable shoes.”
Text Processing: The response is then split into individual parts, such as sentences or specific clothing recommendations, which are later used to generate corresponding images (a simple splitting approach is sketched below).
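The article doesn't prescribe a particular splitting strategy; one minimal sketch, assuming the model's reply is plain prose and that sentence boundaries are a good enough unit, looks like this:

import re

def split_suggestion(suggestion):
    # Split Claude's reply on sentence boundaries and drop empty pieces;
    # each remaining part becomes one image-generation prompt later on
    parts = re.split(r'(?<=[.!?])\s+', suggestion.strip())
    return [part for part in parts if part]

# Example:
# split_suggestion("A floral dress would be lovely. Pair it with comfortable shoes.")
# -> ['A floral dress would be lovely.', 'Pair it with comfortable shoes.']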
Step 2: Image Generation with Stability Stable Diffusion XL-v1
Once the text suggestions are generated, the Stability Stable Diffusion XL-v1 model is used to create images based on each part of the text. This model is designed to transform text descriptions into detailed, high-quality images, making it ideal for visualizing outfit ideas.
Input Prompt: For each part of the text suggestion (e.g., “a floral dress”), a new prompt is constructed for the image generation model. For example:
“An outfit that matches the following description: a floral dress.”
Model Call: This prompt is sent to the Stable Diffusion model, which generates an image that visually represents the clothing item. Parameters like cfg_scale, steps, and seed are used to control the quality and diversity of the generated images.
Image Handling: The response from the model includes a base64-encoded image. This image is decoded and displayed (see the sketch below), giving users a visual representation of the outfit suggestion.
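As a sketch of that image-handling step: on Bedrock, the SDXL response body is JSON whose artifacts list carries each image as a base64 string. A small helper, assuming that response shape and using Pillow for display (an implementation choice, not part of the original code; the invoke_model call itself appears in the next block), might look like this:

import base64
import io
import json

from PIL import Image  # Pillow, used here purely as a display choice

def show_generated_image(image_response):
    # The response body is JSON; each entry in 'artifacts' carries a
    # base64-encoded image
    result = json.loads(image_response['body'].read())
    image_bytes = base64.b64decode(result['artifacts'][0]['base64'])
    # Load the decoded bytes and open them in the default image viewer
    Image.open(io.BytesIO(image_bytes)).show()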
The core of the function, with the example prompts from above filled in:

import json
import boto3

# Create the Bedrock runtime client
bedrock = boto3.client('bedrock-runtime')

# Call Claude model for the text-based recommendation
payload = {
    "prompt": "\n\nHuman: I am attending a wedding, and I am female. What outfit would you suggest?\n\nAssistant:",
    "max_tokens_to_sample": 500,  # required by the Anthropic text-completions API
}
suggestion_response = bedrock.invoke_model(
    modelId='anthropic.claude-v2:1',
    contentType='application/json',
    body=json.dumps(payload)
)
# Extract the outfit suggestion text from the JSON response body
suggestion = json.loads(suggestion_response['body'].read())['completion']

# Call Stable Diffusion for each image generation
stable_diffusion_payload = {
    "text_prompts": [{"text": "An outfit that matches the following description: a floral dress."}],
    "cfg_scale": 10, "steps": 50, "seed": 42,  # example values
}
image_response = bedrock.invoke_model(
    modelId='stability.stable-diffusion-xl-v1',
    contentType='application/json',
    body=json.dumps(stable_diffusion_payload)
)
image_result = json.loads(image_response['body'].read())
Putting It All Together
The final output of the AI stylist is a combination of text-based recommendations and matching images. The user gets a complete style suggestion, including both a written description and a visual representation of the outfit, tailored to their specific occasion and gender.
For example, if you ask for an outfit suggestion for a wedding, you might receive a text recommendation like “a floral dress paired with comfortable shoes” along with corresponding images of each item.
Example Code
The full code for the AI stylist uses Amazon Bedrock to interact with both the Anthropic Claude-v2:1 model for text suggestions and the Stability Stable Diffusion XL-v1 model for image generation. Here’s how the process is structured:
User Input: The user provides the occasion and their gender.
Text Suggestion: The code sends a prompt to the Claude model to generate a detailed fashion recommendation.
Image Generation: The recommendation is split into parts, and each part is used to generate a corresponding image with Stable Diffusion.
Display: The text suggestions and images are displayed to the user.
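Putting those four steps together, a minimal end-to-end sketch (the ai_stylist name and the naive sentence splitting are illustrative choices, not taken from the original script; the regex-based splitter sketched earlier would also work) could look like this:

import base64
import json
import boto3

bedrock = boto3.client('bedrock-runtime')

def ai_stylist(occasion, gender):
    # Step 1: ask Claude for a text recommendation
    prompt = (f"\n\nHuman: I am attending a {occasion}, and I am {gender}. "
              f"What outfit would you suggest?\n\nAssistant:")
    response = bedrock.invoke_model(
        modelId='anthropic.claude-v2:1',
        contentType='application/json',
        body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 500}),
    )
    suggestion = json.loads(response['body'].read())['completion']

    # Step 2: naively split the suggestion into sentences and render one image per part
    images = []
    for part in (p.strip() for p in suggestion.split('.') if p.strip()):
        sd_response = bedrock.invoke_model(
            modelId='stability.stable-diffusion-xl-v1',
            contentType='application/json',
            body=json.dumps({
                "text_prompts": [{"text": f"An outfit that matches the following description: {part}"}],
                "cfg_scale": 10, "steps": 50,
            }),
        )
        result = json.loads(sd_response['body'].read())
        images.append(base64.b64decode(result['artifacts'][0]['base64']))

    # Step 3: return the text and decoded image bytes for display
    return suggestion, images

suggestion, images = ai_stylist("wedding", "female")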
Conclusion
Amazon Bedrock makes it incredibly easy to integrate generative AI models into applications, as demonstrated with this AI stylist. By leveraging foundation models like Anthropic Claude-v2:1 and Stability Stable Diffusion XL-v1, we can create a seamless experience that combines personalized text recommendations with high-quality images. This approach has exciting potential not just in fashion but in any industry where personalization and creativity are key.