PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts. However, existing personalized generation methods cannot simultaneously satisfy the requirements of high efficiency, promising identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized text-to-image generation method that encodes an arbitrary number of input ID images into a stacked ID embedding to preserve ID information. Such an embedding, serving as a unified ID representation, can not only encapsulate the characteristics of the same input ID comprehensively, but also accommodate the characteristics of different IDs for subsequent integration. This paves the way for more intriguing and practically valuable applications. In addition, to drive the training of PhotoMaker, we propose an ID-oriented data construction pipeline to assemble the training data. Trained on the dataset constructed through this pipeline, PhotoMaker demonstrates better ID preservation than test-time fine-tuning based methods, while providing significant speed improvements, high-quality generation results, strong generalization capabilities, and a wide range of applications.

Method

Our method transforms a few input images of the same identity into a stacked ID embedding. This embedding can be regarded as a unified representation of the ID to be generated. During the inference stage, the images constituting the stacked ID embedding can originate from different IDs. We can then synthesize the customized ID in different contexts.

We first obtain the text embedding and image embeddings from text encoder(s) and image encoder, respectively. Then, we extract the fused embedding by merging the corresponding class embedding (e.g., man and woman) and each image embedding. Next, we concatenate all fused embeddings along the length dimension to form the stacked ID embedding. Finally, we feed the stacked ID embedding to all cross-attention layers for adaptively merging the ID content in the diffusion model. Note that although we use images of the same ID with the masked background during training, we can directly input images of different IDs without background distortion to create a new ID during inference.
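
The fuse-then-concatenate steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the embedding size, and the random inputs are hypothetical, and the plain average in `fuse` is a stand-in for whatever learned fusion module the model actually uses.

```python
import numpy as np

def fuse(class_emb, image_emb):
    # ASSUMPTION: a plain average stands in for the learned fusion of the
    # class embedding (e.g., "man" or "woman") with one image embedding.
    return 0.5 * (class_emb + image_emb)

def stacked_id_embedding(class_emb, image_embs):
    # Fuse the class embedding with each image embedding, giving each
    # result a leading length axis of size 1 ...
    fused = [fuse(class_emb, e)[None, :] for e in image_embs]
    # ... then concatenate all fused embeddings along the length dimension.
    return np.concatenate(fused, axis=0)  # shape: (num_images, dim)

# Hypothetical inputs: 4 ID images, embedding size 8.
rng = np.random.default_rng(0)
dim = 8
class_emb = rng.normal(size=dim)
image_embs = rng.normal(size=(4, dim))

stacked = stacked_id_embedding(class_emb, image_embs)
```

In this sketch the stacked ID embedding has one row per input image, so adding or removing input images only changes its length, not its width.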

We leave the discussion of the ID-oriented data construction pipeline to our paper.

Recontextualization

We demonstrate the generation capabilities of our PhotoMaker under basic prompts. We display the conditioning prompts below each image.

Bringing a person in artwork/old photo into reality

By taking artistic paintings, sculptures, or old photos of a person as input, our PhotoMaker can bring a person from the last century or even ancient times to the present century to “take” photos for them. We display the conditioning prompts below each image.

Stylization

Our PhotoMaker can not only generate realistic human photos, but also stylize them while preserving ID attributes. We display the conditioning prompts in the first row.

Changing Age or Gender

By simply replacing class words (e.g., man and woman), our method can achieve changes in gender and age while maintaining the original identity.

Identity Mixing

If users provide images of different IDs as input, our PhotoMaker can integrate the characteristics of the different IDs well to form a new ID.

For identity mixing, our method can adjust the merge ratio either by controlling the percentage of each identity's images within the input image pool or through prompt weighting.

We first show how our method customizes a new ID by controlling the proportion of different IDs in the input image pool.
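
As a rough illustration of controlling that proportion, the sketch below assembles an input image pool from two identities according to a target share. The helper name `build_image_pool`, its parameters, and the file names are hypothetical and are not part of any PhotoMaker API.

```python
def build_image_pool(images_a, images_b, share_a, pool_size):
    """Pick images so that roughly `share_a` of the pool comes from ID A."""
    n_a = round(share_a * pool_size)
    n_b = pool_size - n_a
    if n_a > len(images_a) or n_b > len(images_b):
        raise ValueError("not enough images for the requested proportion")
    # The pool is simply the chosen images of both IDs; each one would
    # contribute one entry to the stacked ID embedding.
    return images_a[:n_a] + images_b[:n_b]

# Hypothetical file names for two identities.
id_a = ["a1.png", "a2.png", "a3.png", "a4.png"]
id_b = ["b1.png", "b2.png", "b3.png", "b4.png"]

pool = build_image_pool(id_a, id_b, share_a=0.75, pool_size=4)
```

With these inputs the pool holds three images of the first identity and one of the second.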

We then multiply the embeddings of the images belonging to a specific ID by a coefficient to control that ID's proportion of integration into the new ID.
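
A minimal sketch of this coefficient-based weighting, assuming the stacked ID embedding is an array with one row per input image and that we know which identity each row came from. The function name `weight_identity` and the label scheme are illustrative assumptions, not the actual implementation.

```python
import numpy as np

def weight_identity(stacked, id_labels, target_id, coeff):
    # Scale only the rows that belong to `target_id`; other rows are kept.
    weighted = stacked.copy()
    mask = np.array([label == target_id for label in id_labels])
    weighted[mask] *= coeff
    return weighted

# Hypothetical stacked embedding: two images of ID "a", one of ID "b".
stacked = np.ones((3, 4))
id_labels = ["a", "a", "b"]

# Halve the contribution of ID "b" in the mixed identity.
mixed = weight_identity(stacked, id_labels, target_id="b", coeff=0.5)
```

Working on a copy keeps the original stacked embedding available, so several coefficients can be tried on the same inputs.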

Comparisons

Compared to other methods, our PhotoMaker simultaneously delivers high-quality and diverse generation, promising editability, high inference efficiency, and strong ID fidelity. More comparison results can be found in our paper. We display the conditioning prompts in the second column.
