Software version:
diffusers == 0.29.2
Bug 1:
AttributeError: 'CLIPTextModelOutput' object has no attribute 'pooler_output'
text_encoder is a CLIPTextModel, whose output is transformers.modeling_outputs.BaseModelOutputWithPooling, so that output contains pooler_output.
text_encoder_2, however, is a CLIPTextModelWithProjection, whose output is transformers.models.clip.modeling_clip.CLIPTextModelOutput. That class has no pooler_output attribute (its pooled result is exposed as text_embeds). Your code reads
pooled_prompt_embeds = prompt_embeds.pooler_output
for every text encoder (both text_encoder and text_encoder_2), so it raises the error above on text_encoder_2.
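One possible workaround, sketched below, is to read whichever pooled attribute the output object actually has. The helper name get_pooled_output is hypothetical and not part of the repository or of transformers. The SimpleNamespace objects are stand-ins that only mimic the attribute names of the two output classes, so the sketch runs without loading any model. It assumes the projected text_embeds is what the pipeline wants from text_encoder_2, which should be checked against the rest of the script.

```python
from types import SimpleNamespace


def get_pooled_output(encoder_output):
    """Return the pooled embedding from either CLIP text encoder output type.

    Hypothetical helper for illustration only.
    """
    # CLIPTextModel returns BaseModelOutputWithPooling, which has `pooler_output`.
    pooled = getattr(encoder_output, "pooler_output", None)
    if pooled is None:
        # CLIPTextModelWithProjection returns CLIPTextModelOutput, which has
        # `text_embeds` (the projected pooled embedding) instead.
        pooled = getattr(encoder_output, "text_embeds", None)
    if pooled is None:
        raise AttributeError(
            f"{type(encoder_output).__name__} has neither "
            "'pooler_output' nor 'text_embeds'"
        )
    return pooled


# Stand-ins that mimic only the attribute names of the real output classes.
out_1 = SimpleNamespace(last_hidden_state="hidden_1", pooler_output="pooled_1")
out_2 = SimpleNamespace(last_hidden_state="hidden_2", text_embeds="pooled_2")

print(get_pooled_output(out_1))
print(get_pooled_output(out_2))
```

With something like this in place, the failing line could become pooled_prompt_embeds = get_pooled_output(prompt_embeds), so the same loop would work for both encoders.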
Affected line: LayerDiffuse_DiffusersCLI/diffusers_kdiffusion_sdxl.py, line 101 in 3061d9a