Model Comparison: PixArt Sigma vs. Hunyuan DiT vs. SD3 Medium via /r/StableDiffusion



Hi Everyone,

With the recent release of the SD3 Medium weights and the public response to some of its issues, I decided to take some time and compare three models I've recently become acquainted with: PixArt Sigma, Hunyuan DiT, and of course SD3 Medium.

This comparison is not scientific; it was created merely so people could see how each of the three aforementioned models handles a variety of prompts. My goal was to get a feel for which models had the best prompt adherence and which were the most aesthetically pleasing. The first goal is fairly objective, but the second is subjective. I would also like to spread more awareness of open-source alternatives for fine-tuning, since SAI's licensing model isn't great, to say the least.

I modified a workflow created by the fantastic YouTuber Nerdy Rodent so that all three models generated the same prompt in a row within a single workflow. He offers the workflow on his Patreon, so if you don't want to build it by hand, you can support him and download it. The modification I made was adding an SDXL refinement pipeline after the PixArt Sigma portion. I did this because PixArt Sigma tends to create artifacts, especially with people's faces, and because with the denoise set to 0.40 the changes the refinement makes are fairly minimal.
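To make the 0.40 denoise figure concrete: in img2img-style refinement, the "denoise" (often called strength) controls how far into the noise schedule the input image is pushed before being denoised back, so low values only lightly rework the image. Here is a minimal sketch of that relationship, assuming diffusers-style strength semantics; `refine_steps` is a hypothetical helper, not part of any library or of the actual ComfyUI workflow.

```python
# Sketch: what a 0.40 "denoise" (strength) means for an img2img refinement
# pass. The refiner noises the input partway into the schedule and denoises
# it back, so only a fraction of the total steps actually run.

def refine_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually executed for a given strength
    (assumed diffusers-style truncation of the timestep schedule)."""
    return min(int(num_inference_steps * strength), num_inference_steps)

print(refine_steps(30, 0.40))  # 12 of 30 steps: a light touch-up
print(refine_steps(30, 1.00))  # 30 of 30 steps: full re-generation
```

This is why a 0.40 denoise can clean up face artifacts while retaining nearly everything else in the PixArt Sigma output: roughly 60% of the original image's structure survives the partial noising.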

Here is an example with two images. The first is PixArt Sigma with no refinement, and the second is after a 0.40 denoise using Zavy's excellent ZavyChromaXL v7.0 model. As you can see, the refinement is an improvement, yet the image retains nearly everything from the original.

https://preview.redd.it/j03wmthusw6d1.png?width=800&format=png&auto=webp&s=8fbdce6382e9e08e0a400ea6b3741d27416cb7d9

https://preview.redd.it/3q73nquxsw6d1.png?width=800&format=png&auto=webp&s=4f7cf2e9292c093fc2398024c5d4045734b48418

The main reason I chose to do this is a selfish one: I've been really enjoying the output of the PixArt Sigma model, especially after refinement. I've been fortunate enough to have a couple of my images (here and here) featured on CivitAI that were created using PixArt Sigma and an SDXL refiner.

I was going to include the images directly in this post, but I used 31 prompts and a total of 93 images, so I have everything listed in a Google Sheet here. I should add that I chose prompts that were longer than you might expect; the reason is that I've read SD3 responds better to more natural-language prompting. I even included a huge prompt as part of the test, and let's just say the results weren't great for any of the models.

I tried to include a variety of styles, including photorealistic, anime, and illustration, some recognizable characters, as well as a couple of prompts with text. Spoiler alert: PixArt and Hunyuan are terrible at generating text.

I'd love to get any feedback you have if you find it useful at all.

Thanks!

EDIT: Ok, I added Lumina-Next-T2I to the comparison matrix as requested. Thanks everyone for the feedback!

Submitted June 16, 2024 at 04:18AM by LawrenceOfTheLabia
via reddit https://www.reddit.com/r/StableDiffusion/comments/1dh4k61/model_comparison_pixart_sigma_vs_hunyuan_dit_vs/?utm_source=ifttt


