Limitations and Improvements
1. Incorrect workflow selection
We use Llama 3.1 8B-Instruct-FP8 for prompt optimization and workflow selection, as it is currently the best model available on AkashChat with function calling. However, we have noticed that the model occasionally selects the incorrect workflow; in particular, it sometimes chooses the Anime workflow for prompts where an anime style is not appropriate.
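As an illustrative sketch of how the selection step could be tightened (the project's actual prompts, tool schema, endpoint URL, and model identifier may differ), constraining the choice to an enum in the tool definition gives the model less room to misclassify:

```python
# Illustrative sketch only; the real schema, endpoint, and model name are assumptions.
import json
from openai import OpenAI

# Assumed AkashChat OpenAI-compatible endpoint and model identifier.
client = OpenAI(base_url="https://chatapi.akash.network/api/v1", api_key="sk-...")

tools = [{
    "type": "function",
    "function": {
        "name": "select_workflow",
        "description": "Pick the image generation workflow that best matches the user's prompt.",
        "parameters": {
            "type": "object",
            "properties": {
                "workflow": {
                    "type": "string",
                    # Restricting the choices to an enum (plus a short description of
                    # each style in the system prompt) helps steer the model away from
                    # defaulting to Anime for non-anime prompts.
                    "enum": ["Anime", "Fantasy", "Realistic"],
                }
            },
            "required": ["workflow"],
        },
    },
}]

response = client.chat.completions.create(
    model="Meta-Llama-3-1-8B-Instruct-FP8",  # assumed model name on AkashChat
    messages=[{"role": "user", "content": "A misty mountain village at dawn, photorealistic"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "select_workflow"}},
)

args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
print(args["workflow"])  # expected: "Realistic"
```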
2. Limited number of workflows
For this proof of concept, we only include three workflows (Anime, Fantasy, Realistic). There is clear room for improvement: new workflows could be added for different genres/styles, e.g. Painting, B&W, Abstract.
Thankfully, our class system makes it easy to add new workflows, and we encourage developers to play around with the API and potentially find more optimized workflows.
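As a rough sketch of what registering a new workflow could look like (the base class name, fields, and checkpoint filenames below are hypothetical, not the repo's actual API):

```python
# Hypothetical sketch; the project's real class names and fields differ.
from dataclasses import dataclass

@dataclass
class Workflow:
    name: str
    checkpoint: str          # checkpoint file used by the ComfyUI graph
    positive_prefix: str = ""  # style keywords prepended to the user prompt
    negative_prompt: str = ""

WORKFLOWS = {
    "Anime": Workflow("Anime", "animeCheckpoint.safetensors", "anime style, "),
    "Fantasy": Workflow("Fantasy", "fantasyCheckpoint.safetensors", "epic fantasy art, "),
    "Realistic": Workflow("Realistic", "realisticCheckpoint.safetensors", "photorealistic, "),
}

# Adding a new style is then a one-line registration:
WORKFLOWS["Painting"] = Workflow("Painting", "paintingCheckpoint.safetensors", "oil painting, ")
```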
3. Switch from ComfyUI to custom workflow
By switching from ComfyUI to a custom, code-based image generation pipeline, we would have more control over adjusting workflows and could create custom filters and functions for the images. We would also remove the overhead of running ComfyUI, improving performance.
A custom solution would also make it easier to implement parallel processing of prompts, improving the scalability of the backend and allowing more users to use the site concurrently.
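A minimal sketch of what a code-based backend might look like with the diffusers library (model ID, scheduler defaults, and step counts are illustrative, not the project's exact setup):

```python
# Sketch of a diffusers-based replacement for the ComfyUI backend (assumed model ID).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

def generate(prompt: str, negative_prompt: str = "", steps: int = 30):
    # Because this is plain Python, prompts can be batched into a single call or
    # dispatched from a worker queue, which is harder to do with whole ComfyUI jobs.
    return pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=steps,
    ).images[0]
```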
4. Performance / quality improvements
The easiest way to improve the performance of the image generation is by using a better GPU. The demo video for our project was recorded with the deployment running on an NVIDIA H100.
The frontend and API run on minimal hardware, using a couple of vCPUs and 512MB of RAM each. The main compute bottleneck is the ComfyUI backend; as mentioned in the previous point, switching to a custom solution could improve performance.
The quality of results could be improved by using a better base model. Our current workflows use SDXL as it provides a good balance of performance and quality and has a large amount of community support in the form of custom checkpoints and LoRAs; however, a model like FLUX.1-dev would produce better-quality results at the cost of performance.
Other quality improvements, such as detailers that fix problematic generations like hands, should also be explored.
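For reference, swapping the base model in a diffusers-style pipeline is a small change. The snippet below is only an illustration (FLUX.1-dev is a gated model on Hugging Face and needs considerably more VRAM than SDXL; the parameter values are typical defaults, not tuned for this project):

```python
# Illustrative only: using FLUX.1-dev instead of SDXL in a code-based pipeline.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",   # gated; requires accepting the licence on Hugging Face
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a cozy cabin in a snowy forest at night",
    num_inference_steps=28,   # FLUX typically needs fewer steps than SDXL
    guidance_scale=3.5,
).images[0]
```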
5. Self host S3 container
Our current deployment uses Cloudflare R2 for image storage. Other open-source, S3-compatible containers such as MinIO could be self-hosted on Akash for better control of the data and potential cost savings.
Unfortunately, due to time constraints, we were unable to make use of MinIO in our hackathon submission.
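Because MinIO speaks the same S3 API as R2, the switch would mostly be a matter of changing the endpoint and credentials. A sketch with boto3 (the endpoint, bucket, and keys below are placeholders):

```python
# Sketch: the same boto3 code works against Cloudflare R2 or a self-hosted MinIO
# instance, since both expose an S3-compatible API. Values below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.example.akash:9000",  # placeholder self-hosted MinIO endpoint
    aws_access_key_id="MINIO_ACCESS_KEY",
    aws_secret_access_key="MINIO_SECRET_KEY",
)

s3.upload_file("output.png", "generated-images", "output.png")

# Presigned URLs let the frontend fetch images without exposing the credentials.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "generated-images", "Key": "output.png"},
    ExpiresIn=3600,
)
print(url)
```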
6. Decrease deployment file size
The current allocation for the ComfyUI image is 50GB, although we were able to successfully deploy an instance using 40GB of storage. This is far from massive, but there are still ways storage use could be reduced. Our ComfyUI install alone uses 18GB of storage, so switching away from ComfyUI as mentioned above would yield a large space saving.
Since all our checkpoints are trained from SDXL, we could save some storage by downloading the CLIP and VAE decoders separately so they can be shared across workflows, and downloading the checkpoints without these components.
Further savings could be had by optimizing model choice. We currently use a different checkpoint for each workflow for the best possible image quality, although it would be more space efficient to use one base checkpoint with LoRAs layered on top of it for the different workflows. This approach is more complex, and it is harder to find good checkpoint/LoRA combinations.
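Both ideas are sketched below in diffusers terms rather than ComfyUI's loaders (the VAE ID and LoRA path are assumptions used purely for illustration): one shared VAE and set of text encoders across checkpoints, and per-workflow style LoRAs on a single base checkpoint instead of one full checkpoint per workflow.

```python
# Sketch (assumed model IDs and LoRA path): share components across workflows and
# replace per-workflow checkpoints (~7GB each) with small style LoRAs.
import torch
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

shared_vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=shared_vae,                 # reused instead of bundled with every checkpoint
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# A per-workflow style LoRA on top of the shared base checkpoint.
pipe.load_lora_weights("loras/anime_style_lora.safetensors")  # placeholder path
```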
Conclusion
We welcome any contributions to the project, particularly in any of these areas. Feel free to fork the project or open a PR.
Thank you for reading!