This will work on a 24GB GPU with roughly 80GB of system RAM in use; lowering the duration may reduce memory requirements.
For installation I recommend Python 3.10 and torch 2.8. Linux is much less painful to use than Windows.
python3.10 -m venv env
source env/bin/activate
pip install torch==2.8.0+cu128 torchvision --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
python k1.py
You need to clone the full model into the kandinsky5 folder, or edit the .yaml in configs to point at the model you are using.
git-lfs clone https://huggingface.co/kandinskylab/Kandinsky-5.0-I2V-Pro-sft-5s-Diffusers
To use the gui:
source env/bin/activate # or env/scripts/activate on Windows PowerShell
python k1.py
Then open a browser and go to 127.0.0.1:7860.
I added some mixed models here: https://huggingface.co/maybleMyers/kan/
You can input them in the DiT Checkpoint Path in the gui.
Windows installation is harder; I would recommend using Linux before trying to run this. Someone reported that triton-windows 3.3.1 works.
I added support for saving latents and resuming from a saved latent checkpoint, since some of the HD generations take a long time.
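The latent checkpointing idea can be sketched roughly like this (a minimal illustration only, assuming the latents are plain torch tensors; save_latents, load_latents, and the checkpoint layout are hypothetical names, not the GUI's actual API):

```python
import torch

def save_latents(path, latents, step):
    # Store the partially denoised latents plus the sampler step,
    # so a long HD generation can be resumed later.
    torch.save({"latents": latents, "step": step}, path)

def load_latents(path):
    # Load back onto CPU first; move to the GPU when resuming.
    ckpt = torch.load(path, map_location="cpu")
    return ckpt["latents"], ckpt["step"]

# Round-trip example with a dummy latent tensor.
lat = torch.randn(1, 16, 8, 32, 32)
save_latents("latents_ckpt.pt", lat, step=25)
restored, step = load_latents("latents_ckpt.pt")
```

Resuming then just means feeding the restored latents back into the sampler at the saved step instead of starting from noise.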
Video denoising is still a WIP. Video joining seems to work OK with 1-2 conditional frames and 2 normalizations.
Download https://huggingface.co/kandinskylab/Kandinsky-5.0-I2V-Lite-5s/blob/main/model/kandinsky5lite_i2v_5s.safetensors and put it in the lite_checkpoints subfolder. You still need the full I2V Pro Diffusers repo cloned in the root directory. Select mode i2v and model configuration 5s Lite (I2V). For now, either adjust the VAE config or set 1 block swapped to offload before VAE decoding.
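Instead of downloading through the browser, the lite checkpoint can also be fetched with huggingface_hub (a sketch using the repo id and filename from the link above; the function is defined but not called here because the file is several GB):

```python
from huggingface_hub import hf_hub_download

def fetch_lite_checkpoint():
    # Downloads kandinsky5lite_i2v_5s.safetensors into the
    # lite_checkpoints subfolder expected by the GUI. Note that
    # local_dir preserves the model/ subdirectory from the repo,
    # so move the file up if the GUI expects it directly in
    # lite_checkpoints.
    return hf_hub_download(
        repo_id="kandinskylab/Kandinsky-5.0-I2V-Lite-5s",
        filename="model/kandinsky5lite_i2v_5s.safetensors",
        local_dir="lite_checkpoints",
    )
```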
Discord for help etc. https://discord.gg/wDaEfNGuCX
Changelog:
12/3/2025
Add ultravico for long video generation and some v2v support.
11/21/2025
Add MagCache and a token counter, fix some other bugs. You need to pip install tiktoken or reinstall requirements.
11/20/2025
Add support for fp8 scaled weights, an option to disable compile, intermediate generation saving, and HD gens.
11/18/2025
Add support for lite i2v https://huggingface.co/kandinskylab/Kandinsky-5.0-I2V-Lite-5s/tree/main/model
make previews better
11/17/2025
Add preview support. Add int8 support to drastically lower RAM/VRAM requirements.
Project Leader: Denis Dimitrov
Team Leads: Vladimir Arkhipkin, Vladimir Korviakov, Nikolai Gerasimenko, Denis Parkhomenko
Core Contributors: Alexey Letunovskiy, Maria Kovaleva, Ivan Kirillov, Lev Novitskiy, Denis Koposov, Dmitrii Mikhailov, Anna Averchenkova, Andrey Shutkin, Julia Agafonova, Olga Kim, Anastasiia Kargapoltseva, Nikita Kiselev
Contributors: Anna Dmitrienko, Anastasia Maltseva, Kirill Chernyshev, Ilia Vasiliev, Viacheslav Vasilev, Vladimir Polovnikov, Yury Kolabushin, Alexander Belykh, Mikhail Mamaev, Anastasia Aliaskina, Tatiana Nikulina, Polina Gavrilova
@misc{kandinsky2025,
author = {Alexey Letunovskiy, Maria Kovaleva, Ivan Kirillov, Lev Novitskiy, Denis Koposov,
Dmitrii Mikhailov, Anna Averchenkova, Andrey Shutkin, Julia Agafonova, Olga Kim,
Anastasiia Kargapoltseva, Nikita Kiselev, Vladimir Arkhipkin, Vladimir Korviakov,
Nikolai Gerasimenko, Denis Parkhomenko, Anna Dmitrienko, Anastasia Maltseva,
Kirill Chernyshev, Ilia Vasiliev, Viacheslav Vasilev, Vladimir Polovnikov,
Yury Kolabushin, Alexander Belykh, Mikhail Mamaev, Anastasia Aliaskina,
Tatiana Nikulina, Polina Gavrilova, Denis Dimitrov},
title = {Kandinsky 5.0: A family of diffusion models for Video & Image generation},
howpublished = {\url{https://github.com/ai-forever/Kandinsky-5}},
year = 2025
}
@misc{mikhailov2025nablanablaneighborhoodadaptiveblocklevel,
title={$\nabla$NABLA: Neighborhood Adaptive Block-Level Attention},
author={Dmitrii Mikhailov and Aleksey Letunovskiy and Maria Kovaleva and Vladimir Arkhipkin
and Vladimir Korviakov and Vladimir Polovnikov and Viacheslav Vasilev
and Evelina Sidorova and Denis Dimitrov},
year={2025},
eprint={2507.13546},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2507.13546},
}
We gratefully acknowledge the open-source projects and research that made Kandinsky 5.0 possible:
- INT8 Support — Lodestone's ramtorch for drop-in int8 support and a triton kernel.
- PyTorch — for model training and inference.
- FlashAttention 3 — for efficient attention and faster inference.
- Qwen2.5-VL — for providing high-quality text embeddings.
- CLIP — for robust text–image alignment.
- HunyuanVideo — for video latent encoding and decoding.
- MagCache — for accelerated inference.
We deeply appreciate the contributions of these communities and researchers to the open-source ecosystem.
