CUDA out of memory issue #96

@Hotohori

I switched to the current development build (so NOT v1.1.0) because it supports max_memory. It ran without any problem right to the end, until I had to select which trial I wanted to save locally.

I selected a trial and gave it a name, but at the moment saving should have started I got a "CUDA out of memory" error. The app kept running, but the error happened 3 more times. I then went back to the trial selection and selected the same trial again, and this time the app crashed completely with the same message.

I don't know what the cause is, maybe a memory handling problem.

I tried to use heretic on my local RTX 4090 with the Mistral Nemo (12B) model, which doesn't fully fit into VRAM. With v1.1.0 I ran into an out of memory error and could only use a batch size of 32; it failed at 64. With VRAM limited to 18 GB I could use a batch size of 128, which is still a lot faster even though more of the model sits on the CPU. But then I hit this problem at the end.
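
For readers unfamiliar with this kind of cap, here is a minimal sketch of what an 18 GB limit looks like in the `max_memory` mapping format that Hugging Face Transformers and Accelerate accept together with `device_map="auto"`. The helper name `build_max_memory`, the 64 GiB CPU budget, and the choice of GiB units are illustrative assumptions, and whether heretic passes its max_memory setting to the loader in exactly this shape is also an assumption.

```python
def build_max_memory(gpu_gib: int, cpu_gib: int, gpu_index: int = 0) -> dict:
    """Build a per-device memory budget (hypothetical helper).

    Integer keys are CUDA device indices and "cpu" is host RAM. Layers that
    do not fit within the GPU budget are placed on the CPU instead.
    """
    return {gpu_index: f"{gpu_gib}GiB", "cpu": f"{cpu_gib}GiB"}


# Cap GPU 0 at 18 GiB and allow up to 64 GiB of host RAM (assumed value).
max_memory = build_max_memory(gpu_gib=18, cpu_gib=64)
print(max_memory)

# With Transformers, a mapping like this is typically passed as:
#   AutoModelForCausalLM.from_pretrained(
#       model_id, device_map="auto", max_memory=max_memory
#   )
```

Note that such a cap only limits where model weights are placed at load time. Activations and any extra copies made later (for example while saving) can still push usage above it.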

nvidia-smi (I'm on CachyOS) reported that heretic was using 22.5 GB of VRAM when I had only selected a trial to save, so before saving had even started.

The attached log.txt contains all my attempts to save it, with every error up to the point where the app crashed.

log.txt
