Using the raw Flux Kontext Dev open-weight model (23 GB) with a LoRA on a Lambda.ai cloud GPU.
I keep the models on my own hard drive and upload them to the Lambda instance with Filebrowser, through an SSH port forward on port 8080. The ComfyUI dashboard goes through the same tunnel on port 8188. Here's the relevant block from my ~/.ssh/config:
Host LambdaComfy
    # Lambda Labs instance IP (ssh_config doesn't allow trailing comments)
    HostName 192.9.251.153
    User ubuntu
    LocalForward 8080 127.0.0.1:8080
    LocalForward 8188 127.0.0.1:8188
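With that block in ~/.ssh/config, one command opens the session and both forwards:

$ ssh LambdaComfy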
I'm using an NVIDIA A10 with 24 GB of VRAM. That's tight for the full 23 GB model, but it works: generating an image takes about 1 to 2 minutes.
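Since the model barely fits, it's worth watching VRAM from a second SSH session while an image generates; plain nvidia-smi is enough:

$ watch -n 1 nvidia-smi  # refresh GPU memory and utilization every second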
Pick a model variant depending on your available VRAM:
32 GB VRAM --> full-speed, full-precision model (~23 GB)
20 GB VRAM --> FP8 quantized (~12 GB)
≤ 12 GB VRAM --> int8/GGUF small version (~5 GB)
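If you'd rather not push 20+ GB through the tunnel, the quantized builds can be downloaded straight onto the instance instead. A minimal sketch with huggingface-cli; the repo and file names below are placeholders I haven't verified, so check Hugging Face for the actual FP8/GGUF Kontext releases:

$ pip install -U "huggingface_hub[cli]"  # provides the huggingface-cli tool
$ huggingface-cli download <repo_id> <kontext-fp8-weights>.safetensors --local-dir ComfyUI/models/diffusion_models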
Install on Lambda.ai
$ git clone https://github.com/comfyanonymous/ComfyUI
$ curl -LsSf https://astral.sh/uv/install.sh | sh
$ cd ComfyUI
$ uv venv  # create a virtual environment (ComfyUI has no pyproject, so `uv add` won't work here)
$ uv pip install torch torchvision torchaudio
$ uv pip install -r requirements.txt
$ uv run main.py  # start the ComfyUI dashboard on 127.0.0.1:8188
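Once it's up, a quick check from the laptop exercises the tunnel and the server at once (the /system_stats route is part of ComfyUI's built-in API and returns JSON):

$ curl -s http://127.0.0.1:8188/system_stats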
$ curl -fsSL https://raw.githubusercontent.com/filebrowser/get/master/get.sh | bash
$ filebrowser -r ~/ComfyUI  # serve the ComfyUI tree on 127.0.0.1:8080 for model uploads (run in a second SSH session)
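For the uploads themselves, this is where the Flux files usually land in the ComfyUI tree (directory names per the standard ComfyUI Flux setup; double-check against your workflow's loader nodes):

ComfyUI/models/
├── diffusion_models/   # flux1-kontext-dev.safetensors (or the FP8 build)
├── text_encoders/      # clip_l.safetensors, t5xxl_fp16.safetensors
├── vae/                # ae.safetensors
└── loras/              # your LoRA file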