I have 2 of them. I would advise against it if you want to run things like vllm. I've had the cards for months and I still haven't been able to create a uv env with trl and vllm. vllm works fine in Docker for some models: with one GPU, gpt-oss 20b decodes at a cumulative 600-800 tok/s with 32 concurrent requests, depending on context length, but I was getting trash performance out of qwen3.5 and Gemma4.
If I were to do it again, I’d probably just get a dgx spark. I don’t think it’s been worth the hassle.