Question from a noob: how well would those models run on a computer with an AMD APU (for example a Ryzen 9 7940HS) with 128GB RAM, setting aside 64GB for the iGPU?
Another noob here. If I had to guess, it's because current models are mostly memory bound. The AI training GPUs (A100, H100, etc.) are not the best TFLOP performers, but they have the most VRAM and very high memory bandwidth. It seems that researchers found a sweet spot: neural network architectures that perform well on similar configurations, i.e. near real time (reading speed, in the case of LLMs).
Once you bring those models to a CPU, they might become compute bound again. Llama.cpp illustrates that somewhat: for bigger models you tend to wait a long time for the answer. I suspect the story would be similar with iGPUs.
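To make the memory-bound argument concrete, here's a back-of-envelope sketch. It assumes token generation is purely bandwidth bound (each generated token streams roughly all the model weights once), which is a simplification, and the bandwidth numbers are approximate ballpark figures, not measured values:

```python
# Rough upper bound on generation speed if inference is purely
# memory-bandwidth bound: tokens/sec ~ bandwidth / model size,
# since each token requires reading (roughly) all weights once.

def tokens_per_sec_upper_bound(model_size_gb: float, bandwidth_gb_s: float) -> float:
    return bandwidth_gb_s / model_size_gb

# Approximate peak memory bandwidths in GB/s (ballpark, varies by config):
systems = {
    "APU w/ dual-channel DDR5 (shared with iGPU)": 90,
    "consumer GPU w/ GDDR6X": 1000,
    "H100 (HBM3)": 3350,
}

# A 70B-parameter model: ~140 GB at fp16, ~40 GB with 4-bit quantization.
for model_name, size_gb in [("70B fp16", 140), ("70B 4-bit", 40)]:
    for sys_name, bw in systems.items():
        rate = tokens_per_sec_upper_bound(size_gb, bw)
        print(f"{model_name} on {sys_name}: ~{rate:.1f} tok/s ceiling")
```

Even with quantization, the shared DDR5 bandwidth puts a hard ceiling of a couple of tokens per second on a big model, regardless of how many TFLOPs the iGPU has, which matches the "waiting a lot for the answer" experience.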