Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I was hoping for the /v1/messages endpoint to use with Claude Code without any extra proxies :(


This is a breeze to do with llama.cpp, which has had Anthropic responses API support for over a month now.

On your inference machine:

  you@yourbox:~/Downloads/llama.cpp/bin$ ./llama-server -m <path/to/your/model.gguf> --alias <your-alias> --jinja --ctx-size 32768 --host 0.0.0.0 --port 8080 -fa on
Obviously, feel free to change your port, context size, flash attention, other params, etc.

Then, on the system you're running Claude Code on:

  export ANTHROPIC_BASE_URL=http://<ip-of-your-inference-system>:<port>
  export ANTHROPIC_AUTH_TOKEN="whatever"
  export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
  claude --model <your-alias> [optionally: --system "your system prompt here"]
Note that the auth token can be whatever value you want, but it does need to be set, otherwise a fresh CC install will still prompt you to login / auth with Anthropic or Vertex/Azure/whatever.


yup, I've been using llama.cpp for that on my PC, but on my Mac I found some cases where MLX models work best. haven't tried MLX with llama.cpp, so not sure how that will work out (or if it's even supported yet).


Well, to whoever downvoted my comment: It's supported now!!!! https://lmstudio.ai/blog/claudecode




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: