So, following a blog post[1] from @webology and some online guides about #VSCode plugins, I set up #ollama with a handful of models and connected it to the Continue plugin.
My goal: see if local-laptop #llm code assistants are viable.
My results: staggeringly underwhelming, mostly in terms of speed. I tried gemma3, qwen2.5, and deepseek-r1; none of them responded fast enough to be a real help while coding.
[1] https://micro.webology.dev/2024/07/24/ollama-llama-red-pajama/
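For anyone who wants to reproduce the setup: after pulling the models with ollama, the Continue side is just a config pointing the plugin at the local Ollama server. A minimal sketch of ~/.continue/config.json (the exact schema depends on your Continue version, newer releases use config.yaml instead, and the model tags here are just examples):

```json
{
  "models": [
    {
      "title": "qwen2.5 (local)",
      "provider": "ollama",
      "model": "qwen2.5:7b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "qwen2.5-coder (local)",
    "provider": "ollama",
    "model": "qwen2.5-coder:1.5b"
  }
}
```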
@webology if it turns out I went about this the wrong way I'd love to know! I want this to be viable, and I'd also like to be able to do local video transcription.
@phildini I don’t know that those models are optimized for that use case. There are models tuned specifically for code, but I don’t use them because Claude 3.7 is better. At least one of the models you tried (deepseek-r1) emits think tags, so it’s expected to be slower.
For transcription I use MacWhisper and, I think, https://pypi.org/project/mlx-whisper/ for some bulk jobs. Start with MacWhisper and see how fast it goes.
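For the bulk case, the mlx-whisper Python API is about this simple (a sketch; the model repo is one of the mlx-community Whisper conversions on Hugging Face, pick whichever size fits your machine):

```python
import mlx_whisper

# Transcribe one file; the model weights are fetched from Hugging Face on first use.
result = mlx_whisper.transcribe(
    "interview.mp3",
    path_or_hf_repo="mlx-community/whisper-large-v3-mlx",
)
print(result["text"])
```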
@phildini @webology If you want to optimize for latency I can recommend checking out what the upstream llama.cpp folks have been up to:
https://github.com/ggml-org/llama.vscode
https://github.com/ggml-org/llama.vim
Getting the latency down takes some tuning on the llama.cpp server side, but these two plugins are a good start.
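Concretely, both plugins expect a local llama-server, and the latency knobs live there: GPU offload, context size, and KV-cache reuse between completion requests. A rough sketch of the invocation (flags and the model repo are from memory and just examples; check the plugins' READMEs for the currently recommended setup):

```sh
# Serve a small FIM-capable coder model for llama.vim / llama.vscode.
# -ngl 99 offloads all layers to the GPU; --cache-reuse lets the server reuse
# KV-cache chunks between requests, which is where most of the latency win is.
llama-server \
  -hf ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF \
  --port 8012 \
  -ngl 99 --ctx-size 0 --cache-reuse 256
```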
@phildini @webology but also, let's be realistic: if you try this on a laptop and it's not one of those recent Macs, you're gonna have a bad time.
I'm saying that as someone who tried it on a four-year-old ThinkPad X1, CPU-only with AVX2, and even the small models take too long to be viable for latency-critical workflows.
@djh @phildini When I played around with it, I was using Zed and connecting over Tailscale to Ollama running on an M2 Mac Studio with 64 GB of RAM. It wasn't as slow as I expected, but it didn't inspire me to keep using it.
Local tool calling and agents are what inspire me, more than Copilot-style completion. I skip the IDE completely and use Claude Code or Claude Projects + MCP, and after seeing those work well I'm not sure why people like Copilot.