@webology If it turns out I went about this the wrong way, I'd love to know! I want this to be viable, and I'd also love to be able to do local video transcription.

@phildini I don't know that those models are optimized for that use case. They have models optimized for code, but I don't use them because Claude 3.7 is better. At least one of those models has think tags, so it should be slower.

I use MacWhisper and, I think, pypi.org/project/mlx-whisper/ for some bulk transcriptions. Start with the first and see how fast it goes.
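
For anyone who wants to try the bulk route, here's a minimal sketch using the mlx-whisper package; the "recordings" folder and the model repo are illustrative assumptions, not something from this thread (and this only runs on Apple silicon):

```python
# Minimal sketch of bulk transcription with mlx-whisper (Apple silicon).
# The "recordings" folder and the model repo are illustrative assumptions.
import pathlib

import mlx_whisper

for audio in sorted(pathlib.Path("recordings").glob("*.mp3")):
    result = mlx_whisper.transcribe(
        str(audio),
        path_or_hf_repo="mlx-community/whisper-turbo",
    )
    # Write the transcript next to the audio file.
    audio.with_suffix(".txt").write_text(result["text"])
    print(f"transcribed {audio.name}")
```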

@phildini @webology If you want to optimize for latency, I can recommend checking out what the upstream llama.cpp folks have been up to:

github.com/ggml-org/llama.vscode

github.com/ggml-org/llama.vim

It requires some tuning on the llama setup side to optimize for latency; these two plugins are a good start.
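
For context on what those plugins do under the hood: they send fill-in-the-middle requests to a locally running llama-server. A rough sketch of the same request from Python; the port, the snippet being completed, and the generation parameters are assumptions you'd tune for your own latency budget:

```python
# Rough sketch: a fill-in-the-middle request against a locally running
# llama-server (llama.cpp). Port and parameters are assumptions;
# keep n_predict small when optimizing for completion latency.
import json
import urllib.request

payload = {
    "input_prefix": "def fib(n):\n    ",
    "input_suffix": "\n\nprint(fib(10))\n",
    "n_predict": 64,  # short completions come back faster
}
req = urllib.request.Request(
    "http://127.0.0.1:8012/infill",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```

Note that the /infill endpoint only works with models that have fill-in-the-middle tokens, such as the Qwen coder models.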

@phildini @webology But also, let's be realistic: if you try this on a laptop and it's not one of those recent Macs, you're gonna have a bad time.

Saying that as someone who tried it on a four-year-old ThinkPad X1 on its CPU with AVX2; even the small models take too long to be viable for latency-critical workflows.

@djh @phildini When I played around with it, I was using Zed and connecting to Ollama over Tailscale on an M2 Mac Studio with 64 GB of RAM. It wasn't as slow as I expected, but it didn't inspire me to use it.

Local tool calling and agents are what inspire me, more than Copilot. I skip the IDE completely and use Claude Code, or Claude Projects + MCP, and after seeing those work well I'm not sure why people like Copilot.
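
To make "local tool calling" concrete, here's a hedged sketch using the ollama Python client; the model name and the weather tool are illustrative assumptions, not anything from this thread:

```python
# Sketch of local tool calling with the ollama Python client.
# The model name and the weather tool are illustrative assumptions.
import ollama

def get_current_weather(city: str) -> str:
    # Stub tool; a real implementation would call a weather API.
    return f"18C and cloudy in {city}"

response = ollama.chat(
    model="llama3.1",  # any tool-capable local model works here
    messages=[{"role": "user", "content": "What's the weather in Portland?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)

# The model replies with tool calls instead of prose when it wants data.
for call in response.message.tool_calls or []:
    if call.function.name == "get_current_weather":
        print(get_current_weather(**call.function.arguments))
```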

@phildini @webology That's why I'm eager to buy an HP Z2 Mini G1a as soon as it's available.

@phildini @webology You can use qwen2.5-coder as a code assistant; it's also available in Ollama. The 7B and 3B versions are good candidates.
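
A quick sketch of trying that from Python, assuming the model has already been pulled with `ollama pull qwen2.5-coder:3b`:

```python
# Sketch: a one-off completion from qwen2.5-coder through the local
# Ollama server. Assumes `ollama pull qwen2.5-coder:3b` was run first.
import ollama

response = ollama.generate(
    model="qwen2.5-coder:3b",
    prompt="Write a Python function that reverses a string.",
)
print(response["response"])
```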