@webology If it turns out I went about this the wrong way, I'd love to know! I want this to be viable, and I'd also love to be able to do local video transcription.

@phildini I don't know that those models are optimized for that use case. They have models optimized for code, but I don't use them because Claude 3.7 is better. At least one of those models has think tags, so it should be slower.

I use MacWhisper and, I think, pypi.org/project/mlx-whisper/ for some bulk transcriptions. Start with the first and see how fast it goes.
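
For anyone who wants to try the bulk route, here's a minimal sketch using the mlx-whisper package; the "recordings" folder and the model repo are illustrative assumptions, not something from this thread (and this only runs on Apple silicon):

```python
# Minimal sketch of bulk transcription with mlx-whisper (Apple silicon).
# The "recordings" folder and the model repo are illustrative assumptions.
import pathlib

import mlx_whisper

for audio in sorted(pathlib.Path("recordings").glob("*.mp3")):
    result = mlx_whisper.transcribe(
        str(audio),
        path_or_hf_repo="mlx-community/whisper-turbo",
    )
    # Write the transcript next to the audio file.
    audio.with_suffix(".txt").write_text(result["text"])
    print(f"transcribed {audio.name}")
```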

@phildini @webology If you want to optimize for latency, I can recommend checking out what the upstream llama.cpp folks have been up to:

github.com/ggml-org/llama.vscode

github.com/ggml-org/llama.vim

It requires some tuning on the llama setup side to optimize for latency; these two plugins are a good start.
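
For context on what those plugins do under the hood: they send fill-in-the-middle requests to a locally running llama-server. A rough sketch of the same request from Python; the port, the snippet being completed, and the generation parameters are assumptions you'd tune for your own latency budget:

```python
# Rough sketch: a fill-in-the-middle request against a locally running
# llama-server (llama.cpp). Port and parameters are assumptions;
# keep n_predict small when optimizing for completion latency.
import json
import urllib.request

payload = {
    "input_prefix": "def fib(n):\n    ",
    "input_suffix": "\n\nprint(fib(10))\n",
    "n_predict": 64,  # short completions come back faster
}
req = urllib.request.Request(
    "http://127.0.0.1:8012/infill",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```

Note that the /infill endpoint only works with models that have fill-in-the-middle tokens, such as the Qwen coder models.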

@phildini @webology But also, let's be realistic: if you try this on a laptop and it's not one of those recent Macs, you're gonna have a bad time.

Saying that as someone who tried it on a four-year-old ThinkPad X1 on its CPU with AVX2; even the small models take too long to be viable for latency-critical workflows.

@djh @phildini When I played around with it, I was using Zed and connecting to Ollama over Tailscale on an M2 Mac Studio with 64 GB of RAM. It wasn't as slow as I expected, but it didn't inspire me to use it.

Local tool calling and agents are what inspire me, more than Copilot. I skip the IDE completely and use Claude Code, or Claude Projects + MCP, and after seeing those work well I'm not sure why people like Copilot.
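
To make "local tool calling" concrete, here's a hedged sketch using the ollama Python client; the model name and the weather tool are illustrative assumptions, not anything from this thread:

```python
# Sketch of local tool calling with the ollama Python client.
# The model name and the weather tool are illustrative assumptions.
import ollama

def get_current_weather(city: str) -> str:
    # Stub tool; a real implementation would call a weather API.
    return f"18C and cloudy in {city}"

response = ollama.chat(
    model="llama3.1",  # any tool-capable local model works here
    messages=[{"role": "user", "content": "What's the weather in Portland?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)

# The model replies with tool calls instead of prose when it wants data.
for call in response.message.tool_calls or []:
    if call.function.name == "get_current_weather":
        print(get_current_weather(**call.function.arguments))
```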

@phildini @webology That's why I'm eager to buy an HP Z2 Mini G1a as soon as it's available.

@phildini @webology You can use qwen2.5-coder as a code assistant; it's also available in Ollama. The 7B and 3B versions are good candidates.
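
A quick sketch of trying that from Python, assuming the model has already been pulled with `ollama pull qwen2.5-coder:3b`:

```python
# Sketch: a one-off completion from qwen2.5-coder through the local
# Ollama server. Assumes `ollama pull qwen2.5-coder:3b` was run first.
import ollama

response = ollama.generate(
    model="qwen2.5-coder:3b",
    prompt="Write a Python function that reverses a string.",
)
print(response["response"])
```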