How do you measure tokens/sec? Here's my attempt on a new M4 Max 128GB; it does about 6 words/sec:
bash> time ollama run llama3.3 "What's the purpose of an LLM?" | tee ~/Downloads/what\ is\ an\ LLM.txt
A Large Language Model (LLM) is a type of artificial intelligence (AI) designed to process and understand human language. The primary purposes of an LLM are:
(... contents excerpted for brevity)
Overall, the purpose of an LLM is to augment human capabilities by providing a powerful tool for understanding, generating, and interacting with human language.
real 0m59.040s
user 0m0.071s
sys 0m0.081s
pmarreck 59s35ms
20241206220629 ~ bash> wc -w Downloads/what\ is\ an\ LLM.txt
359 Downloads/what is an LLM.txt
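For what it's worth, the words/sec figure is just the word count divided by the wall-clock time; a quick sketch using the numbers from the run above (359 words, 59.04 s real time):

```shell
# Rough throughput: word count / wall-clock seconds, from the run above.
words=359
seconds=59.04
awk -v w="$words" -v s="$seconds" 'BEGIN { printf "%.2f words/sec\n", w/s }'
# → 6.08 words/sec
```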
LM Studio puts stats at the bottom of each reply, like: 2.09 tok/sec, 346 tokens, 1.74s to first token. That was for a 259-word response, so roughly 0.75 words/token. If that ratio holds, you might be getting ~8 tok/sec on your M4 Max?
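Back-of-envelope version of that conversion, assuming the ~0.75 words/token ratio carries over between models (a rough assumption; tokenizers differ):

```shell
# tok/sec = words/sec ÷ words-per-token
# 6 words/sec from the earlier run; 0.75 words/token from LM Studio's stats.
awk 'BEGIN { printf "%.1f tok/sec\n", 6 / 0.75 }'
# → 8.0 tok/sec
```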
Looks like LM Studio is available for ARM-based Macs, so if you want to give it a try, that'd be one way to get these stats. LM Studio also surfaces some parameters to play around with, and keeps a record of past conversations, if that appeals to you.
Just add "--verbose" to your run command, e.g. "ollama run mistral-nemo:latest --verbose", and it'll print token counts and timing info (including tokens/sec) after each response.