zone411 on March 10, 2024 | on: Yi: Open Foundation Models by 01.AI

Yi 34B Chat has not done well on my new NYT Connections benchmark, and it is only in 22nd place on the LMSYS Elo-based leaderboard (151 Elo below GPT-4 Turbo). It does better in Chinese. Among models with open weights, Qwen 72B is clearly stronger.

Yenrabbit on March 10, 2024

Ooh, I also use Connections as a benchmark! It tends to favour models with 'chain of thought' style reasoning somewhere in the training mix, since directly producing the answer is hard. Do you have public code you could share?

nbbaier on March 11, 2024

I'd also love to see how you're doing this benchmarking!
