“munhitsu” — Fediverse search results on home.social

I'm playing with G-Eval to test LLM outputs using an LLM. It sounds very meta, but there is logic to it. And it roughly works, until it doesn't.
How am I supposed to reason about a test result explanation like this:
"the actual output's prompt is in Polish which mismatches the language-prompt specified as Polish, aligning correctly"
???
#llm #gpt #deepeval
