The Arabic LLM Evaluation Benchmark measures how well large language models handle Arabic compared with English across five evaluation dimensions: factual accuracy, hallucination rate, coherence, RTL ...