Arabic LLM Evaluation Benchmark measures how well large language models handle Arabic compared to English across five evaluation dimensions: factual accuracy, hallucination rate, coherence, RTL ...
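One way to picture the Arabic-versus-English comparison is as a per-dimension gap. The sketch below is illustrative only and is not the benchmark's own code: the function name `language_gaps`, the `DimensionGap` record, the score scale, and the higher-is-better flags are all assumptions. It covers only the three dimensions the description names in full; the remaining ones are truncated in the source and are left out.

```python
from dataclasses import dataclass

# Dimensions named in full in the description. The remaining ones are
# truncated in the source and deliberately omitted from this sketch.
# The flag records whether a higher score is better (an assumption).
DIMENSIONS = {
    "factual_accuracy": True,
    "hallucination_rate": False,  # a rate, so lower is better
    "coherence": True,
}


@dataclass
class DimensionGap:
    dimension: str
    arabic: float
    english: float
    gap: float  # positive means Arabic does worse than English


def language_gaps(arabic: dict, english: dict) -> list:
    """Compare per-dimension scores for the two languages.

    Both arguments map dimension names to scores (hypothetical inputs).
    The gap is oriented so that a positive value always means the model
    performs worse in Arabic, whatever the direction of the metric.
    """
    gaps = []
    for name, higher_is_better in DIMENSIONS.items():
        ar, en = arabic[name], english[name]
        gap = (en - ar) if higher_is_better else (ar - en)
        gaps.append(DimensionGap(name, ar, en, gap))
    return gaps
```

Orienting every gap the same way keeps the dimensions comparable at a glance, even though hallucination rate runs in the opposite direction from the other two.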