Itay Itzhak

2 papers · Latest: April 15, 2026

From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs

This paper formalizes user "vibe-testing" of LLMs, developing a pipeline that personalizes evaluation to better reflect real-world usefulness beyond benchmarks.

2604.14137 · Apr 15, 2026

Natural Language Processing

Growing Pains: Extensible and Efficient LLM Benchmarking Via Fixed Parameter Calibration

This paper introduces an item response theory (IRT) framework for extensible and efficient LLM benchmarking. It uses anchor items to keep scores comparable over time.

2604.12843 · Apr 14, 2026
