Itay Itzhak
2 papers · Latest:
Natural Language Processing
From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs
This paper formalizes user "vibe-testing" of LLMs, developing a pipeline that personalizes evaluation to better reflect real-world usefulness beyond benchmarks.
2604.14137
Natural Language Processing
Growing Pains: Extensible and Efficient LLM Benchmarking Via Fixed Parameter Calibration
This paper introduces an IRT-based framework for extensible and efficient LLM benchmarking, using anchor items to ensure score comparability over time.
2604.12843