Mayur Naik
2 papers · Latest:
Natural Language Processing
MathDuels: Evaluating LLMs as Problem Posers and Solvers
MathDuels is a self-play benchmark in which LLMs both author and solve math problems, revealing that posing and solving abilities are decoupled and providing a dynamic evaluation.
2604.21916
Artificial Intelligence
Detecting Safety Violations Across Many Agent Traces
Meerkat uses clustering and agentic search to detect rare, complex safety violations across many agent traces, outperforming existing methods.
2604.11806