Lianmin Zheng

2 papers · Latest: September 12, 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention

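This paper introduces PagedAttention, an attention algorithm inspired by operating-system virtual memory and paging that manages key-value (KV) cache memory for large language model serving. The authors report that the resulting system, vLLM, reduces memory waste and improves throughput by 2-4× over prior serving systems at the same level of latency.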

2309.06180 · Sep 12, 2023

Natural Language Processing

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

This paper shows that strong large language models such as GPT-4 can serve as judges for evaluating LLM-based chat assistants on open-ended tasks, reporting over 80% agreement with human preferences, comparable to the level of agreement between humans.

2306.05685 · Jun 9, 2023
