Lianmin Zheng
2 papers · Latest:
Machine Learning
Efficient Memory Management for Large Language Model Serving with PagedAttention
PagedAttention applies the paging technique of operating-system virtual memory to the key-value (KV) cache in large language model serving. It reduces memory waste and, in the paper's evaluation of vLLM, raises throughput by 2-4× at the same latency compared with FasterTransformer and Orca.
2309.06180
Natural Language Processing
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
This paper shows that strong large language models such as GPT-4 can serve as judges of other LLM-based chat assistants, reaching over 80% agreement with human preferences on open-ended questions, about the same level of agreement as between humans.
2306.05685