Tag

Microsoft Research

Microsoft's DELEGATE-52 benchmark shows LLMs still can't be trusted with document editing

A Microsoft-led benchmark of 19 LLMs across 52 professional domains finds that even frontier models lose a quarter of document content after 20 delegated edits, with Python the only reliably automatable domain.
May 13, 2026
Tags: llm reliability, enterprise AI, document automation, AI benchmark, Microsoft research