Thinking Machines wants to build an AI that actually listens while it talks
At a glance:
- Thinking Machines Lab unveils TML-Interaction-Small, an AI model enabling full-duplex conversation.
- The model responds in 0.40 seconds, matching human conversational speed.
- Limited research preview expected in coming months, with wider release later this year.
The Evolution of AI Conversation
Current AI interactions follow a rigid turn-taking pattern: you speak, the model listens, it responds, and you listen again. This sequential process, while functional, lacks the fluidity of human dialogue. Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, aims to break this mold with its new "interaction models," starting with TML-Interaction-Small. The core innovation is "full duplex" capability, allowing the AI to process input and generate responses simultaneously. This approach promises a more natural, phone-call-like experience compared to the staccato rhythm of text-based chats. The company says the model achieves a response time of 0.40 seconds, which aligns with the natural pauses in human conversation and outpaces similar offerings from industry giants.
Technical Breakthrough and Benchmarks
Full duplex in AI refers to the ability to send and receive at the same time: the model keeps listening while it speaks, akin to how humans can listen and talk concurrently in a lively discussion. Thinking Machines claims TML-Interaction-Small responds in roughly 0.40 seconds, a significant leap from the often noticeable delays in models from OpenAI and Google. While exact benchmarks for competitors aren't disclosed, the 0.40-second mark is presented as a milestone. This performance is not just about raw speed; it's about reducing cognitive load for users, making interactions feel more intuitive and less like commanding a tool. The model is currently in a research phase, with the company emphasizing that real-world usability will be the ultimate test once it becomes widely available.
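The structural difference is easy to see in code. The sketch below is purely illustrative and has nothing to do with Thinking Machines' actual implementation: it simulates full duplex with two concurrent coroutines, one feeding "audio" chunks into a queue while the other consumes and answers them, so input keeps flowing even while replies are being produced. All names (`listen`, `respond`, `full_duplex`) are hypothetical.

```python
import asyncio

async def listen(transcript_chunks, inbox):
    # Simulated microphone stream: user speech keeps arriving,
    # regardless of whether the model is mid-reply.
    for chunk in transcript_chunks:
        await asyncio.sleep(0.01)
        await inbox.put(chunk)
    await inbox.put(None)  # end-of-stream sentinel

async def respond(inbox, replies):
    # The model consumes input as it arrives, rather than
    # waiting for the user's whole turn to finish.
    while True:
        chunk = await inbox.get()
        if chunk is None:
            break
        replies.append(f"ack: {chunk}")

async def full_duplex(chunks):
    inbox = asyncio.Queue()
    replies = []
    # Running both coroutines concurrently is the "full duplex" part;
    # a half-duplex loop would await listen() fully before respond().
    await asyncio.gather(listen(chunks, inbox), respond(inbox, replies))
    return replies

if __name__ == "__main__":
    print(asyncio.run(full_duplex(["hello", "so anyway"])))
```

In a half-duplex design, the two stages run back to back and latency stacks up; interleaving them is what lets a system begin formulating a reply inside the natural pause of a conversation.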
The Team Behind the Innovation
Thinking Machines Lab launched last year with Mira Murati at the helm, leveraging her experience as OpenAI's former chief technology officer. Murati's background includes leadership roles in developing advanced AI systems, and her new venture focuses on creating more interactive and responsive AI. The startup's emergence highlights a growing trend among former Big Tech AI researchers to pursue ambitious, user-centric projects outside large corporate structures. This independence may allow for faster iteration and bolder design choices, such as prioritizing real-time interaction over other capabilities. The company's vision extends beyond mere efficiency; it seeks to embed interactivity as a native feature of AI models, potentially redefining how humans and machines collaborate.
Standing Out in a Crowded Field
In the competitive landscape of AI assistants, speed and responsiveness are key differentiators. OpenAI's models, like GPT-4, and Google's offerings, such as Gemini, currently dominate but operate on traditional half-duplex principles. Thinking Machines' approach could carve out a niche for applications requiring seamless dialogue, such as real-time translation, customer service, or therapeutic chatbots. However, achieving low latency without sacrificing accuracy or increasing computational costs remains a hurdle. The company's claims, while impressive, will need independent verification once the model is released. If successful, this technology might pressure larger firms to accelerate their own full-duplex research, sparking a new arms race in conversational AI.
Potential Applications and User Impact
The implications of full-duplex AI extend across various domains. In education, tutors could engage in more dynamic, back-and-forth explanations. For accessibility, real-time conversational agents could assist users with disabilities more naturally. In professional settings, AI mediators might facilitate meetings by processing multiple speakers simultaneously. Yet, challenges persist: ensuring the AI doesn't interrupt inappropriately, managing overlapping speech, and maintaining context over long exchanges. The user experience will hinge on subtle nuances—does the AI feel like a thoughtful partner or an intrusive interrupter? Early research previews will be crucial for gathering feedback and refining the balance between speed and coherence.
Release Roadmap and Industry Implications
Thinking Machines plans a "limited research preview" in the next few months, followed by a broader release later this year. This phased rollout allows the team to test the technology in controlled environments before public deployment. The timeline suggests confidence in the underlying research, but also caution given the complexity of real-world interactions. For the AI industry, this move signals a shift toward more immersive interfaces, potentially influencing product strategies at larger companies. Investors and developers will watch closely: if TML-Interaction-Small delivers on its promises, it could set a new standard for responsiveness, encouraging more human-like AI designs. Conversely, any shortcomings might temper enthusiasm for full-duplex approaches until further advancements emerge.
Conclusion: A Step Toward Natural AI Dialogue
Thinking Machines Lab's venture into full-duplex AI represents a bold attempt to make artificial intelligence feel less like a tool and more like a conversational counterpart. While benchmarks are promising, the true measure will be user experience once the technology is in people's hands. As the company navigates the research-to-product journey, the AI community will observe whether this innovation can transition from a technical feat to a practical, everyday utility. For now, it adds a compelling chapter to the ongoing quest for more intuitive human-machine interaction, with potential ripple effects across tech development and user expectations.
FAQ
What is full duplex in the context of AI models?
It is the ability to process incoming input and generate output simultaneously, so the model can listen while it speaks instead of following a strict turn-taking pattern.
When will TML-Interaction-Small be available to the public?
A limited research preview is expected in the coming months, with a wider release planned later this year.
How does TML-Interaction-Small compare to models from OpenAI and Google?
Thinking Machines claims a 0.40-second response time, faster than the often noticeable delays in current OpenAI and Google models, which operate on traditional half-duplex turn-taking; independent verification is pending.
Prepared by the editorial stack from public data and external sources.