I'm letting Siri see my life on Vision Pro, and it's a sign of things to come

At a glance:

  • Vision Pro developer preview adds a visual‑intelligence Siri that can identify real and virtual objects on demand
  • The feature works as a one‑snap‑per‑request assistant, not a continuous live feed
  • visionOS 27 introduces a panoramic‑photo‑to‑3D‑background conversion tool

First impressions of visual‑intelligence Siri

Apple unveiled the next generation of Siri in the Vision Pro developer preview at WWDC, branding it as a “visual‑intelligence” companion. When the author said, “Hey, Siri, what’s in front of me?” the system emitted a chirp, captured a still image using the headset’s cameras, and then displayed a text box describing everything it saw. The description included real‑world items like a bookshelf packed with titles such as Uzumaki and Wonderbook, as well as virtual overlays like a Parisian window and a floating clock widget. Siri manifested as a glowing 3D orb that could be dragged around the room, casting soft light on the desk thanks to visionOS’s spatial graphics engine.

The experience feels like a blend of Apple’s long‑standing voice assistant and the emerging camera‑aware AI found on competing headsets. The author notes that the preview is still early: Siri tends to stay locked on a single view until the orb is closed or moved. Even so, the overall impression is that Apple is ready to bring a powerful, context‑aware assistant to mixed‑reality hardware.

How the visual‑intelligence mode works

Unlike iOS and iPadOS, where visual‑intelligence features launch through the Camera app, Vision Pro lets users invoke the capability simply by saying, “Hey, Siri.” The headset then uses eye‑tracking to determine the focal area, snaps a still image, and runs on‑device object‑recognition algorithms. The response is delivered as a text overlay rather than a live video feed, meaning each query is a discrete “snap‑and‑describe” interaction.

In the author’s test, Siri correctly identified both physical and virtual objects. Real items such as a red Virtual Boy headset and a Steam Deck console were named, while virtual elements—including the panoramic Paris window and the clock widget—were also recognized. This dual‑reality awareness underscores Apple’s ambition to let developers blend digital content with the physical world without sacrificing contextual understanding.

Limitations and future expectations

The current preview is limited to single‑snapshot queries; there is no continuous live mode like the ones on Samsung’s Galaxy XR or Meta’s smart glasses. Siri also appears to linger on the first captured view until the user explicitly resets it, which can feel sluggish for rapid‑fire queries. Nevertheless, the author speculates that this early version is a stepping stone toward more fluid, always‑on visual assistants that could eventually power Apple’s rumored smart glasses.

Pricing remains a barrier: the Vision Pro costs $3,499, putting the technology out of reach for most consumers today. The author wonders how the feature will translate to cheaper form factors once Apple releases its anticipated AR glasses, which would likely compete with offerings such as Project Aura from Google and Xreal.

New panoramic‑photo‑to‑3D background feature

visionOS 27 also introduces a panoramic conversion tool. Users can select any panoramic photo from their library, and the system renders it as a large 3D “wraparound” background that sits behind their workspace. The effect is not a fully immersive environment (there is no ambient sound or parallax movement), but it does provide a sense of depth, with the edges of the real‑world office still visible.

The author experimented with a family photo taken in a backyard during the pandemic; the converted background made the scene feel present in the virtual office. Not all panoramas convert perfectly at this stage, but the feature hints at future improvements such as multi‑photo Gaussian splat captures, a technique the author uses on Meta Quest.

What this means for the next generation of wearables

By exposing a visual‑intelligence Siri now, Apple signals that future wearables will likely embed similar assistive capabilities. If third‑party apps can hook into Siri’s framework, developers could build bespoke workflows that blend voice, vision, and spatial computing. The ability to ask an AI to summarize content in the Notes app or list open browser tabs on a MacBook demonstrates a workflow‑centric vision where the headset becomes a central hub for both digital and physical information.

The preview suggests a roadmap where Apple’s ecosystem expands beyond iPhone‑centric AI, leveraging the headset’s cameras and eye‑tracking to deliver context‑aware assistance across a range of devices—from watches to AR glasses. While the current implementation is modest, the underlying technology points toward a future where mixed‑reality wearables act as truly perceptive assistants, blurring the line between the real and the virtual.

Editorial note: SiliconFeed is an automated feed. Facts are checked against sources; copy is normalized and lightly edited for readers.

FAQ

What is the visual‑intelligence mode on Vision Pro?
Visual‑intelligence mode lets users summon Siri with a voice command, after which the headset snaps a still image of the area the eyes are focused on and returns a text description of both real and virtual objects. It works as a one‑snap‑per‑request assistant rather than a continuous live feed.

How does Siri identify objects in the headset?
When you ask Siri a question, Vision Pro uses its built‑in cameras and eye‑tracking to capture a single frame, then runs on‑device object‑recognition algorithms. In tests it correctly named items such as a bookshelf, the books Uzumaki and Wonderbook, a Virtual Boy headset, a Steam Deck, and virtual elements like a Paris window and a clock widget.

Can panoramic photos be turned into 3D backgrounds on Vision Pro?
Yes, visionOS 27 adds a panoramic‑photo conversion feature that transforms any panoramic image into a large 3D wraparound background. The result is a static depth‑enhanced scene that sits behind your workspace, though it lacks full immersion features like ambient sound or motion parallax.
