ARKit 3.0 and WebXR: What is the Future?
(disclaimer: I work for Mozilla right now, but the following represents my personal opinions, not those of the company)
When I look at the great new features in ARKit 3.0 (such as people occlusion, multi-user anchors, and human motion tracking), I find myself pondering what the future of AR on the web is.
The W3C Immersive Web Working Group is getting close to finalizing the initial version of the WebXR Device API, the API that evolved from WebVR, with the goal of providing web pages access to low-level AR and VR platform capabilities. The initial version is primarily focused on VR, so that the many sites that had experimented with WebVR can move to an officially supported API.
In the community, we've been pondering AR-specific capabilities, but the diversity of the existing platforms has posed something of a challenge. Some relatively simple common features are bubbling up out of these discussions, ones that seem reasonable to require all platforms to implement (some notion of anchors, hit-testing against the world, access to some representation of real-world geometry, simple lighting estimation). Other seemingly obvious features (such as access to geospatial positioning) are proving harder, in this case because many current platforms have no notion of geolocation, nor the sensors to support geospatially-aligned orientation.
This set of features supports some pretty compelling use cases, I think, especially when you imagine that the same web app could potentially run on any reasonably modern phone or AR head-worn display. The team I'm on at Mozilla has even implemented many of these ideas as a JavaScript WebXR API in the WebXR Viewer, an experimental AR-focused web browser for iOS; Gheric Speiginer, a student in my lab at Georgia Tech, has also implemented many of them in an unreleased WebXR-focused version of our Argon series of browsers.
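To make this concrete, here is a minimal sketch of what requesting such a session might look like through the WebXR Device API. The feature strings ("hit-test", "anchors", "light-estimation") follow the draft module specs and may differ across browsers, and the `startARSession` helper is hypothetical; a real page would also set up a reference space and rendering loop.

```javascript
// Hypothetical helper: request an immersive AR session, insisting only on
// hit-testing and treating anchors and lighting estimation as optional,
// since not every platform can be expected to support them.
// The XR system is passed in (normally navigator.xr) so the sketch can be
// exercised without a real device.
async function startARSession(xr) {
  if (!xr) {
    throw new Error("WebXR is not available in this browser");
  }
  return xr.requestSession("immersive-ar", {
    requiredFeatures: ["hit-test"],
    optionalFeatures: ["anchors", "light-estimation"],
  });
}
```

In a browser you would call `startARSession(navigator.xr)` from a user gesture (session requests are typically gated on one), then drive rendering from the session's animation-frame callbacks.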
But there are some "elephants in the room": big features that are problematic, such as image and object tracking, persistent anchors, and shared experiences. ARKit provides decent image tracking, for example, but most platforms don't. Google, Microsoft, and Apple all provide SDKs to create, store, and share persistent anchors (i.e., to mark locations in the world where virtual content can be attached so that multiple people can see it over time), but these SDKs aren't standardized or open, so it's hard to imagine how we might standardize on them. Numerous startups are attempting to tackle problems such as persistence and shared localization, but they are no more open. Perhaps one of them will open-source their platform so that it could be adopted by everyone as a baseline? It's hard to imagine.
Some advocate exposing camera and sensor data to web pages themselves, so that cross-platform APIs could be built on top of it to solve these problems. But it's unclear whether that's a realistic solution for more than trivial capabilities: it should be possible to provide simple capabilities (such as basic image tracking) this way, but there are very good reasons it has taken huge teams of the very best technical folks, working on closed platforms, to finally make progress on some of the harder problems.
Apple's ARKit 3.0 announcements forced me to ponder this problem anew.
The community has been discussing these issues since before ARKit 2.0, and the technical evolution of these platforms (ARKit, likely ARCore on Android, and whatever Microsoft releases with HoloLens 2) is rapid and seems to be diverging rather than converging. A few months ago, I would have said that real-time people tracking and segmentation from the monoscopic cameras on the back of iOS devices wasn't likely. But here it is, and that's amazing to me ... what comes next? Will Google, Microsoft, Facebook, or Amazon do something to compete with this? What will each of these companies release that's interesting and different?
I have no doubt that we will create and deliver a web API that provides a useful set of features, such as the ones I linked to above. And for many applications, the advantages of the web will outweigh the allure of the custom features possible with native apps, especially as people start to realize the privacy implications of the data these APIs are collecting (and handing over to all these apps).
But I can't help but wonder how the web will deal with this explosion of native capabilities that differ across these emerging platforms. One very good reason the WebXR Device API focused on VR first is that web-based VR is better defined (dare I say, trivial) by comparison.
Finally, as this evolution plays out, will the web be able to remain sufficiently powerful, without succumbing to platform-specific extensions and platform divergence? I hope so, but I find myself worrying what the right balance will be.
(banner image: Ron Amadeo, arstechnica wwdc-2019 liveblog)