Some thoughts on AR

Over the last few months, I've talked to countless reporters and answered questions via phone and email. It occurred to me the other day that it might be good to start sharing some of these questions and answers here, so more people can see them; if I'm going to the trouble of explaining AR and my views on it, why not put those ideas out there? So, without further ado, here are some current thoughts on AR.

Can you explain what augmented reality is?

Augmented reality is a technique for presenting information to a person by blending it with their perception of the world around them. Typically, this has meant using see-through displays to merge 3D graphics with the physical world; it can also mean using spatialized audio to put sound out in the world. The see-through displays could be truly transparent displays, such as Microvision's virtual retinal displays; more typically right now, they are created by combining a camera (such as the one on the back of a smart phone) with a normal display, and adding graphics to the video before displaying it. When this is done in real time, these "video-see-through" displays give the illusion of being see-through.
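To make that concrete, here is a minimal sketch of a video-see-through loop in Python with OpenCV (my choice of tools for illustration here, not what any particular product uses): grab a camera frame, draw graphics on it, and display the result, over and over, fast enough to feel live. A real AR system would replace the drawing step with registered 3D rendering, but the loop is the same.

```python
import cv2

cap = cv2.VideoCapture(0)  # the default webcam

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # "Augment" the frame: anything drawn here appears merged with the
    # live video, which is what creates the see-through illusion. A real
    # system would render registered 3D graphics instead of a fixed box.
    cv2.rectangle(frame, (200, 150), (440, 330), (0, 255, 0), 2)
    cv2.putText(frame, "virtual content", (200, 140),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)

    cv2.imshow("video see-through", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # quit on 'q'
        break

cap.release()
cv2.destroyAllWindows()
```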

The key is that the information is "registered" (i.e., aligned) with the view of the physical world. There are many interesting things that can be done with location-sensitive information, such as doing location-based search via Google Maps, or telling stories or creating games based on the place or time the game is played. These can be fun and useful, but I would call these "mixed reality" or "context-aware computing." Augmented reality is a subset of these concepts that adds the idea of perceptually merging the content with the world.

To overlay graphics (or sound) correctly on the view of the world, the computer (or phone) needs to know precisely where the world is in relation to the camera. The more accurately the position and orientation of the device is known, the more accurately the graphics and the physical world can be combined.

On a smart phone, you can try to use the GPS and other sensors (e.g., compass, accelerometers) to estimate the position and orientation of the phone relative to the world; the problem is that the limited accuracy of those sensors restricts the capabilities of these applications. In talks and presentations recently, I have been showing a video I made of the Yelp Monocle system, in which I look around the courtyard in front of my building and none of the labels align with what they are referring to. This is not a bug in the Yelp iPhone application; it's a limitation of the accuracy of the GPS, and is common to all GPS-based AR applications on mobile phones today.
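To see why the sensor accuracy matters so much, consider the arithmetic a GPS-plus-compass AR browser has to do to place a label. This is a simplified sketch; the function names, the screen width, and the 55-degree field of view are illustrative assumptions, not taken from any real application.

```python
import math

def bearing_to(lat1, lon1, lat2, lon2):
    """Initial compass bearing, in degrees, from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

def label_screen_x(device_lat, device_lon, heading_deg, poi_lat, poi_lon,
                   screen_width_px=480, horizontal_fov_deg=55.0):
    """Horizontal pixel position for a geo-located label, or None if the
    point of interest is outside the camera's field of view."""
    # Angle between where the phone is pointing and the point of interest.
    offset = bearing_to(device_lat, device_lon, poi_lat, poi_lon) - heading_deg
    offset = (offset + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    if abs(offset) > horizontal_fov_deg / 2:
        return None
    # Spread the angular offset linearly across the screen (a crude but
    # common approximation for a narrow field of view).
    return screen_width_px / 2 + (offset / horizontal_fov_deg) * screen_width_px
```

The problem is the inputs: for a storefront twenty or thirty meters away, a GPS error of ten meters (quite normal) can swing the computed bearing by roughly twenty degrees, so the label lands on the wrong building, and compass error adds directly on top of that.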

Alternatively, computer vision can be used (with smart phones, or with web cameras on computers) to find known objects in the world (e.g., the little black-and-white square markers used in a lot of AR demos) and determine the position of those objects relative to the camera. Since this can be done very accurately, these systems can put graphics very tightly on those objects. If you look at the videos of our AR games (on our YouTube channel at www.youtube.com/aelatgt) you'll see that we mostly use these sorts of techniques.
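For the curious, here is roughly what a marker-based pipeline looks like, sketched with OpenCV's ArUco module. This is an illustration using a similar style of black-and-white square marker, not the actual code behind our games; the ArUco calls follow the module's pre-4.7 OpenCV interface, and the camera intrinsics below are placeholders that would normally come from a one-time camera calibration.

```python
import cv2
import numpy as np

# Placeholder intrinsics; a real system calibrates the camera once
# and loads the results here.
cam_matrix = np.array([[800.0,   0.0, 320.0],
                       [  0.0, 800.0, 240.0],
                       [  0.0,   0.0,   1.0]])
dist_coeffs = np.zeros(5)

aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_6X6_250)
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Find the square markers in the image.
    corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)
    if ids is not None:
        # Recover each marker's 3D pose relative to the camera; graphics
        # rendered with this pose appear locked to the physical marker.
        rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
            corners, 0.05, cam_matrix, dist_coeffs)  # 5 cm markers
        for rvec, tvec in zip(rvecs, tvecs):
            cv2.drawFrameAxes(frame, cam_matrix, dist_coeffs, rvec, tvec, 0.03)

    cv2.imshow("marker tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```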

How is this technology syncing with today's electronics on the portable side?

The growing proliferation of smart phones with cameras, GPS, compasses and accelerometers means that primitive AR applications can be created now. There is a wealth of geo-located data available on the web (e.g., all the location-searchable content on Google and Bing, plus Flickr images and soon Twitter tweets). So, it's straightforward (though not entirely trivial) to create a "browser" for one of these information sources. As the technology gets better, and the sources of geo-located data grow and are refined, there will be a natural progression in the quality and capability of these applications.

The real question, of course, is "Are any of these geo-located information-browsing applications useful?" Time will tell. I think there is some utility beyond the top-down 2D map view, but there are a lot of negatives imposed by the limited accuracy of the sensors and the crude quality of the data (see my comment about the Yelp application above). So, right now, there is excitement generated by novelty, but eventually these applications will need to do more.

The next big breakthrough will come when there is enough infrastructure in place to support combining these sensors with computer vision outdoors, away from known markers and objects. This is the focus of significant research in universities and companies right now, and will lead to some advances within the next few years. When vision-based tracking can be used outside, there will be a dramatic change in the kinds of applications that will be able to leverage AR.

And on the stand-alone side?

Consider desktop computers and consoles. The technology is here to do really interesting things. Companies like Imagination (in Austria), Total Immersion and Metaio have great technology, which hasn't really been leveraged in non-trivial ways yet; most of the uses have been pretty simple advertising sites, or gimmicky applications like the Topps baseball card augmentations (which are cute, but of dubious long-term interest). Some applications (like the Lego kiosks) are actually pretty nice, since they solve a problem (even if only a minor one) at the place they are deployed ("what does the model inside this box look like?").

If you look at Sony's EyePet game, you can start to see what will be possible in the desktop camera space. The interesting thing there is that the PS3 is actually a pretty weak machine by today's desktop computer standards (compared to a souped-up gaming PC or Mac, for example), so there is a lot of untapped potential for interesting applications on more powerful machines.

The big problem with the desktop is the relationship between the camera and the screen. Over the many years I've worked with AR, it has become obvious that the first-person perspective created by see-through displays (e.g., head-worn displays or handheld smart phones) is far easier to understand, and far more compelling and useful, than the "magic mirror" effect you get when the camera is not attached to the display. When using a web camera or something like the PSEye with a computer display or HDTV, where the camera is either looking out at you or at something you hold in your hand, the merged world you are "seeing" on the display is not connected to where you are physically looking. While people can learn to use these systems, and they can still be fun, they are nowhere near as natural or immersive as the first-person ones. Think about the difference between playing tennis (looking at your opponent) and playing Wii Tennis (standing beside your opponent, with both of you looking at the screen). It's similar in some ways, but not the same.

You need to try both sorts of AR experience to understand the difference. When you look at videos on the web, they look the same; but the experience is completely different, and the videos are often staged to hide these differences. For example, in the EyePet videos, the kids are often shown looking down at the little monkey-like creature on the ground in front of them; in the actual game, however, they would be looking up at the screen to see the merged view, and on the screen they would see themselves looking at the screen, not at the pet.

We've been able to do these sorts of "magic mirror" AR things for years, but it is the handheld applications that have really sparked people's imaginations.

How do you see this technology being used for videogames?

I see it being used as an extension of what we are currently seeing. In the near future, we'll see a lot more of the Sony EyePet and Sony Invizimals kinds of games, which use fixed cameras on consoles or use markers and game boards with the handhelds.

Handheld consoles combined with game boards and markers are the area where I believe the most exciting things will occur, and the limits of what is possible are largely dictated by the underlying tracking technology. By comparison, the GPS-plus-compass-plus-camera setup on current mobile phones will not allow any really good games to be built; but once we can use the video to do precise outdoor tracking, we will start to see amazing things on these devices as well. Imagine taking our ARhrrrr! game outdoors, where you are fighting zombies on the street, or shooting at them from your windows. We will be creating an outdoor AR game this year, using some experimental tracking software one of my students is developing with Nokia. Our goal is to have a multiplayer outdoor game on Nokia N900s, with the same kind of tight registration between the physical and virtual worlds you see in our other tabletop games.

What impact do you see augmented reality having on entertainment?

In addition to games, I see lots of potential in the social applications of AR, akin to all of the little applications you see on Facebook right now. We ran a class last year where we asked the students to imagine "mobile AR Facebook" applications, and the prototypes they created were very exciting.

Similarly, I see lots of little casual "AR toys" that let kids mix virtual and physical content. We have a project where we are taking MIT's Scratch programming environment and adding AR to it, and hope to have a "player" for it running on smart phones this year. The idea is to let kids create little AR games and toys and share them with each other. When the next generation of camera-based handheld game devices comes out (e.g., the next DS or a camera-enabled iPod Touch), we can start imagining kids creating their own AR experiences.

What are some examples of what your lab has done in the gaming space?

You can see videos on www.youtube.com/aelatgt and on our lab web site.

There are three main games we've created that taught us interesting things.

Bragfish (the paper on this was published last December at ACE 2008) was the first one we created that we actually studied. It's a two-to-four-person "around the gameboard" game, meant to explore the social experience of playing a combination of a board game and a computer game. The first key thing we found is that when the graphics are really tightly registered with the physical world (the gameboard, in this case), people start to treat the handheld as a window into a play space that is "on" the table. This is important, because people then start being able to use all of their perceptual and physical skills to interact and to understand what the other players are doing. The second thing we found is that, not surprisingly, the experience of playing these games was very different from other multi-player computer games, because the players felt like they were "in" the same space.

In Art of Defense (published this summer at the SIGGRAPH Games track), we looked at the same issues in a collaborative game. We also included a lot more tangible props and game pieces. We found that the shared understanding of the space carried over to collaboration, and that people really could collaborate smoothly by leveraging the physical props, rather than having to refer to the small screens. Again, the experience was very different from other collaborative computer and handheld games, according to the players.

Finally, my students and I created the ARhrrrr! game with Tony Tseng and his students at SCAD-Atlanta (and with help from NVidia and Daniel Wagner at Graz University of Technology) to explore a much richer, more engaging experience. This game points to one future category of tabletop handheld AR games, where a 3D world is superimposed on a table and people can play around it. We hope to do a lot more in this space over the next year, and I can see building many commercial games based on the ideas we've had.

What do you think we'll be seeing at CES in January that will help propel augmented reality into the mainstream?

I don't expect much that's revolutionary to appear, honestly, but I do expect some obvious evolutionary steps.

I expect new head-worn displays to appear (companies like Vuzix have been promising them), which will generate some excitement but won't have a huge impact yet, because they won't be that well integrated into the platforms, nor will they have much in the way of compelling experiences.

I expect a flood of new phones, based on the high-end chipsets (e.g., the Tegra, the Snapdragon, the OMAP3, etc). The real question in this area will be "will those phones have the right APIs and sensors to support AR experiences that can fully leverage the platform?". We could build AR games like ARhrrrr! on the iPhone right now, if we could get at the camera video stream efficiently in real time; the limit is not the hardware, but the OS. And so it will be for the new devices. Another example: Android doesn't allow efficient access to the video for computer vision, so the Droid may not end up being the breakout AR platform it could be.

We'll also see a host of new UMPC-style devices that start to look like smartphones, but have better technology. It's unclear what this will mean, though, since they typically aren't aimed at kids.

The big question will be in the space of devices aimed at kids. If Apple opens the video API and releases a camera-based iPod Touch, this will be a game changer because of the size of the market. Kids don't own iPhones; the phone is cheap, but the plans cost more than $1000/year, which most kids can't possibly afford. But any kid could have an iPod Touch. If Apple doesn't do it, perhaps someone else will. Will Sony get behind the PSP with a camera, and do more games like Invizimals? The PSP and DSi are pretty weak platforms, so the games are limited, but what will the next portable look like? Will Microsoft release an XBox GO based on the hardware they are developing for their phone, akin to the iPod Touch/iPhone dynamic? Who knows.