How to Defend the Metaverse

A call to arms

Apr 17, 2023

The metaverse needs a reckoning.

Like the term itself, the space feels shrouded in ambiguity, garnering far more questions than answers.

Amidst the fog, sentiment is low, the skeptics are out in full force, and the enthusiasts’ faith is fading. A classic trough of disillusionment.

If we’re going to climb out, we need to shift the narrative, because the current one is rather uninspiring, plagued by legless avatars, dormant VR headsets, Hollywood-grade fraudsters, and a notion unrelatable to most: an entirely virtual world to take us away from the real one we so adore.

To that end, this essay proposes a new framework for defending the metaverse. It’s designed to arm you with better talk tracks for educating and inspiring your friends, your colleagues, and your customers.

The framework consists of three pillars:

The ‘what’ (a simpler, more refined definition)
The ‘why’ (why this future matters)
The ‘how’ (how we’re going to get there, and the progress thus far)

I’m going to guess that if you’re reading this, you are well versed in the ‘what, why, and how’ of crypto and NFTs. So, we’re going to mostly focus on my area of expertise: what we at AWS call ‘spatial computing’.

Think of spatial computing as a spectrum of immersive technology, ranging from real-time 3D, to augmented reality, to virtual reality, and all the wonderful experiences they bring to bear, e.g. simulations, games, digital twins, virtual worlds, 3D e-commerce, the list goes on.

We’ll also tease out spatial computing’s ultimate pairing: generative AI and large language models (albeit a topic worth an entire essay in itself).

Without further ado, let’s jump straight in.

The What

As a first order of business, we need a less hand-wavy definition of the metaverse; one that focuses on what it more practically is, versus how it might theoretically manifest (because that’s quite hard to predict).

The metaverse is just the internet, with an upgrade at two layers of the tech stack: the data layer and the experience layer.

At the data layer, it’s an evolution towards more open, immutable databases, allowing for digital ownership, provenance, persistence, and portability. These ‘ledgers’ lead to the ‘personification’ of digital objects, imbuing them with real-world properties and allowing them to become first class citizens within the economy and our culture.
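To make ‘ownership’ and ‘provenance’ a little less abstract, here is a minimal sketch of the idea in Python. It is a toy, not how any particular blockchain works: the `Ledger` class, its method names, and the `sneaker-001` asset are all hypothetical, and a real ledger adds signatures, consensus, and distribution across many machines. What it does show is the core trick: every entry commits to the one before it, so history can be appended to but not quietly rewritten.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _digest(payload: dict) -> str:
    """Deterministic SHA-256 of a JSON-serializable record."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


@dataclass
class Ledger:
    """Toy append-only ledger (hypothetical): each entry hashes the previous one."""
    entries: list = field(default_factory=list)

    def record(self, asset_id: str, owner: str) -> dict:
        """Append an ownership record that commits to the prior entry's hash."""
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"asset_id": asset_id, "owner": owner, "prev": prev_hash}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def owner_of(self, asset_id: str):
        """Current owner is whoever appears in the latest record for the asset."""
        for entry in reversed(self.entries):
            if entry["asset_id"] == asset_id:
                return entry["owner"]
        return None

    def provenance(self, asset_id: str) -> list:
        """Full ownership history for the asset, oldest first."""
        return [e["owner"] for e in self.entries if e["asset_id"] == asset_id]

    def is_intact(self) -> bool:
        """Recompute every hash; any edit to past entries breaks the chain."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: entry[k] for k in ("asset_id", "owner", "prev")}
            if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


# Illustrative usage with a made-up asset and owners.
ledger = Ledger()
ledger.record("sneaker-001", "alice")
ledger.record("sneaker-001", "bob")
```

Asking `ledger.provenance("sneaker-001")` walks the history, and `ledger.is_intact()` fails as soon as anyone edits an old entry in place. That tamper-evidence is what lets a digital object carry its history with it.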

The experience layer is an evolution towards more spatial and intelligent interfaces, with increased immersion, shared presence, and more natural and intuitive interaction (e.g. our voice, hands, eyes, and mind).

Visually, these interfaces conform to how our brains have evolved as humans, to interpret and reason about the world in full dimensions. They also merge the digital world with the physical, bringing our attention back to the real world, and to each other.

As for intelligence, generative AI and large language models catalyze a true merging of mind and machine, bringing the digital world to life in the form of the ultimate co-pilot; automating, predicting, and interpreting our every need.

More simply stated: the metaverse is the internet, with better visualization, intelligent applications, and more empowering forms of data storage/transfer.

The Why

Because we already live in the metaverse, it’s just half-baked and leaves much to be desired.

We already find love in virtual spaces (Bumble, Hinge), forge friendships in virtual worlds (Fortnite), shop in virtual stores (Amazon.com), work in virtual rooms (Zoom, Slack, etc.), and shape geopolitics in virtual town squares (Twitter).

But this half-baked version has its flaws. Be it our obsession with phones and the ensuing consequences, e.g. dwindling attention spans, cricks in our necks, the general disconnection from those around us. Be it the learning curve for building, creating, and using software. Or be it the feudalism that comes with massive, closed repositories of our data, trapped within the databases of a few megacorps, beholden to their terms of service with unfair and unpredictable take rates.

If anything, it’s this current version of the internet that is holding us back, with back-end architectures and interfaces that constrain in so many ways.

The metaverse is our attempt to change this and bring the internet full circle, properly integrating the existing digital world with the ‘real one.’

Success looks like the internet you already know and love, but more experiential, more tangible, more accessible, and more human, with the same properties and liberties that create prosperity in the real world; ones that connect us more deeply with all the ‘real world’ things we love most, largely culture, and all of the intangible value locked within: brands, narrative, memes, music, art, etc.

Properties like nonfungibility, and the liberty to own, transport, transform, and transact.

Properties like shared presence, eye contact, and voice interaction, liberating us from tiny rectangles, the need for fervently dexterous thumbs, and the need for a computer science degree to code/build.

Now don’t fret. This future won’t entail a brick on your face and complete immersion into a virtual world.

While VR is cool and great for entertainment, it won’t be how most people prefer to access the metaverse.

The preference will be AR and natural language, via a sleek and fashionable pair of glasses that will be advantageous in so many ways.

And no, you won’t have to wear AR glasses all day long (although some most certainly will). But you will use them to enhance and enrich particular moments throughout your day, injecting your perceptual system with limitless agency, in any environment, and any situation.

This could be increased agency as a storyteller in the classroom, as a parent sharing a memory, a doctor during surgery, a front-line worker doing maintenance, or a designer creating the perfect home.

It’s these ‘moments’ that will be the more impactful and lucrative version of ‘the metaverse’; one that blends immersive experiences more naturally and contextually within our day-to-day activities, rather than forcing you to escape and become completely immersed.

At the highest layer of abstraction, consider AR the ultimate tool for communicating any idea; augmenting language with visuals and digital information to improve knowledge transfer and cement understanding (between both humans and machines/LLMs).

Think of AR as ‘Language +’. Or what Terence McKenna calls ‘the embodiment of language’.

McKenna is best known as a pioneer in psychedelics and peak experiences, but he was also one of the metaverse OGs. Here’s a throwback of him waxing poetic about communication in cyberspace.

He says, "Imagine if we could see what people actually meant when they spoke. It would be a form of telepathy. What kind of impact would this have on the world?"

This question is even more pronounced today, amidst two backdrops. On one hand we have extreme social and geopolitical polarity, with both sides of all issues continuously talking past each other. On the other, generative AI, turning our language into media, action, and knowledge.

McKenna goes on to describe language in a simple but eye-opening way, reflecting on how primitive language really is.

He says, "Language today is just small mouth noises, moving through space. Mere acoustical signals that require the consulting of a learned dictionary. This is not a very wideband form of communication. But with virtual/augmented realities, we'll have a true mirror of the mind. A form of telepathy that could dissolve boundaries, disagreement, conflict, and a lack of empathy in the world."

Of course, we shouldn't discredit language. It’s the most powerful technology we’ve created to date, created in the presence of our second most important technology: fire.

The combination of the two brought our ancestors into communion, circling around a campfire to connect, tell stories, teach, make plans, and merge minds. The outcome was the application of humanity’s most powerful ‘inner tech’: our imagination.

Augmented reality and generative AI are creating humanity’s next-generation campfire. This combo will produce shared context, enhance the stories we tell, and enliven the lessons we teach; all while keeping us in flow through maintained eye contact, mutual wonder, and shared digital experience.

Human imagination has gotten us pretty far to date. But now, with this next-gen campfire, the limits to realizing our imagination will be removed, our misconceptions about the other will dissipate, and our ability to turn dreams into reality becomes limitless.

The How

Lastly, we need a clearer roadmap towards this end-state.

As much as it pains me, this starts with moving AR/VR further out on the timeline.

While the above vision should be what drives us, the game changing experience we all want/need is further out than originally expected. This article from Matthew Ball explains why.

The TLDR: mainstream XR (extended reality) is at least another 8-10 years away (mayyyyyybe 6-8 with some lucky breakthroughs). Let’s accept that and be patient.

As a silver lining: up until now, we were just feeling in the dark, unsure how to solve AR/VR’s thorniest technical challenges. But now, we have the blueprint and we’re well on our way.

Meanwhile, we can make progress at other critical parts of the spatial computing stack, with tech that works and is rapidly maturing. Particularly with 3D content and 3D scanning of the physical world.

3D content should be the near-term focus: Even if we had the ultimate pair of AR/VR glasses today, they wouldn’t be terribly useful. Outside of games, most brands/companies suck at 3D content creation.

But that’s starting to change.

Brands and companies are starting to think/operate ‘3D first’, arming themselves with the right talent, 3D services, and 3D workflows to crank out immersive experiences, repeatedly and affordably.

When done right, 3D content won’t come in a slow trickle. It will be explosive, like striking a rich oilfield of 3D data.

Most of the brands you know and love (the Nikes, the BMWs, the Guccis, the Ikeas) are already sitting on a wealth of existing 3D assets, most commonly as CAD models of products (digital shoes, watches, cars, buildings, planes, etc.).

For the most part, and compared to what’s possible, this 3D data is lying dormant, woefully underutilized beyond the design phase, stuck in fragmented data silos and disparate 3D creation tools.

To tap this 3D oilfield, companies are starting to look/feel like a AAA game or VFX studio. They’re pulling talent from games/VFX and establishing 3D content pipelines and 3D asset management systems designed to capture 3D assets at their source (the product design/R&D teams), and then funnel them downstream across the organization.

In the wake of this activity, new 3D tools and services are coming to light to accelerate the transformation.

At AWS, we recently open-sourced a 3D pipeline and asset management solution called VAMS (Visual Asset Management System). NVIDIA is gaining momentum with Omniverse, revolutionizing collaborative 3D content creation and interoperability between disparate 3D data/tools. Epic Games acquired photogrammetry leader Capturing Reality, as well as Sketchfab for 3D asset management/sharing. They’ve since launched Fab: a 3D asset marketplace to help connect 3D content creators to consumers/users. Even old-school Shutterstock is getting into the game, having acquired TurboSquid, a leading 3D asset marketplace and asset management product.

And now, we’re being graced by the ultimate 3D accelerator… artificial intelligence (AI).

A new technology called NeRF (neural radiance fields) recently emerged, leveraging AI to better turn 2D photos and videos into 3D. This tech could do for 3D what digital cameras and JPEG did for 2D imagery: dramatically increasing the speed, ease, and reach of 3D capture and sharing.

If you want to geek out on NeRF, here’s a good deep dive.

NeRF tech is now being merged with computer vision, allowing machines to understand and reason about both the physical and virtual environment, and all the things/items within. Check out this video.

And don’t get me started on the potential of generative AI for 3D…

Today, AI tools like DALL-E and Midjourney can create 2D images of vast, static worlds. Soon, and I mean very soon, these tools will be able to produce 3D objects from scratch, with nothing but a basic prompt (be it text or imagery). The creation of full-on virtual worlds won’t be far behind.

We’re already seeing generative AI being used today as a co-pilot for 3D modeling, e.g. this combo of Stable Diffusion and Blender or this text-to-3D tool from Spline. Absolute game changers.

In other words, the great democratization of 3D creation is well underway.

If we double down on 3D data/content, the incentive for AR/VR adoption will be much higher when it finally has its moment. 3D content is the initial chicken/egg (depending on your worldview).

In parallel to 3D creation, we’re also making strides towards the much-needed standards required for the metaverse’s holy grail capability: 3D interoperability. Brands and developers are rallying around standard 3D formats such as glTF and USD, and the Metaverse Standards Forum is cementing best practices in their wake.
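For a feel of why an open format matters, here is a hedged sketch of about the smallest glTF 2.0 document you can write: plain JSON that any compliant viewer or engine can parse. The `minimal_gltf` helper and the `product_placeholder` node name are made up for illustration, and the scene deliberately contains no geometry; a real product asset would add meshes, accessors, buffers, and materials on top of this skeleton.

```python
import json


def minimal_gltf(node_name: str, translation=(0.0, 0.0, 0.0)) -> dict:
    """Build a bare-bones glTF 2.0 document: one scene holding one empty node.

    Hypothetical helper for illustration only; no mesh data is included.
    """
    return {
        "asset": {"version": "2.0"},      # the only property glTF strictly requires
        "scene": 0,                        # index of the default scene
        "scenes": [{"nodes": [0]}],        # scene 0 contains node 0
        "nodes": [
            {"name": node_name, "translation": list(translation)}
        ],
    }


# Place a named placeholder one unit up the Y axis (glTF's up axis, units in metres).
doc = minimal_gltf("product_placeholder", (0.0, 1.0, 0.0))
gltf_text = json.dumps(doc, indent=2)
```

Writing `gltf_text` to a `.gltf` file is all it takes to hand this scene to another tool. The point is less the code than the contract: because the structure is a published standard, the shoe modeled in one tool can show up intact in another.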

This ‘3D data layer’ is not as sexy as headsets, so it avoids mainstream coverage. But it’s arguably the most important layer to prioritize and get right. Everything else will get pulled through as a result.

In the meantime, real-time 3D on a screen is proving a good enough window into the metaverse for many use cases. Let’s focus here for now and let ‘path dependency’ work its magic. Neal Stephenson, father of the term metaverse, explains this best:

“Thanks to games, billions of people are now comfortable navigating 3D environments on flat 2D screens. The UIs that they’ve mastered [keyboard and mouse for navigation and camera] are not what most science fiction writers would have predicted. But that’s how path dependency in tech works. We fluently navigate and interact with extremely rich 3D environments using keyboards that were designed for mechanical typewriters. It’s steampunk made real. A Metaverse that left behind those users and the devs who build those experiences would be getting off on the wrong foot… My expectation is that a lot of Metaverse content will be built for screens (where the market is) while keeping options open for the future growth of affordable headsets”

The AR Cloud: With 3D content well on its way, the next item on the roadmap is the AR Cloud, particularly its key building blocks: LiDAR scanning and VPS (visual positioning systems), both accessible via the sensors on your smartphone today.

Think of the AR Cloud as a ‘digital twin’ of the world; consisting of machine-readable 1:1 scale models (or 3D maps via 3D scans), of places (parks, neighborhoods, cities) and things (buildings, stadiums, factories, classrooms). These 3D ‘maps/scans’ will act as anchor points for interactive, digital layers on top of physical environments, accessible by AR devices based upon context (where you are, what you are doing, who you are with).

When paired with VPS (visual positioning systems) and AI, this effectively becomes a ‘search engine’ for the physical world; connecting atoms to bits and allowing users to access digital information and services/apps associated with any place, person, or thing. As for developers, this will look/feel like an ‘API to the World’, making anything & everything ‘discoverable’, ‘shoppable’ and ‘knowledgeable’.
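As a rough illustration of that ‘search engine for the physical world’, the sketch below looks up digital content anchored near a user’s position. Everything in it is an assumption for the sake of example: the `Anchor` type, the `nearby` function, and the coordinates are invented, and plain latitude/longitude stands in for the far more precise position and orientation a real VPS would return.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass(frozen=True)
class Anchor:
    """Hypothetical anchor: a real-world location with digital content attached."""
    name: str
    lat: float
    lon: float
    content: str


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(anchors, lat: float, lon: float, radius_m: float) -> list:
    """Return anchors within radius_m of the given position, nearest first."""
    hits = [(distance_m(lat, lon, a.lat, a.lon), a) for a in anchors]
    return [a for d, a in sorted(hits, key=lambda h: h[0]) if d <= radius_m]


# Made-up anchors and coordinates, purely for illustration.
anchors = [
    Anchor("stadium", 47.5952, -122.3316, "tonight's lineup"),
    Anchor("statue", 47.6210, -122.3490, "history of the sculpture"),
    Anchor("cafe", 47.6205, -122.3493, "menu and reviews"),
]
results = nearby(anchors, 47.6206, -122.3492, radius_m=100)
```

Here the query returns the cafe and the statue, nearest first, and skips the stadium a few kilometres away. A production AR Cloud would index full 3D scans rather than points and would filter by context as well as distance, but the developer-facing shape is similar: ask where you are, get back what is attached to that place.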

The AR Cloud concept flies under the radar. But it’s becoming the next Big Tech battleground, with implications for the future of navigation, shopping, entertainment, news, robotics, health/safety, travel, and education, just to name a few. Many believe it will be the most important developer platform since the internet itself… especially when paired with AI and LLMs (large language models).

While it sounds sci-fi, this tech is here and now.

Apple offers Location Anchors in ARKit, which draw on Apple Maps data. Google has similar AR features within Google Maps. Microsoft provides Azure Spatial Anchors, and while historically limited to the HoloLens AR headset, it’s becoming more device agnostic. Snap and Niantic have also made major leaps towards this end-game, with Snap Lens Cloud and Niantic’s Lightship SDK, marketed as the developer platform for the real-world metaverse. Lesser-known companies are also on the rise, like Matterport and Prevu3D, both making it very easy to scan the built world with commodity hardware.

Conclusion

I hope this framework leaves you feeling empowered to better defend/justify the metaverse.

Because the narrative shift we need isn’t going to come from Big Tech and mainstream media. It’s going to come from people like you, evangelizing in a more bottom-up, grassroots fashion.

So please, go forth and reshape minds. Help us rid the industry of imprecise definitions, misguided hype, and sensational headlines; remind people we’re already living in a proto-metaverse, with much to be desired; explain how AR can create more connection and understanding; show how ChatGPT makes computing accessible to all; point to the meaningful progress being made upstream of headsets; and most importantly, encourage people to take a more optimistic stance.

Because here’s the reality: the technology genie is out of the bottle, and there’s no turning back. In light of this fact, we have two choices.

We can over-index on tech’s past failures, operate out of fear, and complain about the world we’re currently in.

Or… we can learn from tech’s mistakes, anticipate negative externalities, and work to shape the world we want to see.

The choice is yours.

Medium Energy: Spatial Computing, AI, and Being Human
