Knowledge graphs are web3

People have been discussing the idea of web3 since 2014, but is it clear what this platform is? After the publicity of 2021, when asked, most people would probably say that the core idea of web3 is ownership. As we’ll see, this focus on assets may have prevented us from seeing what web3 really is.

So what is web3?

Web3 is a new platform for decentralized applications. But while having programmable money and assets is a great addition to the toolkit, the core of web3 is information. You can think of DeFi as being about value and web3 about information. The key properties that define web3 are: verifiability, openness and composability. Let’s take them in turn.

Verifiability

On the web, you have to trust the app developer to manage data in your best interest. In practice, many decisions get made in the interest of the developer and against the users. In web3, we want to remove this trust. Users should be able to inspect any piece of information to see the full provenance of where it came from and why they’re seeing it in that location.

The first step for verifiability is cryptographic signatures. Every action needs to be cryptographically signed by the user. With blockchains like Ethereum, we also get consistency and consensus on time, so everyone can agree that an action was performed at a certain time. On top of that, we can perform various forms of verifiable computation with differing security, cost and latency tradeoffs. These tools are evolving, but the key piece is that web3 enables users to verify that what they’re looking at is correct and exactly where it came from.

Openness

An open platform is one that anyone can build on without limitation or fear. Web2 is characterized by platform risk: the platform owner can change the rules or cut off access at any time. It turns out that the only way to guarantee openness is to have the underlying compute infrastructure entirely provided by decentralized networks. With decentralized networks, anyone can run a node, so if one service provider changes the deal, someone else can pick up and provide the original service. Token incentives ensure that you don’t have to rely on altruism for these services to exist and operate with high quality. The Graph, Filecoin, Livepeer and Arweave are all decentralized web3 infrastructure networks.

Openness also means portability. That applies to private data, shared data and public data. Users should be able to move data across applications freely, and they should be able to create personalized software and AIs on top of any personal and all public data.

Composability

The final and maybe most important point to understand is composability. We have decent versions of verifiability and openness in the web3 ecosystem but where’s composability? Do we have a universe of amazing composable web3 apps that share complex data? What makes information composable?

Developers spend a lot of their time wrangling data: transforming it from one format to another, storing it in databases, moving it around. And yet most applications live in silos. Composability is about increasing the reusability of software, so that any application can reuse both data and code from other applications.

Relational databases, the most common type of database, and traditional APIs are inherently non-composable. They have rigid schemas that may work for some applications but not others.

Events example

Let’s take a look at an example. Say we want to build a web3 app for Events. There may be a conference coming up with Speakers and Talks at a Venue in a nearby City. Say we want to enable not just one but lots of different applications to operate on this shared data. How would we do that? The first thing we need is IDs to reference the objects we’re describing. IDs are common in software and their main characteristic is that they’re unique, but in web3 we have to go one step further: IDs have to be globally unique. Who controls those IDs? What’s the ID for San Francisco? What’s the ID for the concept of Art? For Vitalik's Devcon talk?
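To make the difference between unique and globally unique concrete, here is a minimal sketch. It assumes random UUIDs as the ID scheme, which is only one possible choice and not something this post prescribes. The helper mint_global_id is hypothetical.

```python
import uuid
from itertools import count

# Two independent apps, each with its own local auto-increment counter.
events_app_ids = count(1)
tickets_app_ids = count(1)

event_id = next(events_app_ids)    # 1, unique only inside the events app
ticket_id = next(tickets_app_ids)  # 1, unique only inside the tickets app

# Locally unique IDs collide as soon as the two apps share data.
local_ids_collide = event_id == ticket_id


def mint_global_id() -> str:
    """Mint an ID without a central registry (hypothetical helper).

    uuid4 is built from random bits, so two apps minting IDs
    independently are overwhelmingly unlikely to pick the same one.
    """
    return uuid.uuid4().hex


# Assumption: each concept gets one randomly minted ID.
san_francisco_id = mint_global_id()
art_id = mint_global_id()
```

Random IDs solve collisions but not agreement: two apps could still mint different IDs for San Francisco, which is why the question of who controls the IDs remains open.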

Next we need a way of describing the structure for the data. What fields does an Event have? A Venue? A Talk? Are they always the same? Are they different based on the context? It turns out that the same words can have different meanings in different contexts. A Crypto event may be different from a Music event or a Sports event, but some fields may be the same. Ontologies, or the way information is structured, need to be scoped to a given domain. But while ontology lives within a context, information is relational and is all interconnected. Knowledge forms a global graph.

Knowledge Graphs

Knowledge graphs are the most flexible way of representing information. Instead of using tabular information with a fixed schema, we use individual data points called Triples. A triple has an Entity ID, an Attribute ID and a Value, or EAV for short. An Entity is the thing that you’re talking about, like a specific Event, Venue or City. The Attribute is the property of that thing you want to describe. For example, a Venue has a Name, an Address and a Geo Location. And the Value is the value of that Attribute.
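As a rough illustration, a Venue could be written down as triples like this. It is a sketch only: the ID strings, the attribute names and the describe helper are made up for the example and do not come from any real network.

```python
from typing import Dict, List, NamedTuple, Union

Value = Union[str, int, float]


class Triple(NamedTuple):
    entity: str     # Entity ID: the thing we are talking about
    attribute: str  # Attribute ID: the property being described
    value: Value    # Value of that attribute (may be another Entity ID)


# Hypothetical IDs; in practice these would be globally unique.
VENUE = "entity:venue-1"
CITY = "entity:city-1"
NAME = "attr:name"
ADDRESS = "attr:address"
LOCATED_IN = "attr:located-in"

triples: List[Triple] = [
    Triple(VENUE, NAME, "Example Hall"),
    Triple(VENUE, ADDRESS, "1 Example Street"),
    Triple(VENUE, LOCATED_IN, CITY),  # a value can point at another entity
    Triple(CITY, NAME, "San Francisco"),
]


def describe(entity_id: str, data: List[Triple]) -> Dict[str, Value]:
    """Collect every attribute recorded for one entity."""
    return {t.attribute: t.value for t in data if t.entity == entity_id}


venue = describe(VENUE, triples)
```

There is no Venue table and no fixed list of columns here. Adding a new fact about the venue means appending one more triple, and nothing that already exists has to change.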

Triples are the atomic unit of knowledge graphs, and they’re infinitely composable. Because the Entity IDs and Attribute IDs are globally unique, they are fully self-describing and can be created and consumed by any application. Not only that, but different domains can have different views on the same data. Applications can pick and choose what they want to use with maximum reuse and maximum flexibility at the same time.
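Here is a small sketch of what picking and choosing could look like. Two hypothetical apps, one for ticketing and one for schedules, read the same shared triples but each only looks at the attributes it cares about. All IDs and the view helper are assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple, Union

Value = Union[str, int, float]
Triple = Tuple[str, str, Value]  # (entity ID, attribute ID, value)

# One shared set of facts about an event (hypothetical IDs).
shared_triples: List[Triple] = [
    ("entity:event-1", "attr:name", "Example Conference"),
    ("entity:event-1", "attr:venue", "entity:venue-1"),
    ("entity:event-1", "attr:ticket-price", 100),
    ("entity:event-1", "attr:headline-speaker", "entity:person-1"),
]


def view(data: List[Triple], entity_id: str, attributes: Set[str]) -> Dict[str, Value]:
    """Return only the attributes of an entity that a given app uses."""
    return {a: v for (e, a, v) in data if e == entity_id and a in attributes}


# Each app declares the attributes it understands.
TICKETING_VIEW = {"attr:name", "attr:ticket-price"}
SCHEDULE_VIEW = {"attr:name", "attr:venue", "attr:headline-speaker"}

ticketing = view(shared_triples, "entity:event-1", TICKETING_VIEW)
schedule = view(shared_triples, "entity:event-1", SCHEDULE_VIEW)
```

Both apps reuse the same name triple, and neither breaks when the other adds attributes it does not know about.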

If you had a global knowledge graph, you could ask The Graph any question. Questions like “Where did this idea first pop up?”, “How are we doing on this goal?” or “What jobs are available for someone with my skills towards this mission?” You could aggregate public facts and preferences across communities. Each community would have its own knowledge graph and all those knowledge graphs would be interconnected.
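To give a feel for how a question like the jobs one turns into a query, here is a toy sketch over an in-memory list of triples. It is not The Graph’s API. The match and jobs_for functions, the attribute names and all IDs are hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (entity ID, attribute ID, value)

# Facts that could come from different communities (hypothetical IDs).
graph: List[Triple] = [
    ("entity:person-1", "attr:has-skill", "entity:skill-writing"),
    ("entity:job-1", "attr:requires-skill", "entity:skill-writing"),
    ("entity:job-1", "attr:supports-mission", "entity:mission-1"),
    ("entity:job-2", "attr:requires-skill", "entity:skill-design"),
    ("entity:job-2", "attr:supports-mission", "entity:mission-1"),
]


def match(data: List[Triple], entity: Optional[str] = None,
          attribute: Optional[str] = None,
          value: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        (e, a, v) for (e, a, v) in data
        if (entity is None or e == entity)
        and (attribute is None or a == attribute)
        and (value is None or v == value)
    ]


def jobs_for(person: str, mission: str, data: List[Triple]) -> List[str]:
    """Jobs on a mission that require at least one of the person's skills."""
    skills = {v for (_, _, v) in match(data, entity=person, attribute="attr:has-skill")}
    on_mission = {e for (e, _, _) in match(data, attribute="attr:supports-mission", value=mission)}
    wanted = match(data, attribute="attr:requires-skill")
    return sorted({e for (e, _, v) in wanted if e in on_mission and v in skills})


result = jobs_for("entity:person-1", "entity:mission-1", graph)
```

The query works by following shared IDs from a person to skills to jobs to a mission, which is only possible because every fact uses the same identifiers.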

The Semantic Web

I’m not the first person to realize the power of Knowledge Graphs. In fact, they were always supposed to become the foundation of The Web. It was called the Semantic Web. Tim Berners-Lee and the original architects of the web have been working toward the Semantic Web for decades. But the rise of centralized platforms took them by surprise and changed the course of the web for the last 25 years.

Why did the Semantic Web fail to take off? The core technologies and W3C specs are a knowledge graph data format called RDF and a graph query language called SPARQL. I think they failed for three main reasons: 1) The developer ergonomics were too cumbersome. These standards were made by academics and they weren’t easy for developers to pick up. 2) You need some kind of standards or governance process for groups to agree on shared ontologies. If only there were a new way to get groups of people to reach consensus. 3) There was no incentive to run the infrastructure, and the centralized platforms outcompeted the open ones by capturing value and investing it in better infrastructure and user interfaces.

For these reasons, the original vision of the web never materialized: one where users are in control, communities are democratic and everything is open and decentralized. But times have changed. The pendulum of centralization is swinging the other way. A new set of tools has arrived to address the problems that prevented freedom and composability from reaching the masses.

Knowledge graphs are web3. And web3 is ready to rise.