
The Secret Rise of the Semantic Web

I've been thinking about the semantic web lately. Not because I'm into standards from 2001 or anything, but because something is happening: the thing everyone said was dead is kind of... not dead?

Let me back up.

In 2001, Tim Berners-Lee (with co-authors James Hendler and Ora Lassila) published an article in Scientific American about software agents that could crawl the web and actually understand what they were looking at. Not just matching keywords, but reasoning across datasets. You could ask "find me a sewing pattern for a dress with pockets, size 12, suitable for linen, that doesn't require a zipper" and the machine would just... find it.

He called it the Semantic Web.

Unfortunately, that didn't work. We still don't have that kind of search.

The Standards Graveyard

The W3C (World Wide Web Consortium, the standards body for the web) invested a lot of time in the semantic web idea. Between 1999 and 2014, they shipped a pile of standards with acronyms — RDF for describing relationships between things, OWL for defining categories and rules, SPARQL for querying graph data, RDFa for embedding metadata in HTML pages. There was this famous "Layer Cake" diagram with seven layers of abstraction. Unicode at the bottom, trust at the top. Very academic. Very formal.

The problem? No normal person used it.

Aaron Swartz (who was actually involved in the RDF working group early on) eventually called it "overly-complicated hairballs with no basis in reality." To write down someone's name in RDF/XML, you needed six lines of namespaced XML. Six lines. For a name.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/person/1">
    <foaf:name>Sabine Schmaltz</foaf:name>
  </rdf:Description>
</rdf:RDF>

No working developer looked at this and thought "yeah I'll sprinkle this all over my site."

The Real Problem: Nobody Wanted To Do The Work

The whole vision depended on website owners annotating their data. Voluntarily. For free.

Cory Doctorow wrote this thing called "Metacrap" back in 2001 that basically said: people are lazy, people lie, and any system that needs millions of humans to carefully structure their content is doomed from the start. He wasn't wrong. Google had already learned this lesson with <meta> tags — people stuffed them with garbage to game rankings, so Google started ignoring them.

Classic chicken-and-egg. No structured data, no cool tools. No cool tools, no reason to structure your data.

The Name Theft (This Part Still Bothers Me)

Before crypto, "Web 3.0" meant the semantic web. Tim Berners-Lee himself used the term.

Then in 2014, Gavin Wood from Ethereum coined "Web3" to mean blockchain stuff. By 2021, crypto hype had completely taken over the name. The semantic web people — who'd been working on this for over a decade — basically had their brand stolen.

Berners-Lee was not happy about it. He called blockchain Web3 "not the web at all." But nobody was listening anymore.

The semantic web was officially dead. Or so we thought.

Wait, It's Kind Of Alive?

Here's where it gets weird.

In 2011, Google, Bing, and Yahoo quietly launched Schema.org. Same idea — structured data — but they actually solved the incentive problem: if you mark up your content, you get "rich snippets" in search results. That's when Google shows extra information directly in the search results — star ratings, recipe cooking times, event dates. More visible, more clicks. People actually did it.

And instead of RDF/XML, they pushed JSON-LD — a format that looks like JSON (the simple data format developers already use for everything) with some extra context. You can copy-paste it. No PhD required.
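
For comparison, here's the same statement as the RDF/XML example above, this time as JSON-LD with the schema.org vocabulary (the @id is carried over from that example; the script tag is how you'd embed it in a page's HTML):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "http://example.org/person/1",
  "name": "Sabine Schmaltz"
}
</script>

Same information, readable at a glance, pasteable into any page.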

As of 2024, 41% of web pages have JSON-LD. Over half have some form of structured data. That's up from 5.7% in 2010.

Google's Knowledge Graph (you know, the thing that shows you movie ratings and restaurant hours in the search sidebar) is basically the semantic web, built by a trillion-dollar company. Launched in 2012 with the tagline "things, not strings," it quickly grew to hundreds of millions of entities. Not exactly the open, decentralized dream Tim Berners-Lee had in mind. But it works.

And Then LLMs Made It Easier

Here's my actual hot take: the semantic web didn't fail because the idea was wrong. Structured, queryable data is powerful. It failed because it needed humans to do the structuring, and humans are lazy (see above).

LLMs (Large Language Models — the AI systems behind ChatGPT and similar tools) mostly solve the one thing that killed the whole project.

You can point an LLM at a janky restaurant website — hours buried in a photo, menu in a PDF, address in a Google Maps iframe — and it'll extract structured data. Not because anyone tagged anything. Because the model has seen enough restaurant websites in its training data to predict where the relevant information probably is.
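
Here's a minimal sketch of what that extraction looks like in Python, assuming the official OpenAI client; the model name, prompt wording, and output fields are my own illustration, not any standard:

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_restaurant(page_text: str) -> dict:
    """Turn messy, untagged page text into structured data via an LLM."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any capable model works
        response_format={"type": "json_object"},  # ask for valid JSON back
        messages=[
            {
                "role": "system",
                "content": "Extract restaurant data from the page text. "
                "Return JSON with keys: name, address, opening_hours, "
                "menu_items. Use null for anything you cannot find.",
            },
            {"role": "user", "content": page_text},
        ],
    )
    return json.loads(response.choices[0].message.content)

# Crawling and HTML-to-text conversion are left out; feed in whatever you scraped.
data = extract_restaurant(open("restaurant_page.txt").read())
print(data.get("opening_hours"))

No meta tags, no schema, no cooperation from the site. The structure comes entirely from the model's prior over what restaurant pages tend to look like.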

Think about what the semantic web actually needed. Every doctor's office publishing specialties and insurance and availability in structured format. Every restaurant tagging menu items with ingredients and allergens. Every event with proper time and location metadata.

Nobody was going to do that. That was always the fantasy.

But an LLM can crawl the site and make a decent guess. "This is probably a dermatologist in Saarbrücken, looks like it takes AOK and TK, seems to have slots Thursday." Not perfectly every time, but well enough. At scale. Without asking anyone to write XML.

Ontology Wars Were Never the Real Problem

The other thing that killed adoption was everyone fighting about vocabularies. Should reviews go under hasReview or reviewedBy? The W3C spent years trying to standardize this stuff. The real world did not cooperate.

LLMs don't solve this by finding the one true vocabulary. They solve it by understanding context. A field labeled "size" on a sewing pattern page means something different from "size" on a shoe store page. The model can figure that out from the surrounding content, not just the schema. Ten websites describing the same thing ten different ways? The model can usually normalize them, because it's reading the whole page, not just the tags.

So What?

Berners-Lee's 2001 paper imagined software agents that could collect content from all over the web and actually understand what they were finding.

In 2026, that's... basically normal. The semantic web isn't dead. It just needed different workers — LLMs that could populate the structured data that humans never would.

The semantic web is sort of here. It just doesn't look like anyone expected.


I'm not a semantic web historian, just someone who's been thinking about how LLMs change what's possible.