What happened to the Semantic Web? (2017) [pdf] (acm.org)
83 points by animationwill on Sept 24, 2020 | 57 comments



I was replying to another commenter, but they deleted their post. I still think my response is legitimate, and I'd like to discuss. I'm including the post and my response to it:

> This might be an unpopular opinion, but the Semantic Web was yet another idea that only could've come from Architecture Astronauts. Neither web developers nor web users wanted or needed it, but the Astronauts just kept pushing forward, with no support whatsoever, until people with actual skin in the game of the future of the Web got fed up and sidelined the W3C in favor of WHATWG/HTML5.

> When the Architecture Astronauts produce a product no one wants/needs at a big megacorp like Microsoft, it's easy to make fun of them, but for whatever reason when it happens in an open-source environment or in academia, people are a lot more likely to lament "Oh, why couldn't they understand?", which has it backwards: it's your job to understand them, and in your arrogance you deliberately didn't do that.

The Semantic Web flew in the face of enterprises like Google. It proposed putting all information into a queryable ontology. With a rich and expressive grammar, who needs Google's algorithms to rank anything? All you need to do is ingest the data, and you can weight the edges based on who else consumes them.

Google actually knee-capped the web with the WHATWG's focus on less semantic documents. The platforms with their army of engineers are what did the technology in.

If the Semantic Web had a chance to grow before the giants of tech emerged, we might be looking at a vastly superior internet today.


I'm quite sceptical that "a queryable ontology. With a rich and expressive grammar" would be so obviously easy to use and great that it threatened Google, rather than Google being a frontend for said ontology (which you still need to crawl first to use!). And indeed, what remains of semantic-web-like data is massively pushed by Google today, because it makes it easier for them to provide results based on it - and if you want to convince someone to add it to a website in a commercial setting, "it helps Google understand our site" is the primary argument that sticks (even though it also helps others parse sites).

Which IMHO points to the main problem: publishing semantic data is work, it had no clear value proposition you could sell businesses on, and it ran out of steam before someone made a convincing one. Niches that see the need for such data publishing are still willing to use this stuff or alternatives, but for the general majority of publishers the value isn't there, or it is actively seen as a negative.

It also didn't help that IMHO the focus was too much on what was theoretically possible and not on making it actually easy to use, which made even more devs ignore it or build alternatives because the entry hurdle is steep. Plenty of APIs could be built on semantic web tech, but they aren't, because a custom REST API is typically just easier to do and thus more familiar. (Despite semantic tech having the groundwork for lots of what are seen as new-ish trends like API generators/machine-readable API docs/...)


> Google actually knee-capped the web with the WHATWG's focus on less semantic documents.

There's one teensy problem with your thesis: Google came late to the WHATWG.

The WHATWG started initially as a joint Mozilla-Opera project [1] to basically standardize what you had to do to parse the crap that passes for webpages (including de-facto standard behavior for DOM at the time). At the same time, it also included a variety of speculative features to push the capabilities of the web forward: <canvas>, <audio>, <video>, not to mention Web Forms (eventually morphed into all the nice, new <input> types). It even had semantic elements: <section>/<header>/<footer> originate from WHATWG, after all. Not to mention microdata, too.

Google Chrome doesn't come out until 2008. XHTML 2.0 is put out of its misery in 2009, but it was clear by 2006 or 2007 that the WHATWG was now the driving force in HTML innovation, by virtue of all browser vendors focusing on WHATWG HTML support and not W3C XHTML 2.0. The W3C tries to regain relevance by starting from WHATWG HTML instead of XHTML for HTML 5 in 2007.

[1] Apple was also involved very near the beginning, but I don't know to what degree it was involved.


> Google Chrome doesn't come out until 2008

Oh of course, Google didn't have a horse in the race for the application web at all until 2008. Except wait, they had released GMail in 2004 and Google Maps in 2005. In fact, WHATWG made frequent references to GMail and Google Maps as examples of the application web and deficiencies that W3C was not addressing in its XHTML standard.

Then Ian Hickson, the main editor behind the WHATWG HTML5 standardization process, went to Google. One of my favorite quotes from Hickson about the whole process (as I noted in a sibling comment):

"The Web is, and should be, driven by technical merit, not consensus. The W3C pretends otherwise, and wastes a lot of time for it. The WHATWG does not."

This says a lot both about the outlook of the WHATWG and about the way the development community of the time saw web corporations. And now we have 2 major browsers as of this writing. It's funny how much flak Berners-Lee gets over his acquiescence to DRM and yet the WHATWG discussions of the time are completely glossed over. I was fairly young when the WHATWG and W3C power struggle occurred, but it was a great lesson for me as a teenage programmer in the power of value capture.


> Google actually knee-capped the web with the WHATWG's focus on less semantic documents. The platforms with their army of engineers are what did the technology in.

Nah. Those platforms had years or decades to make their overengineered crap work, and they never could. XHTML 2.0 was never finished, all those RDF ontologies never got anywhere, and all the semantic search engines couldn't find anything.

Google made a search engine that worked better than any other precisely because it ignored all the semantic bollocks, and eventually, after the web had stagnated for years by focusing on said semantic nonsense, the WHATWG formed to standardise some HTML that websites and users actually wanted. The person you were replying to had the right of it. If the semantic web was actually any good, it would've produced a better search engine than Google.


Yep, instead of XSLT we now have a dozen JavaScript or server-side solutions. HTML5 was pragmatic, but to a fault.


> instead of XSLT we now have a dozen JavaScript or server-side solutions.

And not one of them is harder to work with than XSLT. I'm all for declarative transformations in theory, but XSLT just sucks so bad if you actually try to use it.


It used to be a mess rendering a site to PDF; it's much better now but still isn't great if you look over the issues on various HTML -> PDF projects. The issue with HTML5 is that it threw everything away for pragmatism, but in doing so we got HTML and JavaScript, a lot of JavaScript. I also really didn't mind XSLT.


> If the Semantic Web had a chance to grow before the giants of tech emerged, we might be looking at a vastly superior internet today.

I worked pretty closely with people doing a lot of semantic web work in the 2006ish era, before Google and ML dominated tech. Even then it was nearly impossible to find a single, useful example of the semantic web in action.

A reasoner, a la something like OWL, is the true heart of the semantic web, but that never got solved at the scale necessary for the web. For any practical problem, or even interesting side projects, semantic web technologies didn't offer anything that you couldn't build better using existing tools.

People made RDF data stores because they wanted to use RDF, but I don't recall any interesting demos of semantic reasoners.


The irony here is that Google was built on the light semantics of the hyperlink: the anchor tag is the semantic web, giving you information that a server-side image map or Java applet or Flash movie or JS-driven navigation couldn't reliably identify without putting a hell of a lot more effort into your crawler.

There were probably always limits to how widely useful (or adopted) a single coherent set of resource definitions / markup conventions could have been, but for all the early value demonstrated by Google itself it sure seems like there really wasn't enough thought about the value proposition.

> Neither web developers nor web users wanted or needed it,

Users want search, first without having to put too much thought into modeling what they're looking for, then maybe with some additional options once they have an intuition or understanding of how results might be improved. They might not know how the semantic web could help with that any more than they know how Ruby helps their application exist, and it makes about as much sense to say users don't care about the semantic web as it does to say they don't care about Ruby. Both are potential waiting to be realized into an experience.

Developers, on the other hand, don't have anywhere near as good an excuse.


> light semantics of the hyperlink: the anchor tag is the semantic web, giving you information that a server-side image map or Java applet or Flash movie or JS-driven navigation couldn't reliably identify

This, 100x. Complex and precise ontologies don't survive a distributed environment because communication is competitive. Publishers don't seek to describe themselves "accurately" (as if everyone could agree what that means), they seek to attract a certain type of attention in maximum quantities.

The protocols that survive that policy framework are those that "carry their own weight" for both listeners (by compressing information that would be hard to get) and publishers (by allowing sufficient expression to pursue selfish ends without lying).

Hyperlinks nail this trade-off (at least well enough that PageRank successfully trims the noise). But the Semantic Web gave publishers little incentive to represent themselves, accurately or at all.


Great point. Individual publishers are to be trusted so little in describing their own content that Google completely ignores the keywords meta tag (and also description now I think).


'Users want search, first without having to put too much thought into modeling what they're looking for'

But Ads ?

And to the next one phrasing: 'light semantics...'

But google want it (company-patented) exclusive for themself ?

left me confused... (-;


> left me confused...

is what your comment does to me. Could you expand what you are trying to say?


[Source/disclaimer: I was working on a number of these markup parsing & unstructured data mining projects at Google in the 2009-2014 timeframe, and also wrote an HTML5 parser with the cooperation of the WHATWG. I left for about 6 years during the time period when Google began to be seen as this evil goliath, but I'm back now. My views are my own and don't represent my employer.]

You've got the cause & effect backwards. Google invested heavily in existing standards-based microformats (RDF, schema.org, OpenSocial, various rel= standards) before ultimately concluding that web developers cannot be trusted to get markup right, and most don't want to do it in the first place. It was only after this failure that we threw out all the markup parsing and went to machine-learning & algorithmic approaches, and then eventually to just owning a bunch of the content platforms where people write content.

Customer desires - in the form of what people will actually do, not what they claim they want to do - ultimately decide the structure of a market. If a large number of competing publishers all decide they want to publish content and users divide their attention evenly among them, you get open standards, protocols, and aggregators. If users decide they all want to view the most popular content, then the dominant firms become more dominant, you get silos and walled gardens (and eventually paywalls), and power shifts from an open ecosystem to a few dominant players. Google was built on the open web and embraced the philosophy of it for longer than any other major company, but people there are very pragmatic. When it became clear that there would be a few dominant companies, the priority shifted to being one of them rather than investing in (and getting screwed by) an open ecosystem that was disappearing.


> You've got the cause & effect backwards. Google invested heavily in existing standards-based microformats (RDF, schema.org, OpenSocial, various rel= standards)

Mm yes, much like XMPP and Google Chat/Facebook. Pay lip-service to a standard when you _don't_ own the mindshare, drive mindshare to you, then sunset the standard for your implementation when you have the mindshare. An innocent play, for sure.

That said, of course Google was not a founding member of WHATWG. The reason the association is so stubbornly believed is because WHATWG's Ian Hickson went to Google. At the risk of an aside, one of my "favorite" Ian Hickson quotes goes as follows:

"The Web is, and should be, driven by technical merit, not consensus. The W3C pretends otherwise, and wastes a lot of time for it. The WHATWG does not."

I suspect in today's world this phrase would be out of place. But of course, this was a more innocent time.

> before ultimately concluding that web developers cannot be trusted to get markup right, and most don't want to do it in the first place

I think a massive conflict of interest needs to be disclosed here: ad revenue. Microdata and schema.org were a change from the original promise of the Semantic Web. The Semantic Web was about exposing semantic data on pages so that search engines and other data processors could compete on discovery. Microdata and schema.org took a lighter-weight approach that _specifically focused_ on fields relevant to incumbent search engines, a la Google. The legacy lives on: microdata today is used only as a form of SEO ranking boosting.
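
To make that concrete, here's a sketch (Python, purely for illustration; the product, rating, and price are invented) of the kind of schema.org structured data pages publish today, essentially a feed of the fields search engines reward with rich results:

  import json

  # Illustrative schema.org "Product" markup of the sort search engines consume.
  # Everything about this product is made up.
  product = {
      "@context": "https://schema.org",
      "@type": "Product",
      "name": "Acme Anvil",
      "aggregateRating": {
          "@type": "AggregateRating",
          "ratingValue": "4.4",
          "reviewCount": "89",
      },
      "offers": {
          "@type": "Offer",
          "priceCurrency": "USD",
          "price": "119.99",
          "availability": "https://schema.org/InStock",
      },
  }

  # In practice this JSON-LD blob is embedded in the page inside a
  # <script type="application/ld+json"> tag so crawlers can pick it up.
  print(json.dumps(product, indent=2))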

> Customer desires - in the form of what people will actually do, not what they claim they want to do - ultimately decide the structure of a market

I'm going to disagree here. Customer desires dictated nothing of the sort. (From what I remember, though I was young at the time, the general public really didn't have much desire for, or education about, the net or the web as a whole.) The big web companies started to understand the value of data. Even a decade ago, the FAANG equivalents (Google, LinkedIn, Microsoft, et al) were already very open internally about how much of their value was in their data or in their network. It wasn't customer desire that drove any of this, it was monetary desire.

> When it became clear that there would be a few dominant companies, the priority shifted to being one of them rather than investing in (and getting screwed by) an open ecosystem that was disappearing.

What a sad sentiment. I disagree. This reads like something out of Machiavelli's The Prince.


Scalable rigid ontologies are hard to the point of being impossible. Even tasks well suited to ontologies such as Question Answering see better results from non-ontological approaches.


I am not knowledgeable in the semantic web, but isn't a hashtag a form of semantic categorization of data? If so, the hashtag movement is very recent. I always wondered who hashtags serve: humans or machines? I think it was machines first, but then it got appropriated by the social dynamics of people and business.


The semantic web was much richer.

You could build an ontology in RDF [1] or OWL [2] and link the grammars together.

FOAF [3] was a means of describing your friends and social connections, and it allowed rich annotation.

No tools were ever built around this because by the time it started taking off, Facebook and MySpace were already a thing.

The great thing about these technologies is that the graph is public and you could write tools to ingest, export, and exchange the information. We never got to that, though.

The technology was centered around graphs and triple stores, allowing you to apply subject-predicate-object statements to anything. The URI became a central node, document, and ontology identification scheme.

There were languages like SPARQL for querying rich triplestore databases, but they never really took off. Again, Google and Facebook were already huge by the time these started to mature.
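
For a sense of what that looked like in practice, here's a minimal sketch (Python with rdflib, purely illustrative; the people and URIs are made up) of publishing a tiny FOAF graph and querying it with SPARQL:

  from rdflib import Graph

  # A tiny FOAF document: Alice knows Bob. The URIs are invented for the sketch.
  foaf_doc = """
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .

  <http://example.org/alice#me> a foaf:Person ;
      foaf:name "Alice" ;
      foaf:knows <http://example.org/bob#me> .

  <http://example.org/bob#me> a foaf:Person ;
      foaf:name "Bob" .
  """

  g = Graph()
  g.parse(data=foaf_doc, format="turtle")

  # SPARQL over the resulting triple store: who does Alice know?
  results = g.query("""
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      SELECT ?name WHERE {
          <http://example.org/alice#me> foaf:knows ?friend .
          ?friend foaf:name ?name .
      }
  """)
  for row in results:
      print(row.name)   # -> Bob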

If you think the promise of blockchain is exciting, you should take a look at what the Semantic Web could have been...

[1] https://www.w3.org/TR/rdf-primer/

[2] https://www.w3.org/TR/owl2-primer/

[3] http://www.foaf-project.org/


> The semantic web was much richer

Which is probably why it failed. Richer means more complex, means higher effort, means the up-front effort is harder to justify.

It also means it's more difficult to figure out which sources are trustworthy, because it all gets mangled together. The success of Google is not that they are good at indexing, but that they are good at ranking. People are constantly trying to game the search engine system. The naive model of querying a semantic web built up by everyone doesn't mesh well with a certain percentage of participants being malicious.

And that's excluding the whole problem that scaling search over large, complex graph databases is basically unsolved.


> I always wondered who hashtags serve: humans or machines? I think it was machines first, but then it got appropriated by the social dynamics of people and business.

My impression is that hashtags grew from people on Twitter. A group of people started putting # in front of keywords in their tweets, to mark them as central to the topic of the tweet, so that others could search for those keywords with the # in front and find only tweets that were specifically about that thing rather than happening to use the same word while talking about something else. Twitter staff noticed and the hashtag was turned into a feature. Other sites then adopted the hashtag feature too.


Tagging became popular at around that time. del.icio.us introduced the concept and several other social websites joined the trend: Digg, Blogger, etc.

Twitter was the first to introduce inline tags with the hashtag, and I believe you're correct that it was user-driven behavior.


WHATWG is also somewhat less subtly a vehicle for Google to turn Chrome into the only functioning web browser.


> The Semantic Web flew in the face of enterprises like Google.

This is a good analysis.

Oftentimes the answer to why some technology succeeds or fails is more than just purely technical. It can be really useful to think in these kinds of materialist terms ("Who benefits?") to get a fuller picture.


The idea behind semantic web was to embed intelligence in web content through semantic markup, so that downstream agents could make sense of the content.

However the semantic web didn't have a business model to drive adoption. It assumed web content would be authored with semantic indicators that could later be harvested usefully (including commercially), but failed to consider what incentive or economic paradigm would compel content authors to embed semantic indicators.

Meanwhile, the problem/opportunity was solved in other ways.

Consider this example:

- - -

The promise: machine intelligence

“At the doctor's office, Lucy instructed her Semantic Web agent through her handheld Web browser. The agent promptly retrieved information about Mom's prescribed treatment from the doctor's agent, looked up several lists of providers, and checked for the ones in-plan for Mom's insurance within a 20-mile radius of her home and with a rating of excellent or very good on trusted rating services. It then began trying to find a match between available appointment times (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules.”

(In the original article, the emphasized keywords indicate terms whose semantics, or meaning, were defined for the agent through the Semantic Web.)

- - -

The problem was solved by commercial services, e.g. Google Maps, in which the semantic layer is provided, retained and exploited by the service aggregator, Google, which monetises the service through advertising.

Ironically, "machine intelligence" has not come to mean semantically enriched source documents, but rather, elaborate computationally intensive processes applied to harvest commercial opportunities from unenriched web content.


I feel like the semantic web was born out of an older way of looking at the web, one that was more about content and the distribution of it rather than how it was cataloged. The old “here’s a cool way to share documents.” It didn’t need a business model. But the web became something else. Google's dominance in the search business made content mold to their preferred structure, rather than a competitive industry that took content as it was presented.


I've thought about improving [[Roam]] with semantics, but maybe this is a fool's errand.

Humans' use of language is somewhere between the bag-of-words model and part-of-speech tagging: neither a word salad nor a constructive proof based in first-order logic.

  I had a [[meeting]] with [[John Doe]] about [[Project X]].
is much more natural than:

  _:meeting1 a schema:Meeting ;
      schema:participant [ a foaf:Person ; rdfs:label "John Doe" ] ;
      schema:about [ rdfs:label "Project X" ] .
The early web is also more like Roam than the semantic web, and the reason something like PageRank works is because the links form "naturally".

The semantic web wanted to make them "understandable" to computers, but made linking too difficult for people.

Even still, the task of understanding humans has nothing to do with semantics. The academic discipline that studies this is called language pragmatics.

BTW, this problem was why Wittgenstein had a nervous breakdown, trying to prove/compute the meaning of statements with other statements.

At the end of the day words mean what they mean. I mean, I have a beetle in a box in my head and you have a beetle in a box in your head. Mine walks and flies through the air, yours drives on the road.

We expect computers to be able to do this but I suspect that this won't happen until they can take a shit.

Maybe it's not a bad goal to have, but, even the semantic web (academic) community has given up on that question and has pivoted its effort to saving the web and giving people back control of their data with the power of a digital twin, whatever that is.


Lojban is a good hybrid as a spoken conversational language that is logically precise enough to be parsable.

https://en.wikipedia.org/wiki/Lojban


Semantic web is still here, it's just not a dominant headline.

As an enterprise architect I've evaluated most of the major AWS and GCP offerings. We still decided to look into semantic web for many use cases, and it was still the best solution.

Semantic web is many things, so I'll narrow it down to one of the most impressive offerings: RDFox. https://www.youtube.com/watch?v=tpB_tl1Vc0A

You use RDFox when you want a graph database but you also want reasoning. You can do logical deduction to infer new information. Just like programming languages evolved to have modules and namespace systems, the semantic web allows you to namespace entities to more easily share data. It's based on description logic, a subset of first-order logic.

The alien features I don't see anywhere else:

* I can add logic to my graph database and have it execute as soon as I insert data into the database

* Recursive queries are way easier than in SQL

* Forget materialized views and generated columns, RDFox can automatically apply description logic to update facts as soon as you insert data into the database ("incremental reasoning")

* The magical declarative, reconciled syntax that has made tools like Kubernetes, Terraform, GraphQL, and React popular is now generalized for you to use in any app at almost any scale

* You can put business rules in your database, and even if they heavily chain off other rules, it's lightning fast. You can also just type "explain" to see how data was derived

* If you have streaming pipelines that you were sending through ETL and importing back, in many cases you can use streaming inference to do this all for you without separate apps

* You can import data from Wikidata and other massive RDF-based sources. The upcoming Abstract Wikipedia project may have broad implications for the availability of RDF, especially if it's successful and heavily copied everywhere for other domains
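
To make the reasoning point above concrete, here's a rough sketch (Python with rdflib, not RDFox itself, and the part-of data is made up) of the kind of deduction described: a transitivity rule applied until no new facts appear. In RDFox you'd state this once as a Datalog rule and the engine would maintain the derived facts incrementally as data is inserted:

  from rdflib import Graph, Namespace

  EX = Namespace("http://example.org/")   # made-up namespace for the sketch

  g = Graph()
  g.parse(data="""
  @prefix ex: <http://example.org/> .
  ex:piston ex:partOf ex:engine .
  ex:engine ex:partOf ex:car .
  """, format="turtle")

  # The "rule": partOf is transitive. Expressed here as a SPARQL CONSTRUCT.
  rule = """
  PREFIX ex: <http://example.org/>
  CONSTRUCT { ?x ex:partOf ?z }
  WHERE     { ?x ex:partOf ?y . ?y ex:partOf ?z . }
  """

  # Apply the rule to a fixpoint: stop once no new triples are derived.
  while True:
      before = len(g)
      for s, p, o in list(g.query(rule)):
          g.add((s, p, o))
      if len(g) == before:
          break

  print((EX.piston, EX.partOf, EX.car) in g)   # True: derived, not asserted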


Interesting. I worked in a semantic web context before and if I recall correctly, with a few million triples, data stores started to run into trouble, especially with reasoning.

At what scale are you using RDFox with reasoning?


What kind of tasks benefit from this tool? I don't understand what you're doing even in theory "when you want a graph database but you also want reasoning".


I think we missed the boat -- the focus on fancy rendering rather than on the actual data/content set us back years in functionality and helped create free and for-pay walled gardens where actionable data stays server-side and rendering is forced upon us, rather than just having reasonable defaults and then having rendering (incl. multi-modal interaction, etc.) further guided by user-defined settings and heuristics. Even if data happens to be rendered client-side in an SPA, it's done through application-specific means. I suppose deep-learning-based data extraction from free-form pre- and post-rendered content will eventually make up for this, but at least part of that road may have been avoided -- and of course we're probably going to have to pay someone (one way or another) for the privilege of having the content un-rendered back into actionable data at scale...


There were never any incentives aligned to make the Semantic Web work. For it to work, page authors had to go through the trouble of making their own content compatible, but it didn't really buy them anything in return, and for big content producers it removed their main revenue source -- ads. As a result, the places that had lots of information never bothered, and small users never bothered, and the SW petered out.

In some ways it's similar to what we see in certain news sites. They only link to other pages on their own site even though it would be trivial to link out to the original information pages. Sites will even host public documents in in-line PDF readers in order to not link back to completely publicly available government sites -- scientific and engineering advancements are often vaguely and imprecisely talked about with no link to the original paper or announcement from the source. By living on ad revenue, these sites want to roach motel you into their hypertext jail and the result is information gets twisted and misreported where it propagates and is repeated in other places. News sites will even reference each other without linking back to the other site's original content.

Ads killed the Semantic Web.


Simple. Nobody needs it to sell things and make money, or spread ideas and grab power. Amazon isn't going to spend a penny of its income on things that don't have a profitable ROI. Misinformation sources would prefer things not be cataloged accurately and sensibly. The actual portion of the web dedicated to information based on fact and useful content is dwarfed by everything controlled by actors who are only there to make money or control people.


This is the classic struggle between structured and unstructured data. As difficult as the task of creating AI that can make sense of unstructured data and pages is, that task is still more tractable than getting millions of people around the world to sufficiently coordinate on a common standard.


this is the crux of the problem. elsewhere here the following definition was posted[1]:

"The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation."

creating a globally agreed on well defined meaning for everything is impossible. sure it can be done and it is done for a very limited set of things, but i can't even agree on a meaning of something with my own self from 10 years ago.

i have yet to see any tagging system that worked consistently. take flickr for example. tags are all over the place.

i have been working on tagging my own photo collection, and it's the same problem, because tagging is so much work that i end up applying only the most important keywords in order to categorize an image, keywords that matter to me at the time.

and the semantic web wants to expand that to everything published? who is going to do all that work? and how are we going to make sure it's consistent and well defined?

again, as someone else here said[2], the semantic web works in restricted, well defined areas of application. but not globally.

[1] https://news.ycombinator.com/item?id=24584739

[2] https://news.ycombinator.com/item?id=24584504


Aren't the schema.org annotations the semantic web? My understanding is that adoption is small, i.e. Wikipedia says 17% (https://en.wikipedia.org/wiki/Schema.org). Forrester suggests a lack of awareness of the Semantic Web by marketers and content creators (https://advertiseonbing.blob.core.windows.net/blob/bingads/m...)


I feel like recently the semantic web is gaining a bit of a resurgence with Wikidata.

But I think the general vision of mutually untrusting individuals participating in an interconnected knowledge graph is pretty dead.

* it's hard to do quality/consistency control on that, which makes the results less useful.

* RDF is ridiculously complicated. Microformats/RDFa are a bit better, but it still essentially requires specialist knowledge to do this properly. This strongly discourages the average Joe from just adding annotations.

* unclear value proposition at the small scale.

* it is very difficult to scale complex queries on large semantic data sets. It can often be hard to predict how performance will change over time. Compare to traditional relational DBs which have very predictable scalability and lots of DBAs who know how to optimize. I think this is probably the biggest hurdle to large-scale adoption.

* * key example: the example queries at https://query.wikidata.org are pretty magical, but when you try your own, you can quickly run into timeouts, especially when nested deeply in a graph (e.g. find all species of plants matching some property)
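
For a flavor of the kind of query that hits those limits, here's a sketch using Python and the SPARQLWrapper package against the public endpoint (the Wikidata IDs are from memory and purely illustrative: wd:Q756 for "plant", wdt:P171 for "parent taxon"):

  from SPARQLWrapper import SPARQLWrapper, JSON

  endpoint = SPARQLWrapper("https://query.wikidata.org/sparql")
  # Deep property paths like wdt:P171* are exactly the sort of thing that can
  # time out on the public endpoint once the graph fans out far enough.
  endpoint.setQuery("""
      SELECT ?taxon ?taxonLabel WHERE {
          ?taxon wdt:P171* wd:Q756 .
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
      }
      LIMIT 50
  """)
  endpoint.setReturnFormat(JSON)

  for binding in endpoint.query().convert()["results"]["bindings"]:
      print(binding["taxonLabel"]["value"])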


Mathematica seems to be betting on knowledge graphs

https://www.wolfram.com/language/12/rdf-and-sparql

and ctrl+f the RDF section here:

https://writings.stephenwolfram.com/2019/04/version-12-launc...


The abstraction was at the wrong level - text/images/executables/etc is the right level, so essentially simple file formats upon the underlying wire/binary.

Ontologies and so forth are an example of classic modernism (a clockwork universe that we can understand completely). Turns out real life is much more interesting!


Google for many years did not execute JavaScript as part of its indexing. This had the very positive effect of making the contents of the web far more accessible, as otherwise you simply did not show up in search results.

They're kind of bringing this back around with AMP but in a shitty walled garden way.


I was a big believer in the semantic web for years, but there are a load of things wrong with it, from conceptual problems to practical ones.

For starters, the Semantic Web requires an enormous amount of labor to make things work at all. You need humans marking up stuff, often with no advantage other than the "greater good". In fact you do see semantic content where it makes sense today. Look at any successful website's header and you'll see a pretty large variety of semantic content, things that Google and social media platforms use to make the page more discoverable.

This problem is compounded by the fact that ML and NLP solved many of the practical problems that the semantic web was supposed to. Google basically works like a vast question answering system. If you want to find pictures of "frogs with hats on" you don't need semantic metadata.

A much larger problem is that the real vision of the semantic web reeked of the classic "solution in search of a problem". The magic of the semantic web wasn't the metadata; RDF was just the beginning.

RDF is literally a more verbose implementation of Prolog's predicates. The real goal was to build reasoning engines on top of RDF, essentially a Prolog-like reasoner that could answer queries. A big warning sign for me was that the majority of people doing "Semantic Web" work at the time didn't even know the basics of how existing knowledge representation and reasoning systems, like Prolog, worked. They were inventing a Semantic future without any sense that this problem had been worked on in another form for decades.
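
To illustrate the comparison (my own sketch, not the parent's): the same base fact as a Prolog predicate and as an RDF triple. The rules a Prolog program would state right next to its facts were supposed to live in separate semantic web layers (OWL, and rule languages like SWRL), which is where much of the complexity crept in:

  # Prolog:  parent(bob, alice).
  #          ancestor(X, Y) :- parent(X, Y).
  #          ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
  #
  # The RDF below states the same single base fact, rather more verbosely.
  from rdflib import Graph

  g = Graph()
  g.parse(data="""
  @prefix ex: <http://example.org/> .
  ex:bob ex:parent ex:alice .
  """, format="turtle")
  print(len(g))   # 1 triple, i.e. one Prolog-style fact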

OWL, which was the standard to be used for the reasoning part of the semantic web, was computationally intractable in its highest-level description of the reasoning process. If you start with a computationally intractable abstraction as your formal specification, then you are starting very far from praxis.

For this reason it was hard to really do anything with the semantic web. Virtually nobody built weekend "semantic web demos" because there wasn't really anything you could do with it that you couldn't do easier with a simple database and some basic business logic... or just write in Prolog.

A few companies did use semantic RDF databases, but you quickly realize these offered no value over just building a traditional relational database, and today we have real graph databases in abundance, so any advantage you would get from processing boatloads of XML as a graph can be replicated without the markup overhead. And that's not even considering the work in graph representation coming out of deep learning.

The semantic web didn't work because it was half pipe dream, and not even a very interesting one at that.


People can't be bothered to run their own websites, that is what happened.

Wikidata was launched, so now you can just host your data there.

Wikidata has better searching than the real semantic web could ever get, since it has a team of devs and sysadmins with a view of the whole dataset.

Also, the Semantic Web had no story for how to contact authors and suggest changes to their schema. DNS does not provide sufficient identity or messaging. But Mediawiki does.


> Also, the Semantic Web had no story for how to contact authors and suggest changes to their schema. DNS does not provide sufficient identity or messaging.

What's wrong with just emailing webmaster@domain.tld ? That usually works.


I think that it is telling that the only times I seem to hear about "The Semantic Web," either Tim Berners-Lee or the ACM are attached to it.


The semantic web would directly benefit the automatic agents/bots rather than human users.

However, lots of use cases found it useful to adopt semantic web technologies (RDF, ontologies), including web services such as search and recommendation, technologies like AI and NLP, and domain-specific requirements like drugs and lawsuits.

The semantic web may not seem useful to us, but it definitely affects us.


Semantic Web postmortem seems like one of those things where everyone is touching their own part of the elephant.

This is the best overview I've found of the basic history:

https://twobithistory.org/2018/05/27/semantic-web.html


The struggle between semantic markup and page description is old. In some early arguments about HTML ("what's this markup thing, why not PostScript with hyperlinks?") the most succinct counter-argument was "yes, but what does it _mean_?". But that was a minority view for sure.

Someone called this "the revenge of NeWS" (the Sun system) but I can't find a reference for that.


Instead of being widely distributed with agents, it is stuck in a data store, reasoned over offline, and served as a Knowledge Graph (materialized in a Solr index) for a limited domain.

in my world anyway


I took TBL's Semantic Web/Linked Data class 10 years ago (https://www.ilamont.com/2010/09/encounter-with-tim-berners-l...). Here's the description from the syllabus:

"The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation." Linked Data is the use of all that information in a manner that essentially treats the entire web as one virtual database, allowing data to be pulled and manipulated from multiple sites as a single query and, beyond that, allows the sites queried to point to other locations that might have useful information.

Linked Data Ventures is a graduate-level class that combines traditional academics with practitioner perspectives and practical hands-on experience, to empower students to launch new ventures using this technology. From the instructor team, students will learn the technical components that are needed to produce Linked Data offerings, will learn business skills fundamental to growing and sustaining a venture. Weekly guest lecturers will provide insights into how they use the technology in their business and their experience launching a venture. This is a unique course in which teaching is shared between EECS and the MIT Entrepreneurship Center and learning is a cooperative effort between EECS and Sloan students. A team-based project will culminate in sustainable prototypes that could be freeware or have the potential to be commercialized in the future, and will serve as a good launching point for entering into the MIT 100k competition. At the end of the semester, each project team will do a business presentation in front of the whole class and a panel of outside experts and judges, as well as a technical presentation as the conclusion of the lab.

The faculty were clearly interested in encouraging adoption/development of commercial applications. To that end we had several speakers from industry showing off some early-stage applications, and the project teams had to build simple apps based on the technology.

One of the teams actually did achieve success, building a prototype restaurant database using Semantic Web technologies we learned in class. It eventually grew into a commercial service and was acquired by GoDaddy about five years ago. However, to scale it beyond the prototype they had to move beyond Semantic Web concepts and use a different technology approach; IIRC the founder said it just wasn't ready for prime time. He mentioned latency was a huge issue.


SPAs, mobile apps and Zelenials killed it.


I don't see how SPAs and mobile apps necessarily conflict with the ideas behind the semantic web.

Anyway this claim seems to reverse cause and effect. SPAs started getting hyped a lot after it became apparent that big tech institutions weren't really going to pursue semantic web.

There's no real meaning in the claim that "Zelenials" did semantic web in. As far as I know, Zelenials never took a vote and decided not to pursue semantic web. The relevant agents here are huge institutions with the power and desire to shape developments in web technology. Google and whoever.


Many of the commenters here have pointed out that the semantic web did not become popular because the incentives are not supportive of the investment. I agree, and this leads me to wonder if there are domains in which the cost of investing in semantic markup is justified.

I see scientific publishing as a venue that could clearly benefit from application of a semantic web-like domain-specific approach. There are a variety of possibilities [0], [1], but again many are implemented behind proprietary paywalls. There's value in search and knowledge representation, but the prohibitive cost of implementation is challenging.

[0] https://en.wikipedia.org/wiki/Semantic_Scholar

[1] https://en.wikipedia.org/wiki/Web_of_Science


There are two big things missing in this discussion of the Semantic Web, to me:

1. Developers. Historically Semantic Web was a lot of RDF & Sparql, which are both imo fairly hostile to developers. There were some decent libraries, but often written in a very oldschool style that made it difficult to even load or use, & with frankly pitiful documentation/tests. A lot of the databases/tooling was paid/proprietary.

The development story is looking much better. Oddball RDF & SPARQL are joined by much more mainstream-dev-friendly tools: Microdata, which is pretty simple marked-up HTML, & JSON-LD, which looks & works like JSON with a little extra "context" sprinkled in at the top (a tiny sketch of this follows after point 2 below). Libraries are much improved & modernized & mainstream-dev compliant. Datastores like Apache Jena are far more used & there's a lot of ActivityPub & related JSON-LD-centric data-stores & systems being created & experimented with.

2. Users. The article talks about primary use cases for semantic web, and they are all huge massive industries, not people. We needed semantic web because it would help search. We needed semantic web because it would help social. We needed semantic web because it would help e-commerce (& look, an article from yesterday about just that![1]).

What's missing is end users. I don't mind that super-large data systems can do interesting things with semantic web. But to me, the purpose was always to enrich the information we users see online with our eyes with powerful & consistent data that our own machines can help us use. Our navigator should be helping us, showing us what digital matter we are seeing on the page, rather than letting the page exist as one enormous standalone artifact implicitly composed of arbitrary text & images. There's meaning there, there are things that we are working with, & the semantic web gives us a common operating system for talking about those things, & managing them.

Users are still somewhat missing from semantic web. Folks like ActivityPub are doing a wonderful & interesting job using Semantic Web to build common distributed platforms for social, where we can talk about digital matter like Shares and Photos and Favorites in a common way. For now, the semantic web tech remains under the hood, something abstract powering a client that abstracts over the semantic meaning to generate just another anonymous web page, filled with articles and photos and listens and viewings & other social entities, but presented through the veneer of the application, not as discrete social objects unto themselves. I think we're only just starting to explore how to open the Semantic Web up, how to represent semantic data entities & data stores, in a way that will let users interact directly with digital objects, rather than needing the artifice & instrumentation of the application. But this is pretty deep conjecture. What I think is clearer to say is that the end-user, until very recently, has not seen or understood how semantic web technology might be helping them; it's been a tool for businesses & big data. I look forward to the interesting era of Semantic Web, the era now breaking upon us, when we get to explore how having structured, meaningful data can be good for individuals, persons, for personal computing, for small & medium data, & especially, for us to begin to communicate with each other over better-structured data. And I think JSON-LD, ActivityPub, & the semantic web is, by far, the most promising & straightforward way to explore these virtues of structured communication.
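
Here's the tiny JSON-LD sketch promised in point 1 (Python, assuming rdflib 6+ where JSON-LD parsing is built in; the names and URIs are made up). The point is that the same document is plain JSON to an ordinary consumer and RDF triples to a linked-data consumer, with the "@context" doing the translation:

  import json
  from rdflib import Graph

  doc = {
      "@context": {
          "name": "http://xmlns.com/foaf/0.1/name",
          "knows": {"@id": "http://xmlns.com/foaf/0.1/knows", "@type": "@id"},
      },
      "@id": "http://example.org/alice",
      "name": "Alice",
      "knows": "http://example.org/bob",
  }

  # To a plain JSON consumer, it's just keys and values.
  print(doc["name"])                        # Alice

  # To a linked-data consumer, the @context maps those keys onto full URIs.
  g = Graph()
  g.parse(data=json.dumps(doc), format="json-ld")
  for s, p, o in g:
      print(s, p, o)                        # two FOAF triples about Alice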

By contrast, the article's talk about "what's next" is yet more academic projects, machine learning, & trying to represent more things (like actions, which is something absolutely core to what ActivityPub does: represent activities[2]!).

[1] https://news.ycombinator.com/item?id=24557027

[2] https://www.w3.org/TR/activitystreams-vocabulary/


[pdf] - oh the irony


It's not that dead; Google announced extended support for it today: https://news.ycombinator.com/item?id=24557027


Of course it's focused on retail and profit-oriented services.



