Having worked with the "original" Watson, I saw first hand how the system stumbled upon a particularly stupid but hard problem as it tried to scale.
In 2014, I saw a demo of the original Discovery Advisor, which was at the time the closest commercial equivalent to the "Jeopardy system." This demo took in Wikipedia as a corpus, and a question was asked: "what country produced the greatest amount of wheat in 2012?" The system returned a list of countries as answers, so it wasn't quite nonsensical, but it was clear the answers were incorrect. The answers were countries like "England," "Norway," or "Zimbabwe." This system also returned passages from Wikipedia as supporting evidence, but the passages weren't about wheat production. Instead, they were about quotes that contained the word wheat... such as "let's cut the wheat from the chaff."
So of course, some smart-alec in the room Googles the same question, and this was before Google had the ability to return factual answers to factual questions, so instead we got a list of web results. The top result, interestingly, was a Wikipedia article titled "Wheat Production by Country." Opening that article presented a table that clearly showed that China produced the greatest amount of wheat in 2012.
Unfortunately, that Watson system at the time didn't read information from tables. I'm not sure if it does now, but I do know that reading data from tables in a manner that can be easily integrated and scaled within a broader semantic processing system is quite difficult. I'm not as focused on the space as I once was, so I'm not sure if the problem has been well solved yet. If not, I'd say it's a worthy area to invest in a solution.
> I do know that reading data from tables in a manner that can be easily integrated and scaled within a broader semantic processing system is quite difficult. I'm not as focused on the space as I once was, so I'm not sure if the problem has been well solved yet.
I saw a presentation on this paper at SIGKDD this year. https://dl.acm.org/doi/10.1145/3394486.3406468
"Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web"
> I'm not sure if it does now, but I do know that reading data from tables in a manner that can be easily integrated and scaled within a broader semantic processing system is quite difficult. I'm not as focused on the space as I once was, so I'm not sure if the problem has been well solved yet. If not, I'd say it's a worthy area to invest in a solution.
In the problem space of “reading data from tables in a manner that can be easily integrated and scaled within a broader semantic processing system”... I would assume that “reading data from tables” isn’t the hard part.
You would assume correctly. The core issue is that one can't interpret meaning from a table and its values from semantics alone. A table's layout conveys a great deal of meaning.
I remember looking at a couple of systems that would try to do a visual-based zonal tagging of a table, but I think the challenge there was how to logically integrate the zonal tagging into the broader semantic processing of the surrounding text.
Not being able to construe information from tables is a huge stumbling block for semantic and NLP systems for a large number of use cases that incorporate technical content. Automating patent research is one I looked at 6 or 7 years ago and tables tanked the concept. Semantic search over digitized maintenance manuals is another use-case I've wrestled with that's a tough nut to crack if the underlying manuals aren't available in a structured schema.
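To make the layout point concrete, here is a minimal Python sketch. It is a toy of my own, not how Watson or any system mentioned in this thread works. The quantities are placeholders in arbitrary units, not real wheat statistics, and `flatten` and `top_by_column` are hypothetical helper names. It contrasts what a text-only pipeline sees with what you can answer once the header row and key column are known.

```python
# Toy table. Values are placeholders in arbitrary units, NOT real statistics.
HEADER = ["Country", "2011", "2012"]
ROWS = [
    ["China", 3, 4],
    ["England", 5, 3],
    ["Norway", 1, 2],
    ["Zimbabwe", 2, 1],
]


def flatten(header, rows):
    """What a text-only pipeline sees: tokens with no row/column links."""
    return " ".join(str(cell) for row in [header] + rows for cell in row)


def top_by_column(header, rows, column, key_column="Country"):
    """Answer 'which key has the largest value in `column`?' using layout.

    Assumes the first list is the header row and every data row has the
    same number of cells, which is exactly the structure that gets lost
    when a table is flattened into running text.
    """
    col = header.index(column)
    key = header.index(key_column)
    best = max(rows, key=lambda row: row[col])
    return best[key]


if __name__ == "__main__":
    print(flatten(HEADER, ROWS))
    print(top_by_column(HEADER, ROWS, "2012"))
```

The lookup itself is trivial. The hard part in practice is recovering which cells are headers and which are keys from messy real-world tables, and then tying that back to the surrounding prose.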
Pretty slick but I think calling it "The most comprehensive knowledge platform" is a slight exaggeration. I searched for a few random topics ("urban planning", "pianos" and "bitcoin") and bitcoin was the only one to have a (lengthy) article.
I also don't understand the incentive for users to contribute to a knowledge base that is then being sold: https://golden.com/pricing
Unfortunately this was created because Wikipedia refused to keep articles on obscure altcoins so it’s very crypto heavy. That’s why a16z has jumped in I’d imagine—they very much drank the crypto koolaid and like the synergy.
This was also why I could never really see this becoming a serious product. It’s an SEO trick for ICOs looking to hype themselves, a far cry from a knowledge base. Mismatched incentives will screw up the value prop.
It makes little sense for someone to contribute to a knowledge base like this one.
One of the quotes from the home page - supposedly written by a person who's tried it out - is as follows:
"18 years later, a startup to take on @Wikipedia --> @golden" [0]
This really confuses me. What exactly would it mean to take on Wikipedia in a meaningful sense? You'd have to build the community of Wikipedia with the same kind of values and ethics governing it, then you'd have to get the novel software (the "AI" thing they claim to be using) implemented in a way that neither violates said ethics nor lowers the quality of the information.
Seems like it'd be much better just to work towards putting the AI into use at Wikipedia. Whatever innovation does end up resulting from this company, it will likely get lost if/when it gets acquired, so in the end few if any people will be able to enjoy its benefits.
Really great to see some actual non incremental innovation happening in the search space.
When looking at Golden's value prop, it becomes clear that Google has actually been somewhat, ehm, lazy when it comes to making search better, relying almost 100% on UGC to provide answers instead of trying to structure them in a concise way.
I agree about Google as well. When Google first showed up, it was a breakthrough since the quality of the results was so good compared to what we were used to. Fast forward to today: more data, more websites and so on have resulted in new problems, and I think Google has not kept up. Just because a website has been around for a decade or has 50,000 backlinks doesn't mean it is still good today. Information might be obsolete. Websites like this usually rank high in the search results page while being low-value. Meanwhile, higher-value websites that are more recent get lower rankings even though the quality is superior. Google is not able to make these connections. It seems like AI might be able to improve the quality of search results if applied correctly.
To the contrary, new websites with garbage content, swamped by ads - but published recently - often rank above actual quality sites with well written content from 10 years ago.
Not saying what you described doesn’t happen, but both are problems.
Google has had various products and efforts over the years to do exactly that, so I'm not sure if "lazy" is the right word. Knol was doomed from the start because it foolishly went head-to-head with Wikipedia. Other efforts, like Freebase, were fairly successful, and knowledge graph today is pretty great for what it does, both as part of search results but even more so when powering their ML/vision/NLP APIs.
Looking through Golden's website, they seem to want to do all of the same, but using their own (also user-contributed) content, aiming to make it accessible and valuable enough that companies will pay $1000 per month per employee (!) for it. I know almost nothing about the product so will hold off on too much judgement, but that sounds like a pipe dream.
I could see this being used more for research type of work. Wikipedia is good for the high-level/popular topics but for very specific fields, there won't be anything significant. Think of areas like drug research.
I gather that the knowledge in Golden is more structured, making it more like a database than a wiki, i.e. more like Wikidata or Semantic MediaWiki than Wikipedia.
Given some comments above suggesting that crypto was the angle, I looked at a few articles, and under Ethereum it says that the Constantinople release date is still to be determined (the actual release date was 28th February 2019).
It seems there is a decent amount of company data in there (a la pitchbook, crunchbase), but in terms of practical, useful knowledge that is authoritative per the front page, what are some good examples in there now?
This particular field is difficult. Other than Google (through Search and also its acquisition of Metaweb), no other company has managed to achieve tremendous revenue with a knowledge base product alone. Cyc, Wolfram Alpha, the original IBM Watson (the expert system, not the ML APIs borrowing the brand) are all surviving but not thriving.
Don't forget the original knowledge base product - Bloomberg. They still bring in a lot of cash each year, but as a private company we can't get exact amounts.
I feel I might be too jaded to comment on this, so let me just preface this as the view from outside Silicon Valley.
> our vision to build an extensive database and graph of knowledge for humanity, including practical commercial tools and community features to aid discovery and decisions.
So you, and your, what? Half-dozen PhDs? want to produce a graph of human knowledge. Okay fine. That's a lofty goal, let's set you up like Hari Seldon and check in on you in a thousand years. Oh! You're going to do the almost impossible and have practical commercial tools in our lifetime. Ok.
Look, I'm not saying that Golden is going to be unsuccessful, it's probably going to be very successful, they've got those guys that backed that misogynistic online frat house behind them, so there's a certain level of assured success. I just question why blatantly lying about your goals is a prerequisite for funding in Silicon Valley.
Most companies have a large vision. The creation of a far-reaching vision/mission is a standard management technique that allows people to judge actions based on whether they fall in line with the (long-term) vision.
This is taught in business school. It’s not a Silicon Valley thing.
... with infinite pockets at the helm and cronies to spin product to which can support the notion of growth far enough in to the hype cycle to secure returns.
Perhaps their CEO will have the moral fiber to sign a commitment guaranteeing that a nontrivial share (e.g. a double-digit percentage) of annual expenses (not earnings, I'd wager this is a long way from generating profit) goes directly to the open source database projects they draw from. I'd wager not.
Brings back the question: What problem does it solve? Content? Because it's really lacking. It seems great for a company search, but we have crunchbase for that. I could get a far more detailed explanation on any technical topic from Wikipedia.
Tried searching for Bach, first result besides a bunch of companies (?) is a Canadian product designer. Johann Sebastian Bach isn’t even in the first page of search results somehow.
I am confused as to how it can call itself a "knowledge database" when it doesn't even have basic knowledge.
Searched for Falcon, apparently it's a company in the AI industry, has a website falcon.com.cn and is a genus of birds.
Also searched "Apple", got the company, good knowledge base. The fruit Apple is a page that says it's a fruit tree, with the CEO Tim Cook, former CEO Tim Cook, and Timothy Cook.
It seems to just be completely wrong, minus maybe 100 articles.
I feel like basic encyclopedia information should have at least been pre-populated.
Mozart is not much better, but more recent artists seem to score better, though the pages don't contain any content and the meta-information is also not great.
A search for Merkel delivers an arms company before Angela Merkel, GDP delivers some companies.
Relationships between persons don't seem to be present.
A search for Mercedes Benz doesn't deliver anything too great. Snowden requires an Edward to find him, NSA is a company, Chrome has no info.
Maybe I'm looking for the wrong terms, but it seems like they basically just imported companies from public domain and some random stuff on the side, which mostly is just the title of something.
An attempt at an AI-enhanced, souped-up Wikipedia. Definitely in the model of Freebase.
It'll end in a sell-and-bury exactly as Freebase did, for exactly the same reason: venture capital + knowledge service = only one possible eventual outcome. It's always just a matter of time before the money corrupts the service. The demand for an exit / return (outsized at that, typically) by the owners who have put up a large amount of money forces the matter. Now that big venture capital controls them, they have to pursue revenue and profit as their long-term primary goal for existing, rather than knowledge being at the center of the mission (initially they'll pretend knowledge is at the center of their mission, that will pivot as the return pressure builds on them over time).
When's the IPO? But but but we're a knowledge service, we're here to help humanity. Where's my return? When do I get a 1,000% return on my $10m? But but but we're a knowledge service, we just want to spread knowledge for the betterment of all. Breaking news, July 2024: Golden purchased by Verizon Media [insert big corporate swamp monster here] for $586 million in a fire sale. July 2026, Verizon Media quietly buries Golden.
Andreessen in particular seems bent on driving as many interesting knowledge concepts into the ground as he can. His magic knowledge service touch was all over Rap Genius as well (with dreams of annotate-everything going back to the Netscape days [1]).
There hasn't been a single prominent knowledge service in the history of the Web that has escaped destruction once they've taken big venture capital, except for Stack Exchange and they're starting to teeter on the edge where the owners start to push it in a way that begins the rolling corruption phase (with Stack that inevitable process was delayed for a long time by the influence of its founders and the decisions they made, but eventually papa VC wants his fat return).
The only for-profit knowledge services that survive with their soul intact are slim independent operations like wikiHow that are not commanded by venture capital and the never-ending need to force an exit.
"But that's just the start. It turns out that Rap Genius has a much bigger idea and a much broader mission than that. Which is: Generalize out to many other categories of text... annotate the world... be the knowledge about the knowledge... create the Internet Talmud."
"Back in 1993, when Eric Bina and I were first building Mosaic, it seemed obvious to us that users would want to annotate all text on the web"
I think this is a reasonable prediction. To paraphrase something I saw on here recently: "in the long run, business model trumps culture". I think there are hundreds of directions the business could go (Freebase being one, the next Bloomberg another, a simple shutdown being the most likely route in VC startups). But I agree with your skepticism that in the long-run the "we're here to help humanity" ethos will take a back seat to the profit motive.
But all that being said, there's always at least a chance that the organization somehow bucks the trend. Or, even if the organization eventually becomes dominated by the profit motive in the long run, that's not to say that it won't build really beneficial things before that happens. Freebase was eventually sold and its new owner stopped maintaining it, but it built a free database that anyone in the world could use (and still can). It pioneered a concept. I don't know what Rap Genius is up to these days, but I thought their annotation UX was really innovative and I'm sure it inspired a whole lot of other sites. So even if an organization's mission eventually takes a back seat to profit, it can create a ton of value along the way.
Personally I find this startup very interesting and am excited to see where they go.
> The only for-profit knowledge services that survive with their soul intact
I agree about the corrupting influence of VC. The following isn't a super popular opinion on HN lately, but this is exactly why I've been a believer in Medium since they launched their subscription service. It's the rare startup where I could see their financial incentives and also think those incentives would be good for me as a reader. They made the knowledge the product and removed the incentive to use the knowledge as a sales pitch for some other product, i.e. content marketing. And they have to constantly push for articles that qualify as subscription worthy. That means focus on quality. I don't think they've tipped over yet, but what I've seen so far is that the more subscribers Medium gets the more they spend to get better and better articles. And as the payouts to authors get better, better authors come on board.
Hadn't heard of it. Sounds like the semantic web or Wolfram Alpha. It's very ambitious and I think almost an AGI-type problem, because it is difficult to parse human queries to the point where the system can actually reason about semantics and, on its own, create useful and accurate ontologies and relationships for everything you find on the web.
I wish the pricing/business model supported niche wiki creation. I want to put together a broad public knowledge base about a niche product segment, that connects common data elements for companies and products with deep technical models of the products themselves. Golden's tooling looks super useful, but too expensive for this use case.