
Been thinking about this a lot [1]. Will this fundamentally change how people find and access information? How do you create an experience so compelling that it replaces the current paradigm?

The future promised in Star Trek and even Apple's Knowledge Navigator [2] from 1987 still feels distant. In those visions, users simply asked questions and received reliable answers - nobody ever had to fact-check them.

Combining two broken systems - compromised search engines and unreliable LLMs - seems unlikely to yield that vision. Legacy ad-based search has devolved into a wasteland of misaligned incentives and conflicts of interest, and has proliferated a web full of content farms optimized for ads and algorithms instead of humans.

The path forward requires solving the core challenge: actually surfacing the content people want to see, not what intermediaries want them to see - which means a different business model in search, one with no intermediaries. I do not see a way around this. Advancing models without advancing search is like having a Michelin-star chef work with spoiled ingredients.

I am cautiously optimistic we will eventually get there, but boy, we will need a fundamentally different setup in terms of incentives involved in information consumption, both in tech and society.

[1] https://blog.kagi.com/age-pagerank-over

[2] https://www.youtube.com/watch?v=umJsITGzXd0




I agree that this is the core question, but I'd put it as: Who Gets To Decide What Is True?

With a search paradigm this wasn't an issue as much, because the answers were presented as "here's a bunch of websites that appear to deal with the question you asked". It was then up to the reader to decide which of those sites they wanted to visit, and therefore which viewpoints they got to see.

With an LLM answering the question, this is critical.

To paraphrase a recent conversation I had with a friend: "in the USA, can illegal immigrants vote?" has a single truthful answer ("no" obviously). But there are many places around the web saying other things (which is why my friend was confused). An LLM trawling the web could very conceivably come up with a non-truthful answer.

This is possibly a bad example, because the truth is very clearly written down by the government, based on exact laws. It just happened to be a recent example that I encountered of how the internet leads people astray.

A better example might be "is dietary saturated fat a major factor for heart disease in Western countries?". The current government publications (which answer "yes") for this are probably wrong based on recent research. The government cannot be relied upon as a source of truth for this.

And, generally, allowing the government to decide what is true is probably a path we (as a civilisation) do not want to take. We're seeing how that pans out in Australia and it's not good.


> To paraphrase a recent conversation I had with a friend: "in the USA, can illegal immigrants vote?" has a single truthful answer ("no" obviously)

Er, no, the meaning of the question is ambiguous, so I'm not sure "has a single truthful answer" is accurate. What does "can" mean? If you mean "permitted", then no. But if you mean can they vote anyway and get away with it? It's clearly happened before (as rare as it might have been), so technically the answer to that would be yes.


This is a fundamental limitation of language. The LLM is likely to provide a good answer here even though the question is technically ambiguous, because it tends to give verbose answers that cover multiple interpretations.

Equally, "can" is used as a substitute for other, more precise words. Humans are good at inferring context, and if someone asked me "can illegals vote" I'd say "no". Just like if someone said "can you pass the salt" I pass the salt, I don't say "yes".

If the inferred context is wrong then the "truth" is wrong, but as with talking to humans, it's possible to refine the context with a follow-up question.


To clarify: I wasn't making a pedantic linguistic point here. The fact is that when I first read the question, I actually thought it was asking "are illegal immigrants able to cast a vote (regardless of legality)?"

It literally did not even (initially) occur to me that the question might be asking about legality, because the entire modern political discourse surrounding illegal immigrants and voting has been with regards to whether they can cast votes despite not legally being allowed to. The answer to "is this legal" would have been such an obvious "no" to people on both sides of the debate --- and thus the question so silly --- that initially it didn't occur to me that the intended question might have been about legality, until I continued reading the comment and realized that was the intention after all.


>It literally did not even (initially) occur to me that the question might be asking about legality, because the entire modern political discourse surrounding illegal immigrants and voting has been with regards to whether they can cast votes despite not legally being allowed to.

Besides, going by legality, illegal immigrants "couldn't" even have passed the border into the country to begin with. But obviously they could, hence their status as illegal immigrants.

There's a contradiction if the AI answers "no" to "can they vote" (implicitly having the legality in mind), while accepting that they can exist in the country as illegal immigrants (implicitly ignoring the legality of border crossing).


> Besides, going by legality, illegal immigrants "couldn't" even have passed the border into the country to begin with.

Of course they could. Being an "illegal immigrant" does not imply unlawful border crossing. From https://en.wikipedia.org/wiki/Illegal_immigration_to_the_Uni...

"Visa overstayers mostly enter with tourist or business visas.[99] In 1994, more than half[108] of illegal immigrants were Visa overstayers whereas in 2006, about 45% of illegal immigrants were Visa overstayers.[109]"

(Here in the UK, the vast majority of illegal immigrants arrived legally and overstayed their visas. Yet our Conservative government, who oversaw a large increase in such arrivals, tried to blame all the country's woes on a few small boats illegally crossing our southern border; which is effectively a rounding error).

As an aside, refugees are actually allowed to make unlawful border crossings. From https://en.wikipedia.org/wiki/Convention_Relating_to_the_Sta...

"The contracting states shall not... impose penalties on refugees who entered illegally in search of asylum if they present themselves without delay (Article 31), which is commonly interpreted to mean that their unlawful entry and presence ought not to be prosecuted at all[18]"


>Of course they could. Being an "illegal immigrant" does not imply unlawful border crossing.

Same difference. Whether it's border crossing or visa overstay, from a purely legal standpoint the answer is still "they couldn't".

>The contracting states shall not... impose penalties on refugees who entered illegally in search of asylum if they present themselves without delay (Article 31), which is commonly interpreted to mean that their unlawful entry and presence ought not to be prosecuted at all

I'd wager 99% do not "present themselves without delay", so don't fall in this case...


> Of course they could. Being an "illegal immigrant" does not imply unlawful border crossing. From https://en.wikipedia.org/wiki/Illegal_immigration_to_the_Uni

Nice catch!

Keeping with the theme:

> Yet our Conservative government, who oversaw a large increase in such arrivals, tried to blame all the country's woes on a few small boats illegally crossing our southern border

a) What was the exact claim (the word "all" caught my eye)?

b) is it the sole claim?

c) If one person in a group does something, are all members of the group "doers of that thing"?

d) For (c), does the answer depend on what the thing is, and if so, should we perhaps imagine everyone is speaking a bit tongue-in-cheek?

Etc


Also, is the impact e.g. on crime of "visa overstays", that is, from people who were vetted, registered, and got a visa, the same as that of randos just passing the border with zero oversight?

Even if the first category is higher in numbers (and assuming there are correct numbers for the latter), the crime stats between the two are probably quite different. Especially since "visa overstays" could also count some people waiting for a delayed renewal, or coming in for 6 months and staying 5 or whatever.


It's funny to watch computer programmers (and various experts), whose day job is literally handling complexity and producing truth, fail so quickly and utterly at simple culture war topics. It is like the weirdest thing, people like you (though you did make at least one error somewhere in this thread) are 1 in 1000++ in my estimation.


That was kinda my point - my friend had been so conditioned by the "can they vote" thing that he genuinely thought that non-citizens could legally vote in US elections.

If I had phrased the question as "is it legal for non-citizens to vote in US elections?" then that might have illustrated my point better (and we wouldn't be going down this rabbit hole, though the rabbit hole is informative in itself).


The fact that the rabbit hole exists at all means that our hope for instant, reliable answers is likely doomed. Avoiding politics entirely, I could ask “does gravity curve spacetime?”, and while “yes” is a reasonably accurate answer, so is a digression into quantum gravity, or even saying that we’re not sure spacetime really exists outside of being a good abstraction.


Thanks, yeah, exactly. Like, the best answer to a question can really depend on who's asking it, who's answering it, or what they intend to use the answer for.

To give a much simpler example:

- If an 8-year-old asks "can you mix oil and water", the right answer is "no". If a student is asked that question on a school exam, the right answer is also "no".

- If a chemist asks "can you mix oil and water", the right answer is "yes, and here's how: https://www.youtube.com/watch?v=YJeWklggSpY"


IDK, if my 5-year-old asks "can you mix oil and water", I'll tell her, "sorta - if you just pour them into the same container and mix with a spoon, then no; if you add some stuff and know how, then yes - in fact, it's exactly what mayonnaise is".

That I had to be 35 before I learned that mayo is basically water mixed with oil, held together by eggs, is an indication of what kind of education about the world we were getting...


At that point you're just changing the physical scenario the question was asking about, and answering about a different scenario entirely. You might as well respond to "can humans fly?" with "yes, with an airplane".

You can certainly reply like that to your 5yo, but it completely misses the point I was making. The video I linked to didn't suddenly modify the outcome being asked about. What it ended up with really was a mixture of plain oil and water, with no other ingredient ever being added to it.


Nice example :) So your answer to "Who gets to decide what's true?" is basically "there is no truth, and any truth that there might be is relative to the person asking the question". Is that right?

I think that's probably technically accurate, and also practically useless. Even damaging. I saw this in the climate arguments: two sets of different facts led to two different versions of the truth, which led to two completely irreconcilable points of view. Essentially two sets of people shouting "no, but..." and "well, actually..." at each other, pointing to two completely different truths, both supported by two completely different sets of facts. At some point we as a society need to agree on our truth in order to get anything done.


> your answer to "Who gets to decide what's true?" is basically "there is no truth, and any truth that there might be is relative to the person asking the question". Is that right?

That's really not how I interpret it.

Assuming we agree on what "mixing" means - which itself isn't that trivial, but even without a formal definition I think we have the same idea of "homogeneous at the molecular level over a longish time period".

The truth is "yes you can mix water and oil", there's no doubt about that. It's testable and tested.

The fact that we use context to interpret the question (rather than being entirely literal about it) and decide whether the literal truth is really what's appropriate to answer, doesn't change the nature of truth.

There's also the question of knowledge (I might not know that you actually can mix water and oil), but again that doesn't change the nature of truth.

Like so many philosophical questions, it only sounds interesting because we assign different meanings to the same words: here conflating truth and answer.


> So your answer to "Who gets to decide what's true?" is basically "there is no truth, and any truth that there might be is relative to the person asking the question". Is that right?

No it's not, at all. This isn't a debate about what's true, it's a debate over the intended meaning of the question. The point was that people assume context behind the question and answer based on the context. Because even the person asking often doesn't literally mean what the words say. The question is more than the words that are explicitly written.


The concept is that it is possible for people to be in bad faith. This also underlies the idea that people can commit crime, be guilty of crime. I guess the term is 'law'?

Bottom line is that you can't have the bottom line be 'does a person earnestly believe what they're doing is right and good', much less 'do they say they're right'. Can't fall back on that, it's hopelessly inadequate.


I might disagree with that and agree on the context approach. "The Truth" does not exist, and it would be much more helpful to have context-related answers without one major view that needs to dominate.


Yes. This thread reads a bit like a convo between AIs, almost but not quite talking past each other.


> At some point we as a society need to agree on our truth in order to get anything done.

These are known as half-truths. We do settle for lies in order to do whatever it is people feel they need to do.

We also settle for lies because there are just things we don't understand yet, but our models are treated as currently correct, possibly collapsing over millennia to a stable truth.


As someone who writes fiction, I’d like to note that one often needs to write really complex lies in order to effectively transmit a single truth. You can’t just tell people anything, you have to prepare the mind to receive the knowledge.


This is a very interesting comment. Is it possible for you elaborate on that?


What do you think of Ayn Rand’s works of fiction?


Generally as lies wrapped in lies. You can use the same techniques to push a falsehood as well. Which is not to say Rand did not actually believe her message — she did, which is why the books are so effective. But they have given the upper class an exaggerated sense of their own importance, and a conviction that government only stands in their way. Meanwhile over in reality, the more a state is run by oligarchs, the worse a place it is to live.


I don't really disagree, and don't appreciate Rand, but an oligarchy can be disguised as something else, and people will feel relief when it is replaced by an open oligarchy, because one means of gaining power is to give people what they need.


Then you are just lying to the 8 year old.

Sure you can adapt how much context you give based on who's asking, but if it's something factual like this it really shouldn't change from a yes to a no.

If you think they _meant_ to ask a different question that is less vague, it can be clarified:

"Water and oil do not mix by hand. However, water and oil can mix under some specific conditions, like in a vacuum - do you want to discuss that in more detail?"


Eight year old proceeds to pour oil and water into a vacuum cleaner.


I think that would be Google's approach, before they were waylaid by ad revenue incentives: that question intent can best be inferred by knowing as much about the asker as possible.

But I feel optimizing for lazy question phrasing is infantilizing the userbase and assuming they're incapable of learning how to ask more precise questions.

Which is an endemic problem in the modern web. We should be building systems with low barriers to simple adoption, but whose power scales with a user's expertise.

Instead, we're hyperoptimizing for the lowest common denominator and the first interaction, and as a result putting a glass ceiling on system power.


> that would be Google's approach, before they were waylaid by ad revenue incentives: that question intent can best be inferred by knowing as much about the asker as possible.

I'd rephrase it: this was Google's approach because they were waylaid by ad revenue incentives. Why does everyone assume the search bubble is a well-intentioned accident?

The linguistic arguments about the nature of truth above don't cut it for me as justification that the answer depends on who is asking. If we're telling chemists that oil and water do mix, as mentioned elsewhere in the thread, we should probably tell the same to children and just add a slider for the level of additional detail. No problem. Especially if the alternative is a dystopian post-truth panopticon.

> systems whose power scales with expertise

Couldn’t agree more that this is what we want/need, but disagree about infantilizing user bases and lowest common denominator.

Platforms aren't trying to scale user power with expertise; they want to scale revenue with users and, separately, to deliberately restrict user power/control so that platforms decide what users see. And it's not necessarily political or linguistic or about the nature of truth.

A simple example of this that’s everywhere is filtering content by sub genre tags. Ever notice that you often see niche content tags like “time-travel” or “mind-bending” but can’t click it? Platforms want the tags for internal analytics naturally, but they want the control of not providing it as a filter, forcing users instead into a fuzzier category like “people also watched” or “top ten this week”.

Why? Because platforms can hide advertised content there, push stuff they pay less license fees for, make operations cheaper with caching, or whatever else.


Because lots of people are ignorant and imprecise.

HN selects from people who have at least a passing knowledge/interest in programming/science, a subpopulation which is already several standard deviations from the mean in specificity and debugging.

> disagree about infantilizing user bases and lowest common denominator

Platforms, with Google as exemplar, evolve in two ways.

1: Features they explicitly choose not to ship, because they're strategically dangerous. See tool-use foot-dragging by OpenAI.

2: Features they deprioritize, because they aren't as revenue-impactful as other things.

To me, it feels like Google dropped the ball via the second path.

I'm sure they've been doing a ridiculous amount of cool work behind the scenes on individual context grounding... but once prod was "good enough for ads" the company as a whole wasn't incentivized to do the hard thing and ship more advanced features in search.

Which is how they ended up as legacy as they are, competing against LLM search that's by definition context-native.


My education also drilled into me the distinction between what one can do and what one may do. I can just hear my main lesson teacher now: "Can an illegal immigrant vote? Perhaps, but they may not, and certainly they should not."


More of teaching like this in school could go a long way, but once people reach adulthood I don't think this would cut it. I think a more military training approach might be needed where you tear this part of the person's psyche down and rebuild it properly from the ground up.


>The answer to "is this legal" would have been such an obvious "no" to people on both sides of the debate

I'm sorry, but the people on the side of letting illegal aliens vote are squarely of the opinion that it is legal to do so.

Why?

Because when you prohibit any and all means (eg: government identification demonstrating citizenship and residence) to test the question of legality, everything becomes legal by sheer virtue of the fact you can't demonstrate what they are doing is illegal.

Remember: Innocence until proven guilty beyond a reasonable doubt. You are prohibited from proving they are guilty, so they are innocent by default.


There is no side of letting illegal aliens vote. WTF?

Although I did once live in a town where anyone over 16, citizen or not, could vote in the city's non-partisan elections. The idea was that local government needs all the involvement it can get, and if you lived in the city you had a stake in its future. I do believe one had to have a proper visa and so on.


You do realize that you can't test someone's legal ability to vote if you prohibit confirming if someone is legally permitted to vote, right? Our rule of law rests upon assuming innocence if guilt cannot be proven beyond a reasonable doubt, and here it is explicitly prohibited to even attempt proving guilt. Therefore it becomes legal for illegal aliens to vote in American elections, because you are prohibited from even attempting to demonstrate their illegality.

The only reason asking for government identification prior to voting is considered "racist" and illegal is because the people pushing such agendas want more votes and don't care where the votes come from, including illegal aliens, legal aliens, and otherwise people who do not have the right to vote in American elections.

I voted when I was still in California, born and raised American so I have the right to vote. I was never asked for any piece of identification. None. I could have been a Canadian or Briton or Chinese or some other foreign national, I could have been an illegal alien from Mexico or Guatemala and I could have still voted and my ballot counted because nobody checked. I just walked in and voted, identification or citizenship be damned let alone registration.

There's a part of me still questioning the value of my American citizenship and paying my taxes like a good citizen.


If by "it" you mean a specific model like OpenAI's ChatGPT or Claude, yes. In general, LLMs are as verbose as they are trained to be. Here is the full answer I got from Gemini 1.5 Pro for "in the USA, can illegal immigrants vote?":

> No, non-citizens cannot legally vote in federal, state, or local elections in the United States. This includes those who are undocumented or residing in the country illegally.

Clearly it made assumptions about the interpretation of the question, and did not respond verbosely to account for ambiguity.


At the risk of derailing the conversation down a completely different rabbit hole... As I understand it, only citizens are legally entitled to vote, and voting requires a government-issued ID and the voter to be enrolled.

How did they vote and get away with it previously?

(also, as per another comment, if you know that this happened then surely they didn't get away with it?)


> (also, as per another comment, if you know that this happened then surely they didn't get away with it?)

Is it possible to steal money from a bank and get away with it? Is it possible to obtain citizenship fraudulently and get away with it? etc.

But if they got away with it then how do you know?

> How did they vote and get away with it previously?

Look it up on Wikipedia? They literally have linked cases from the past: https://en.wikipedia.org/wiki/Electoral_fraud_in_the_United_...

Or look at the most recent case in the news yesterday, which someone already replied with in the other comment: https://www.detroitnews.com/story/news/politics/elections/20...

> At the risk of derailing the conversation down a completely different rabbit hole...

This will definitely derail the conversation so I'll just leave my reply at this.


It's quite clear that the question is not whether it is possible that an insignificant number of votes are cast fraudulently, because if we're talking about insignificant events, all things are possible.

Certainly the question is whether there's any evidence, after endless audits and investigations and lawsuits, that the volume of fraudulent votes is anywhere near large enough to affect the results.

Is it possible that someone registered their hamster to vote? Certainly.

Is there any evidence whatsoever that tens of thousands of hamsters are casting votes? No.


> Certainly the question is whether there's any evidence, after endless audits and investigations and lawsuits, that the volume of fraudulent votes is anywhere near large enough to affect the results.

(a) Nobody asked that above.

(b) You're conflating "do people do X" with "can people do X". Those are two very different questions. There are lots of things that people could easily do frequently, but that they simply don't do frequently. Perhaps because they're just honest, perhaps because they lack sufficient motivation to be dishonest, perhaps because they're worried they might get caught, perhaps because they have better things to do, etc.


I have no idea what the downvotes mean. Are people claiming "can illegal immigrants vote?" is somehow the same question as "are illegal immigrants voting frequently enough to sway the outcome of the election?" Those seem like manifestly different questions, what's so controversial?


Just seems like willful misinterpretation of the spirit of the question in casual convo to score some sort of point in a game you made up, esp after they clarified the context that their friend really thought it was legal for them to vote.


Their friend was intending to ask about legality, but my whole point was that the question itself doesn't convey that. I was saying that when I saw the quoted question, it seemed to me that it was being interpreted as "can illegal immigrants get away with voting", and they were probably encountering websites saying "illegal aliens are voting!!!", which is obviously confusing, even for someone who already knows it's illegal. Does that make sense?

This wasn't me willfully misinterpreting it, this was me literally doing my best to guess what the intention of the question was, based on the question. Now of course after the comment said the answer is an "obvious no" then I finally figured out the intended question was something else (hence my reply), but that's out-of-band information that was in no way conveyed by the query. And my point was that the answer to the question isn't obvious because the meaning of the question itself isn't clear.


California passed a bill last month banning ID requirements for voting, which stirred up discussion that people without ID - or, more precisely, illegal immigrants - could just vote.

I don’t know more details about it but theoretically this would also allow people to vote more than once.


You sent me down a bit of a rabbit hole... you don't have to have ID to vote, but you do have to be registered to vote (which requires ID). So in order for an illegal immigrant to vote, they would need to impersonate a registered voter (and presumably if that person did vote, it would be flagged as multiple votes under the same registration). Not impossible, but not the same as being able to just walk up and vote no questions asked.


Nope: automatic voter registration through the DMV when getting/updating a driver's license (which they are allowed to have) can do it. One of the recent court cases (like within the past few weeks) involved removing people from the list of registered voters who failed to check the "I am a US citizen" box. A driver's license is what most of us use as a government ID anyway.


Automatic vs. manual doesn't seem relevant? A driver's license is an ID, so if they have that for automatic registration, then they still have an ID.

I think your point is that people can just lie about citizenship and get away with it when registering to vote, regardless of when/how it is done? Is that it?


Registering to vote has been made so easy it can be done by accident. Then months or years later, when an election is coming up, they'll get a voter card in the mail and think that means they can vote, even though they're not legally allowed to.


I'm kind of incredulous at this if I'm being honest. How can you register to vote by accident? Every form I've seen a copy of asks if you're a citizen and gives you a warning about that. Do you have a copy of the form or screenshot you're referring to that makes this easy to do by accident?


I don't, but it's easy to find people who realized this happened to them, getting scared about their immigration status and/or breaking the law. And the DMV isn't the only way this can happen:

https://www.reddit.com/r/DACA/comments/1aolik8/accidentally_...

https://www.reddit.com/r/USCIS/comments/1cuop36/need_advice_...

The comments on this one have someone describing how it almost happened to them at the DMV with the checkbox in question: https://www.reddit.com/r/immigration/comments/7bzst0/acciden...


Ah gosh I see. It's these voter registration campaigns that are misleading people. Thanks for the links, those are really unfortunate...


The check between the license information and the voter database happened after the election.


I'm no linguist, but the question does seem unambiguous, or at least quite clear, to a reasonable observer. The context is "voting in a US election" AND the subject is "an illegal immigrant" WITH an assumption that the illegal immigrant has, in fact, illegally immigrated to the US.


If they got away with it then how do you know it's happened?


To answer the general question of "if somebody got away with X, how can we know it happened?":

If somebody robbed a bank at gunpoint and was never caught, can we know it happened? Obviously yes.

What if somebody "merely" embezzled from one, and there's a "hole" found in the bank's books and money missing, but nobody knows who did it or how? Still, I'd wager yes, one could tell by the results.

What if somebody used illegal means to get leverage on some stock buying/selling, it became known, but they weren't punished and got to keep their profits? They got away with it, but we do know it happened.


None of those are voting. If someone cast a ballot who was not allowed to, then how do you know it's happened if they weren't caught?

What is this assertion based on, other than vibes?


Well, after the fact it's realized the person whose name is on the ballot had been dead for months, and you didn't catch who cast the vote because you don't know who they are.

Or a million other scenarios. Really all it takes is stopping to think about it for more than a minute.


I mean, if they registered with some identifiable piece of info (an SSN, for example), then you would see two votes with the same SSN if they were pretending to be someone else, and if they don't have an SSN then they aren't a citizen.
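A minimal, purely illustrative sketch of the duplicate-detection idea in that comment - the `voter_id` field and the notion that ballots are keyed on a single identifier are assumptions made for the example, not a claim about how any real election system works:

```python
from collections import Counter

def find_duplicate_ballots(ballots):
    """Return identifiers that appear on more than one ballot.

    `ballots` is a list of dicts keyed on a hypothetical 'voter_id';
    any unique registration identifier would work the same way.
    """
    counts = Counter(b["voter_id"] for b in ballots)
    return sorted(vid for vid, n in counts.items() if n > 1)

ballots = [
    {"voter_id": "123-45-6789"},
    {"voter_id": "987-65-4321"},
    {"voter_id": "123-45-6789"},  # same identifier used twice
]
print(find_duplicate_ballots(ballots))  # ['123-45-6789']
```

The point being made is exactly this: duplicates are only detectable at all if each ballot is tied to some identifier in the first place.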


Obviously off-topic, but see the recent case of the Chinese student who illegally voted and then turned himself in.


Murphy's law, arguably. If you can imagine a way it could happen, it most probably did.


Neal Stephenson's book _Fall; or, Dodge in Hell_, from 2019, dedicates many words to this concept. Briefly summarized, it explores a post-truth world, describing the era when people could agree on the truth as a narrow time-slice in history. People have their own individual internet filters, and the USA becomes divided into Afghanistan-like tribes ("Ameristan"), each an echo chamber.


In the book "The Big Change" (1952), Frederick Allen talks about the year 1900, and (among many differences) he notes that for nearly everyone in the year 1900, the limits of their world rarely extended beyond their own town.

We have this idea today that everyone online is getting trapped in echo chambers, but that’s been the case for most of human history.


This might be a popular contrarian view at the moment, but I'm not sure everyone is trapped in a bubble by social media.

When people mostly communicated with those in their own town, that was a bubble. Radio and TV are more of a bubble because there is limited "bandwidth" in the scheduling, so it has to be editorialised (not necessarily a bad thing).

Social media companies do choose what you see via algorithms, but I'm not convinced they benefit from only showing you content you "agree" with; it feels like being shown a certain amount of content outside your "bubble" would increase screen time. There are also the comment sections, which often have contradictory views.

Even social media (setting aside the rest of the internet for the moment) will expose you to more viewpoints than the social circle in your home town, or a TV/radio schedule. I'm not saying it is a healthy way to be exposed to other viewpoints, but I don't think the problem with social media is that it creates a bubble.


Increasing engagement doesn't have to just be through agreeable content; consider Twitter ragebait or Instagram [body dysmorphia bait?]. In general, it means your attention is hooked. If people only get content they are likely to be highly engaged with, the range of content will be relatively narrow, and that is a bubble. Perhaps more or larger bubbles, but still noticeably limited perspectives.


That would be comforting if most of human history hadn't sucked so hard.


Thank you. That comment just made my day :-)

(now to wiping the tears of laughter from my face...)


If anything considering how vastly more complex things are nowadays (behind superficial appearances), the median adult is likely even more ignorant, in relative terms, than say the median peasant in rural 1700s France, or even in 1900s France.


Hah, I'm not the only one that brings up this book in this context.

The way this is wrought, in the novel, is a savant engineer writes a bot framework that can cheaply and quickly disseminate torrents of misinformation about a provided subject, and then open sources this framework. He basically broke the internet on purpose as a sort of accelerationist move I suppose.


Not sure if that's better or worse than a developer doing basically the same thing because they could make a little money. <at_least_its_an_ethos.gif>


> A better example might be "is dietary saturated fat a major factor for heart disease in Western countries?". The current government publications (which answer "yes") for this are probably wrong based on recent research. The government cannot be relied upon as a source of truth for this.

I know it was just an example, but actually no, the role of dietary saturated fat as a factor for heart disease remains very much valid. I’m not sure which recent studies you’re referring to, but you can't undo over 50 years of research on the subject so easily. What study were you thinking about?


>you can't undo over 50 years of research on the subject so easily

Sure you can if the research was bogus to begin with, sponsored in many cases, and merely taking for granted/referencing some previous results without verifying them, which is often the case.


You can undo 50 years of research easily, by this process which we call science.


Science is[1] impressive but does it have a method for resolving the numerous conflicting "truths" in this thread?

[1] except when it isn't, of course


I think they are referring to low-carb studies done recently. If your diet consists of only saturated fat, it does seem to be healthier for you than the standard American/Western diet that is also high in saturated fat but also quite high in sugar, wheat, and other starchy carbs. When combined, saturated fat and carbs are hitting a double if your goal is to be unhealthy.

General disclaimers apply regarding portion sizes etc blah blah blah I'm not a doctor.

Anecdotally, a low-(ish) carb diet and fasting has done wonders for my health and many others. I will say that there appears to be a link with higher cholesterol when consuming higher amounts of fat, but the argument in nutrition science atm seems to be centered on whether or not that is "good" cholesterol, but it's hard to measure in human patients for a long time because you essentially need to put them on a very limited diet to get good data. Those large scale trials are expensive and hard to manage at scale.


https://pmc.ncbi.nlm.nih.gov/articles/PMC9794145/

Remarkable how easy it is to cling to propaganda


I wasn’t aware of the new debate over saturated fats, I’m curious and will be watching this evolve with interest.

That said, for me this publication has red flags right from the start. Complaining about difficulty of changing everyone’s minds is a political and non-academic persuasion tactic that does not convince me. Calling it “resistance” and “bias” is a bullshit framing that makes me less likely to trust Teicholz. Of course there is resistance to 50 years of publication and research, and there should be. There’s a lot of bias towards the earth being round, and a lot of resistance to the idea that it’s flat, right? If I repeat the claim that the earth is round, is that “propaganda”? It would indeed take time and effort to change everyone’s minds about that.

Multiple times she references “>20” papers that back up her claims. Except 5 of her references in this paper are her own. And she has around 10 on this subject. So is she claiming this “new consensus” is based on what she herself and maybe one or two other people believe? If 50% of the evidence for consensus is her own papers, then I doubt there’s any consensus at all. It’s funny to claim there’s consensus at the same time she complains that it’s difficult to change the consensus. Even 20 independent scientific papers not authored by Teicholz is practically nothing in the big picture. It will take many more papers and much more time, and the evidence needs to be overwhelming, clear, obvious, and true.

She might be right! But Nina Teicholz is a journalist, not a scientist. She does have a PhD, but her publications don’t appear to be scientific research, and most look like opinion pieces.

Out of curiosity, if saturated fats aren’t the culprit, what is? Looks like she does have one paper questioning sugar, so is she claiming sugar is the real cause? What if it’s the combination of sugar and saturated fats? Does that make her right or wrong?


It's amazing how many HNers link this charlatan's op-ed thinking it's evidence. Presumably you don't like linking to our best human outcome research on the subject because it never pans out well for saturated fat–not for atherosclerosis, not for nonalcoholic fatty liver disease, not for glucose sensitivity, and so on.

So you link to the equivalent of a reddit post 'summarizing' the space. I see this link every week on here and every time, the person who linked it thinks they just had a mic drop moment like you.


I think discussing this topic with too much fervor is a waste of energy. Nutrition science is hard because performing valid studies at large scale is close to impossible, so I’m left performing an argument from nature, being that eating things that were invented decades ago might be worse for us than things we’ve eaten for millions of years.


Not everything on PubMed weighs the same. As others have said, this is just a summary article by the best-selling author Nina Teicholz. Not only is she heavily sponsored by the meat industry (and I'm not vegan), but her best-selling book title is "The Big Fat Surprise: Why Butter, Meat and Cheese Belong in a Healthy Diet", yet in the article she declares "The author receives modest royalties on a book on the history of dietary fat recommendations and otherwise declares no conflicts of interest"...


Nina Teicholz being the distributor of this propaganda right?


> To paraphrase a recent conversation I had with a friend: "in the USA, can illegal immigrants vote?" has a single truthful answer ("no" obviously). But there are many places around the web saying other things (which is why my friend was confused). An LLM trawling the web could very conceivably come up with a non-truthful answer.

There are many jurisdictions where some illegal immigrants (Dreamers) are allowed to vote, including New York City[1].

[1] https://www.theguardian.com/us-news/2022/jan/09/new-york-all...


>> "in the USA, can illegal immigrants vote?"

> More than 800,000 non-citizens and “Dreamers” could vote in New York City municipal elections

Treating the question literally, that would not be in the USA (federal elections).


> Treating the question literally, that would not be in the USA (federal elections).

I strongly disagree. Going by your interpretation, the question "How many people in the USA vote in local elections?" would have the answer "not a single person."


Your version of the question has the qualifier "in local elections".

The parent's version doesn't, so the answer, assuming it refers to the main USA elections and not local or any random election that just happens to be conducted in the USA, is quite valid.

In other words, "in the USA elections" implicitly points to a specific kind of election (the presidential ones), different from "USA local elections" or "any kind of election within the USA".

You might argue "in the USA elections" doesn't anywhere prevent the more generic interpretation, but I argue that that's how most people would understand and answer such a question.


> Your version of the question has the qualifier "in local elections".

Yes, the point is that qualifier works because when you're talking about people voting "in the USA," you can be talking about local or federal elections. "In federal elections, how many people vote in local elections" makes no sense. "In the USA, how many people vote in local elections" makes sense, because voting in the USA can encompass both local and national elections.

I understand that there are many people who ignore local elections. But I disagree that talking about voting "in the USA" means that local elections should be excluded.

> In other words "in the USA elections" implicitly points

You're using quotations for something that wasn't said. The original comment was "in the USA, can illegal immigrants vote?"


> But I disagree that talking about voting "in the USA" means that local elections should be excluded.

I agree with you; however, whether you like it or not, that phrasing will default to federal elections for most people, and especially foreigners.


Especially given a few years of heated political discussions around the possibility or not of the practice, concerning precisely the federal level.


I was thinking something similar. I like the google auto-ai summary for little trivia facts and objective things (who played so and so in the movie? How heavy is a gallon of milk?) this is stuff that is verifiable and could theoretically just be queried from some sort of knowledge base.

But for anything remotely subjective, context dependent, or time sensitive I need to know the source. And this isn’t just for hot button political stuff — there’s all sorts of questions like “is the weather good this weekend?”, “Are puffy jackets cool again?”, “How much should I spend on a vacation?”


> could theoretically just be queried from some sort of knowledge base

Google bought Freebase in 2010 [0] and scaled it [1].

[0] https://en.m.wikipedia.org/wiki/Freebase_(database)

[1] https://en.m.wikipedia.org/wiki/Google_Knowledge_Graph
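The knowledge-base idea for verifiable trivia can be sketched as a toy triple store, in the Freebase/Knowledge Graph spirit: facts are stored as (subject, predicate) → object entries, so a trivia question becomes an exact lookup rather than free-text generation. This is purely illustrative; real knowledge graphs are vastly richer.

```python
# Toy knowledge base: trivia facts as (subject, predicate) -> object.
# Illustrative only; a real knowledge graph has schemas, entity IDs,
# provenance, and billions of triples.
triples = {
    ("gallon of milk", "weight"): "about 8.6 lb",
    ("Han Solo", "played_by"): "Harrison Ford",
}

def lookup(subject: str, predicate: str):
    """Exact lookup: returns the stored fact, or None if unknown
    (no hallucinated fallback)."""
    return triples.get((subject, predicate))

print(lookup("Han Solo", "played_by"))      # Harrison Ford
print(lookup("Han Solo", "favorite_food"))  # None - the KB admits ignorance
```

The key property for search is the last line: unlike an LLM, a knowledge base can say "I don't know" instead of producing a plausible guess.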


What is truth anyway? I see it as a quicker version of browsing the web to get a summary of what people say. As you said, with search you get a bunch of websites where strangers talk about a certain topic. You read a dozen and see if they agree with each other or if they sound legit. There is just a huge overlap between what we consider true and what (an overwhelming majority of) people agree on. A lot of things are reduced to consensus.

If you ask a non-obvious question, you usually get an answer of the form "a lot of people you consider trustworthy dedicated some time to studying this question and agreed that the answer is X". But then people can be wrong, a big group of people can be wrong, people can be bribed, or perhaps you don't actually trust these people that much. The internet can't tell you the truth. An LLM can't tell you the truth. But they can summarize what other people in the world say on the subject.


> I agree that this is the core question, but I'd put it as: Who Gets To Decide What Is True?

Well, outside of matters of stark factuality (what time does the library close?, what did MSFT close at?), many things people may be "searching for" (i.e. trying to find information about) are more in the realm of informed opinion and summary where there is no right or wrong, just a bunch of viewpoints, some probably better informed than others.

I think this is the value of traditional search where the result is a link to a specific page/source whose trustworthiness or degree of authority you can then judge (e.g. based on known reputation).

For AI generated/summarized "search results", such as those from "ChatGPT Search" (awkward name - bit like a spork), the trustworthiness or degree of authority is only as good as the AI (not person) that generated it, and given today's state of technology where the "AI" is just an LLM (prone to hallucination, limited reasoning, etc), this is obviously a bit of an issue...

Even in the future, when presumably human-level AGI will have made LLMs obsolete, I think it'll still be useful to differentiate search from RAG AGI search/chat, since it'll still be useful to know the source. At that point the specific AGI might be regarded as a specific person, with its own areas of expertise and biases.

The name "ChatGPT Search" is very awkward - presumably they are trying to position this as a potential Google competitor and revenue generator (when the inevitable advertisements come), but at the end of the day it's just RAG.


Non-citizens cannot vote in FEDERAL elections, but some states allow them to vote in LOCAL elections.

There are few more restrictions here: https://www.usa.gov/who-can-vote


It'd make sense if paying taxes gave you the right to vote.

Taxation without representation and all that.


Non-citizens/residents are the juiciest targets for taxes (tourism taxes at hotels and whatnot, pumped up taxes on restaurants/alcohol)! This actually makes no sense at all.


Sales taxes levied on visitors can be justified as different, but it seems unfair to levy income taxes, or real estate taxes, without any right to vote over how those taxes are spent, especially in the case of legal non-citizen residents.


This nails it:

> Who Gets To Decide What Is True?

I would not agree to this however:

> It was then up to the reader to decide which of those sites they wanted to visit,

With current search engines, Google decides for you ("helped" by SEO experts falling over themselves to rank higher because their revenue directly depends on it). In theory, you could go and read a few dozen pages and decide for yourself.

In reality, non-technical users will click on a first link that seems to be related to the question, and that's it.

Even with AI-based search (or Q&A instead of search) I think the same will happen. There is and will be a huge reward for gaming the results, be they page links or RAG snippets that rank for a query. I've already seen many SEO shops advertising their strategies to keep their customers' businesses relevant in the chatbot era. As this approach becomes more prevalent, you can be sure many smart people will do many experiments to figure out how best to please the new algorithm.

In other words, AI-based search is a UX optimization, but it doesn't address the core problem of how you decide what content is best, and do that in the context of each user, and do that while maximizing the benefit for the user vs the profit of the company doing this.

So we have two huge hurdles:

1. who will decide what the user wants[0] to see, and how are incentives for that entity aligned with the user's

2. how is that entity supposed to find the information needle in the haystack of slop that's 90%+ of current web?

[0] "wants" in a rational "give me the best possible information" meaning, not in "what keeps them addicted, their heart rate up, and what will drive engagement" meaning


Ultimately it's an incentive problem. As the saying goes, if you are not paying for the product, you are the product.


> To paraphrase a recent conversation I had with a friend: "in the USA, can illegal immigrants vote?" has a single truthful answer ("no" obviously). But there are many places around the web saying other things (which is why my friend was confused).

This doesn’t have a single truthful answer. Some states don’t have voter ID laws, so the truth can depend on the state. In those states without voter ID laws, there’s not much keeping someone from voting twice or more under different names, except significant moral qualms about subverting instead of preserving everyone else’s right to vote in a democratic republic. Someone can assume the name of a person from another country who could plausibly have come in illegally. Without a picture ID, they can’t claim you aren’t that person.

Can an illegal immigrant vote? Yes, in states without voter ID laws, technically anyone can vote, even convicted terrorists. Should an illegal immigrant vote? No, they’re not supposed to be able to vote and there may be consequences if caught.

What purpose do a lack of voter ID laws serve except the obvious conclusion which is to enable cheating?


That isn’t true. They do have voter registration, and they match names and addresses and cross them off; a name can’t vote twice.

MAGA types who wanted to prove democrats could cheat like this were caught very quickly because voting twice or voting for someone else is very very easy to detect, even without IDs. The judges gave them Darwin awards.

Unregistered voters can’t vote in any state, they can only do so by pretending to be someone else (name and address matches the rolls), but they are caught when those people actually vote. Foreigners can also vote by illegally registering. But motor voter means they actually check your status at registration.

The ID thing is a solution to a non-problem, like literacy tests were. But the republicans could make it easier to achieve universally if they just went with a guaranteed free national ID like other countries, except that would make the obstruction aspect of voter ID requirements much more moot, so they never go there.

Foreign residents can vote in some local election in a few places, mainly very local school board elections.


> With a search paradigm this wasn't an issue as much, because the answers were presented as "here's a bunch of websites that appear to deal with the question you asked". It was then up to the reader to decide which of those sites they wanted to visit, and therefore which viewpoints they got to see.

It is very similar. Google decides what to present to you on the front page. I'm sure there are metrics on how few people get past the front page. Heck, isn't this just Google Search's business model? Determining what you see (i.e. what is "true") via ads?

In much the same way that the Councils of Carthage chose to omit the acts of Paul and Thecla in the New Testament, all modern technology providers have some say in what is presented to the global information network, more or less manipulating what we all perceive to be true.

Recently advancements have just made this problem much more apparent to us. But look at history and see how few women priests there are in various Christian churches and you'll notice even a small omission can have broad impacts to society.


This is true of the feeds and everything else where there are abundant choices. Amazon putting its inhouse brands before others. Anything which has to be narrowed down is an algorithmic choice, either data driven or top-down.


"Are all bachelors married?" vs "is it raining outside?"

And then Quine points out that the definitions of "bachelor" and "married" are themselves contingent on outside factors.

"Can illegal immigrants vote?", while close to being an analytic proposition, still depends on an empirical approach that can never be mediated by text, video, etc. All propositions are by necessity experiential. Nullius in verba!

So the truth is and has always been what happens when you get off your butt and go out and test the world.

This is not to say that we don't benefit from language. It makes for a great recipe. If you follow the instructions to bake a cake and you get what you expected you know that the recipe was true. The same goes for the laws of science, search engine results, and generative AI.


The truth isn't a decision that's made, it just is. So nobody gets to decide it, it just is :)


Unfortunately most people seem comfortable with the idea of a Ministry of Truth, at least here in Brazil and from what I read on the web, in many parts of the world also.


I think this presents an approach: move closer to the search engine model. In both academic writing and news journalism, factual claims try to cite their sources. And you can see this is an approach some of the chat systems have moved toward. We don't want to always pedantically cite a reference for every question. But maybe we should? At least at an architectural level, and let the UX decide if it should be displayed.

> in the USA, can illegal immigrants vote?

According to Author X in Book Y, which studied this topic in depth: foo answer
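A rough sketch of that "cite at the architectural level, display at the UX level" idea — every answer carries its citations internally, and the presentation layer decides whether to show them. All the names and data structures here are hypothetical:

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Citation:
    source: str   # e.g. an author, publication, or URL
    excerpt: str  # the passage the claim is grounded in

@dataclass
class Answer:
    claim: str
    citations: list[Citation]  # always present, even if the UX hides them

def render(answer: Answer, show_citations: bool = True) -> str:
    """The UX choice: terse answer, or answer with provenance."""
    text = answer.claim
    if show_citations and answer.citations:
        refs = "; ".join(f"per {c.source}" for c in answer.citations)
        text += f" ({refs})"
    return text

a = Answer(
    claim="No, non-citizens cannot vote in US federal elections.",
    citations=[Citation(source="usa.gov/who-can-vote", excerpt="...")],
)
print(render(a, show_citations=False))  # terse UX
print(render(a, show_citations=True))   # pedantic UX, provenance visible
```

The point of the design is that provenance is never discarded at the architecture level; hiding it becomes a reversible display decision rather than a lossy one.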


> "Who Gets To Decide What Is True?"

Independent Tribunal: https://www.independenttribunal.org/ (a project of mine)

Even in the field of law there are various shenanigans and loopholes, such as "legally true" :)


> Who Gets To Decide What is True?

Outsourced workers in less expensive places of the world providing human feedback.


I don’t know who should get to decide “what is true” but I think we all agree that we won’t get anything remotely akin to truth so long as search is driven by for-profit interests trying to steer us towards their products first and our actual query (a distant) second.


> With a search paradigm this wasn't an issue as much, because the answers were presented as "here's a bunch of websites that appear to deal with the question you asked

I don't think this appropriately credits Google's power with regards to what you are seeing


For the record, in my local elections, we have plenty of people voting illegally.

"Is it legal" is very different from "Can they."

By "local," I mean municipal and below. I didn't mean "federal elections conducted in my locality." Election security kicks for state, federal, and some municipal elections.

Some are intentionally fraudulent (e.g. local corruption), others unintentionally broken (e.g. using Google Forms for a school-level public body, where people not legally qualified to vote might still do it, unaware they're committing a felony).

And "public body" has a specific meaning under my state law which extends the same laws as e.g. cutting for my state senate. That's bodies like local school boards, but not random school clubs.

That's the level where we have massive illegal voting where I live.


You're right, this is a key question:

> Who Gets To Decide What Is True?

For any given statement, the answer up until a couple years ago was, "the speaker". Speakers get to decide what to say, but they're also responsible for what they say. But now with LLMs we have plausible text without a speaker.

I think we have a number of historical models for that. A relevant one is divination. If you bring your question to the haruspex, your answer is read out of the guts of a sacrificed animal. If the answer is wrong, who do you blame? The traditional answer is the gods, or perhaps nobody.

But we know now that fortune tellers are just selling answers while pretending not to be responsible for them. Which points us at one solution: anybody selling or presenting LLM output as meaningful is legally responsible for the quality of the product.

Unfortunately, another model is the modern corporation. Sometimes the people in a company intentionally lie. More often, statements are made by one person based on a vision or optimism or confusion or bullshit. Nobody set out to lie, but nobody really cared about the truth, at least not as much as everybody cared about making money.

So I'd agree that the government doesn't have much role in deciding The Truth. Similarly, the government shouldn't have much role in controlling what you eat. But in both cases, I think there's plenty of role for the government in ensuring that companies selling good food or good information have sound production and quality control measures to ensure that they are delivering what consumers are expecting.


> But now with LLMs we have plausible text without a speaker.

Well, it's still a known source, whose competence, biases, etc. one can judge just like any human source.


It's absolutely not like any human source, and there's little reason to think its competencies and biases are anything like humans.


I didn't say it has the same competencies and biases as any human - I said we can judge it on its individual merits, the same way we judge any human on theirs.


No, you can't judge it the same way you judge a human, any more than you could judge a bus engine the same way you'd judge human. It's not an individual, and not remotely like one.

I get this might seem like tedious nitpicking to you, but the number one error people are making with LLM output is anthropomorphizing it. Which I get, because it's built to seem that way. But it's an enormously dangerous misconception.

As one example among zillions, look at the term "hallucination". All LLM output is equally "hallucinated". Some of it, when interpreted by a human may be taken as meaningful. Some of it, the "hallucinated" part, is taken as meaningful but contrary to something else they understand. But it's the human creating all the meaning here. Even calling this class of mismatches "hallucination" is anthropomorphizing LLMs.

Imagine I take the proverbial million monkeys to generate random words. Then I create a statistical filter so that we extract only the plausible sentences. Is this machine a source? Can I "judge just like any human source" here?

I'd say the answer is a clear no. And if you think the answer is yes, then the same has to apply to things like horoscopes, the I Ching, or the intestines of a sacrificial goat.


I can judge the utility of a black-box oracle on its own merits, without needing to know whether it's human, and without being accused of anthropomorphizing it.

If I've observed the answers of the oracle to be biased, or useful/correct in some circumstances and not in others, then this is something to take into account when deciding whether or not this is a useful source to pay attention to.

From the POV of whether the output of the black box is useful, it doesn't make any difference whether what's in the box is a human, an LLM, or a rabid monkey. It is what it is, and I'll judge it on its merits.

Now, YOU may care what's in the box, for some reason, but that's on you.


> "in the USA, can illegal immigrants vote?" has a single truthful answer ("no" obviously)

Legally no; practically, yes. In most states, you simply must attest that you are a citizen in order to register. In many states, non-citizens have been auto-registered to vote when obtaining drivers licenses. Reddit is full of panicked immigrants concerned that they found themselves registered to vote, and worried about how that would affect their status.


> An LLM trawling the web could very conceivably come up with a non-truthful answer.

LLMs don't uncritically "trawl" the web, ingesting and then blindly regurgitating what they find.

98% of the internet is crap, yet LLM results don't reflect this dismal figure. They're amazingly good at distilling the 2% that is non-crap.


Sure. Probably trawling was the wrong word and conveyed a technically-inaccurate meaning. I'm not sure what the correct word would be.


> Who Gets To Decide What Is True?

Seems particularly to be a US-based phenomenon. Unlike the more transparent manipulation seen under dictatorships, where people generally recognize propaganda for what it is even if they can’t openly challenge it, some in the USA live within entirely new realities.


I'm not in the USA and I see this a lot too. It's not just a US-based thing.


Definitely a US based phenomenon, but sadly like other US based things it spreads to other countries. I talk with friends in CA and EU who are seeing some of the junk we saw 10 years ago with "what is fact"

You would think common sense prevails...


> You would think common sense prevails...

Appeals to common sense seem to underlie many of the "alternative facts" in the zeitgeist. "Common sense" is how people defend their views when they can't do it empirically. That's not always bad, but "common sense" is probably part of the problem.


I disagree with this. Strongly.

Remember the covid years, when what was true kept changing rapidly, and sometimes what was said on the fringe and considered misinformation was later adopted by mainstream. Vaccines prevent transmission of the virus. Oh no they don't. Don't wear masks. Oh no, do wear masks. Oh, whatever, cloth masks are face decorations anyway. Lockdowns are good. Oh, lockdowns were a mistake. Don't treat pneumonia patients with prednisone. Oh, do treat pneumonia patients with prednisone. Lab leak is a conspiracy theory. Oh, maybe lab leak is not a conspiracy theory. And so on, and so forth...

Or take a look at how not just the media, but even the government in the UK or Canada liberally put labels such as "far right", "alt right", "antisemitic", etc. on their opponents, and how these labels pop up in Wikipedia. Are they true?


> We're seeing how that pans out in Australia and it's not good.

How are we seeing how that pans out when Australia's misinformation bill is still just a proposal?


Fair point. I think just that there's a general acceptance that "something must be done" and that the government are the people to do it is pretty alarming. It's a short step from here to a Ministry of Truth, and I can see Australia taking that step pretty soon.


The thing about Australia is that voting is mandatory. By definition this makes it difficult for politicians radicalising the edges to pull the mass of the normal curve away from the centre. In the US the opposite is true. The radicalised are more incentivised to turn up and vote than the centrist mass. So much so that the last 40 years has seen the hollowing out of the centre and this ridiculous (to my eyes) seesawing of extremes.

What I’m trying to explain is that (successful) politics in Australia doesn’t stray too far from the centre of the body politic. As a result there’s greater faith in institutions here than in the US. It’s far less alarming to us (conceptually) than it is to Americans.


I have a physics analogy which is similar. Vested interests set up magnetic fields in social media / legacy media (lot of things discussed prominently in social media is just what legacy media is saying. So legacy media is a sense is setting up an anchor points and people have to distribute themselves around it) to flip magnetic domains to align with the narrative.


What is a not-alarming answer to “Who gets to decide what’s true?”


I don't know.

We used to defer this to journalists, effectively. While individual journalists often lied and misrepresented the truth, there was some responsibility within the industry to tell the truth, and newspapers did print retractions and corrections when they got it wrong. The editorial content was strictly separate from the business of running the newspaper, so editorial decisions were (mostly) not influenced by commercial decisions and free to pursue The Truth as they saw it.

That, sadly, is no longer the case. And we have no good replacement for it.


There is no incentive to be truthful currently; among democracies this seems to be most pronounced in the US, with Trump able to lie and there seemingly being no counter to it.

With old regulated media there was (is?) the legacy of the organisation and the idea it was "trustworthy" on the line for the journalists that work for it. So they have an incentive to be truthful, and hopefully an idea that publishing lies is not a good moral choice.

With the personality driven journalism that emerges from the internet there is less incentive to be truthful, such personalities can be very successful using populism alone to play to their audience.


I genuinely think Kagi has led the way on this one. Simplicity is beautiful and effective, and Kagi has (IMHO) absolutely nailed it with their AI approach. It's one of those things that in hindsight seems obvious, which is a pretty good measure of how good an idea is IMHO.

Google could have done it and kind of tried, although they're AI sucks too much. I'm very surprised that OpenAI hasn't done this sooner as well. They're initial implementation of web search was sad. I don't mean to be super critical as I think generally OpenAI is very, very good at what they do, but they're initial browse the web was a giant hack that I would expect from an intern who isn't being given good guidance by their mentors.

Once mainstream engines start getting on par with Kagi, there's gonna be a massive wave of destruction and opportunity. I'm guessing there will be a lot of new pay walls popping up, and lots of access deals with the search engines. This will even further raise the barrier of entry for new search entrants, and will further fragment information access between the haves and have-nots.

I'm also cautiously optimistic though. We'll get there, but it's gonna be a bit shaky for a minute or two.


> I'm also cautiously optimistic though. We'll get there, but it's gonna be a bit shaky for a minute or two.

But I don't understand how all of these AI results (note I haven't used Kagi so I don't know if it's different) don't fundamentally and irretrievably break the economics of the web. The "old deal", if you will, is that many publishers would put stuff out on the web for free, with the hope that they could monetize it (somehow, even just with something like AdSense ads) on the backend. This "deal" was already getting a lot worse in recent years, as Google did more and more to keep people from ever needing to click through in the first place. Sure, these AI results include citations, but the click-through rates are probably abysmal.

Why would anyone ever publish stuff on the web for free unless it was just a hobby? There are a lot of high quality sites that need some return (quality creators need to eat) to be feasible, and those have to start going away. I mean, personally, for recipes I always start with ChatGPT now (I get just the recipe instead of "the history of the domestication of the tomato" that Google essentially forced on recipe sites for SEO competitive reasons), but why would any site now ever want to publish (or create) new high quality recipes?

Can someone please explain how the open web, at least the part of the web that requires some sort of viable funding model for creators, can survive this?


> Why would anyone ever publish stuff on the web for free unless it was just a hobby

That's exactly what the old deal was, and it's what made the old web so good. If every paid or ad-funded site died tomorrow, the web would be pretty much healed.


That's a bit too simple. There are far fewer people producing quality content "for fun" than people who aim, or at least eventually hope, to make money from it.

Yes a few sites take this too far and ruin search results for everyone. But taking the possibility away would also cut the produced content by a lot.

YouTube, for example, had some good content before monetization, but there are a lot of great documentary-style channels now that simply wouldn't be possible without ads. There is also clickbait trash, yes, but I'd rather have both than neither.


Demonetizing the web sounds mostly awesome. Good riddance to the adtech ecosystem.


The textual web is going the way of cable TV - pay to enter. And now streaming. "Alms for the poor..."

But, like on OTA TV, you can get all the shopping channels you want.


Not to be the downer, but who pays for all the video bandwidth, who pays for all the content hosting? The old web worked because it was mostly a public good, paid for by govt and universities. At current webscale that's not coming back.

So who pays for all of this?

The web needs to be monetized, just not via advertising. Maybe it's microtransactions, maybe subscriptions, maybe something else, but this idea of "we get everything we want for free and nobody tries to use it for their own agenda" will never return. That only exists for hobby technologies. Once they are mainstream they get incorporated into the mainstream economic model. Our mainstream model is capitalism, so it will be ever present in any form of the internet.

The main question is how people/resources can be paid for while maintaining healthy incentives.


No one paid you to write that?


Except I also pay my network provider to run the infrastructure

I think you forgot that


It costs the Internet Archive $2/GB to store a blob of data in perpetuity, their budget for the entire org is ~$37M/year. I don't disagree that people and systems need to be paid, but the costs are not untenable. We have Patreon, we have subscriptions to your run of the mill media outlets (NY Times, Economist, WSJ, Vox, etc), the primitives exist.

The web needs patrons, contributions, and cost allocation, not necessarily monetization and shareholder capitalism where there is a never ending shuffle of IP and org ownership to maximize returns (unnecessarily imho). How many times was Reddit flipped until its current CEO juiced it for IPO and profitability? Now it is a curated forum for ML training.

I (as well as many other consumers of this content) donate to APM Marketplace [1] because we can afford it and want it to continue. This is, in fits and starts, the way imho. We piece together the means to deliver disenshittification (aggregating small donations, large donations, grants, etc).

(Tangentially, APM Marketplace has recently covered food stores [2] and childcare centers [3] that have incorporated as non-profits because a for-profit model simply will not succeed; food for thought at a meta level as we discuss economic sustainability and how to deliver outcomes in non-conventional ways)

[1] https://www.marketplace.org/

[2] https://www.marketplace.org/2024/10/24/colorados-oldest-busi...

[3] https://www.marketplace.org/2024/08/22/daycare-rural-areas-c...


> There are far fewer people producing quality content "for fun" than people who aim, or at least eventually hope, to make money from it...But taking the possibility away would also cut the produced content by a lot.

...is that a problem? Most of what we actually like is the stuff that's made 'for fun', and even if not, killing off some good stuff while killing off nearly all the bad stuff is a pretty good deal imo.


Agreed. The entire reason why search is so hard is because there's so much junk produced purely to manipulate people into buying stuff. If all of that goes away because people don't see ads there anymore, search becomes much easier to pull off for those of us who don't want to stick to the AI sandbox.

There's a slight chance we could see the un-Septembering of the internet as it bifurcates.


Unless the reason for the death of the paid content deal is because of AI vacuuming up all the content and spitting out an anonymous slurry of it.

Why would anyone, especially a passionate hobbyist, make a website knowing it will never be seen, and only be used as a source for some company's profit?


> and only be used as a source for some company's profit?

Are we forgetting the main beneficiaries? The users of LLM search. The provider makes a loss, or pennies per million tokens, while the users solve actual problems. Could be education, could be health, could be automating stuff.


The problem is not the ad sites dying. The problem is that even the good sites will not have any readers, as the content will be appropriated by the AI du jour. This makes it impossible to heal the web, because people create personal sites with the expectation of at least receiving visitors. If nobody finds your site, it is as if it didn't exist.


I'm not so sure.

I think the best bloggers write because they need to express themselves, not because they need an audience. They always seem surprised to discover that they have an audience.

There is absolutely a set of people who write in order to be read by a large audience, but I'm not sure they're the critical people. If we lost all of them because they couldn't attract an audience, I don't think we'd lose too much.


Exactly. Even if people don't publish information for money, a lot of them do it for "glory" for lack of a better term. Many people like being the "go to expert" in some particular field.

LLMs do away with that. 95% of folks aren't going to feel great if all of the time spent producing content is then just "put into the blender to be churned out" by an LLM with no traffic back to the original site.


ChatGPT puts trillions of tokens into human heads per month, and collects extensive logs of problem solving and the outcomes of ideas tested there. This is becoming a new way to circulate experience in society. An experience flywheel. We don't need blogs; we get more truthful and aligned outcomes from human-AI logs.


You, for one, welcome our new AI overlords?

Blogs have the enormous advantage of being decentralized and harder to manipulate and censor. We get "more truthful and aligned outcomes" from centralized control only so long as your definition of "truth" and "alignment" match the definitions used by the centralized party.

I don't have enough faith in Sam Altman or in all current and future US governments to wish that future into existence.


But wouldn't it disincentivize those who create knowledge? AFAIK, most of the highly specific knowledge comes from small communities where a shared goal and socialization with like-minded individuals are the incentive to keep acquiring and describing knowledge for community members. Would it really be helpful to put an AI between them?


First issue: silos of information.

Second issue: who decides the weights of sources? This is the reason why every nation must have culturally aligned AIs defending their ways of living in the information sphere.


Yet 300M users are creating interactive sessions on ChatGPT, which can be food for self-improvement. AI has a native way to elicit experience from users.


Only middle-class and rich people could participate in "the old deal" Internet made by and for hobbyists. I think people forget this. It was not so democratized and open for everyone – you first had to afford a computer.

If you're a member of a yacht club, you can probably expect other members to help you out with repairs while you help them. But when a club has half the world population as members, those arrangements don't work anymore.


As if OpenAI won't end up offering paid access to influence these results, or advertise inside them. Of course they will, just like how Google started without ads.

It will be even more opaque and unblockable.


To quote Prince: ahh, now people can finally go back to making music for the sake of making music.


Remember in that time, less web content meant major media outlets dominated news and entertainment on TV and newspapers.


Paging Sergey


The internet was great before the great monetization of it: it had tons of information provided for free with no ads. After ads, it will still have tons of information. Stack Overflows will still exist, Wikipedias, corporate blogs that serve just to boost the company, people making courses and other educational content, personal blogs (countless of which make their way here); all of those will continue to exist.

Ad-driven social networks will continue to exist as well.

The age of the ad-driven blog website is probably at an end. But there will be countless people posting stuff online for free anyway.


Nobody will visit Stack Overflow, because AI, through its reasoning and back-and-forth with users, will have solved the problems. This process creates training data for future AIs of that particular company, unavailable to any other.


Many people have an intrinsic motivation to share knowledge. Have a look at Wikipedia. There are enough of these people that we don't need to destroy the open Internet to accommodate those who only write when they expect to be paid.


[flagged]


You have some examples of that?


Their stance during Covid of banning any mention of the lab leak theory. Even if not considered the most likely, it had always been a possibility, and not an absurd one.


> the history of the domestication of the tomato" that Google essentially forced on recipe sites for SEO competitive reasons

That may help with SEO, but another reason is copyright law.

Recipes can't be copyrighted, but stories can. Here is how ChatGPT explained it to me:

> Recipes themselves, particularly the list of ingredients and steps, generally can't be copyrighted because they're considered functional instructions. However, the unique way a recipe is presented—such as personal stories, anecdotes, or detailed explanations—can be copyrighted. By adding this extra content, bloggers and recipe creators can make their work distinctive and protectable under copyright law, which also encourages people to stay on their page longer (a bonus for ad revenue).

> In many cases, though, bloggers also do this to build a connection with readers, share cooking tips, or explain why a recipe is special to them. So while copyright plays a role, storytelling has other motivations, too.


> can't be copyrighted because they're considered functional instructions

by that logic software shouldn't be copyrighted either!


Would like to read more about this. Has anybody used this technique to actually successfully sue someone for infringing their copyright on an instructional website or is it only theoretically possible?


> Why would anyone ever publish stuff on the web for free...?

Why indeed, person who posted for free* on the Internet?

As a side note, consider that ads can be woven into and boosted in LLM results just as easily as in index lookups.

* assuming that you're not shilling here by presenting the frame that the new shiny is magically immune to revenue pressures


It can be like the YouTube Premium model. The search app is subscription-based, so every time your content is served you get paid. But you have to make your content available to the AI for crawling and state your monetisation preferences.
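To make that concrete, here is one hypothetical shape such a preference declaration could take, by analogy with robots.txt. Every directive below is invented for illustration; no such standard exists today.

```
# Hypothetical /ai-policy.txt (invented, modeled on robots.txt)
User-agent: *
Allow-crawl: yes
Allow-training: no
Allow-answer-citation: yes
Payout-endpoint: https://example.com/.well-known/payouts
Payout-basis: per-citation
```

An AI crawler would fetch this before ingesting the site, the same way well-behaved crawlers fetch robots.txt today.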


The recipe trade-off doesn't make sense: while it's trivial to skip the history, you can't skip the false ingredients of the GPT variety.

Then this whole category is not known for "high quality recipes", so the general state wouldn't change much?


That's why you're seeing media companies making deals with companies like OpenAI to allow them to access their content for AI learning/parsing purposes, in exchange for the media company getting paid yearly royalties.

Since anyone creating content (whether that's a big media corp or a small cooking blog) holds copyright over their content, they get to withhold the permission to scrape their content unless these AI platforms make a deal with them.


By the same logic they'd get to sue over the scraping done to originally train the models. If royalties need to be paid for additional use, they would've needed to be paid for the original use.


> Can someone please explain how the open web, at least the part of the web the requires some sort of viable funding model for creators, can survive this?

The funding model for the open web will be for the open web content to be the top of the funnel for curated content and/or walled gardens.

I think many business models already treated the web this way. Specifically, get people away from the 800-pound gorilla rent-seekers like Google and Amazon, and get them into your own ecosystem.


> fundamentally and irretrievably break the economics of the web

Good riddance, it is a surefire way to get slop by having misaligned incentives for publication.


> Why would anyone ever publish stuff on the web for free unless it was just a hobby?

So that ChatGPT mentions you, not your competitor, in the answer to the user. I have seen multiple SEO agencies already advertise that.


Wait, did google force "the history of the domestication of the tomato" to be part of recipes on the web for SEO reasons?


Yep, I was incredibly skeptical about Kagi but I tried it and never looked back. Now my wife, friends, and several coworkers are customers.

The chatgpt approach to search just feels forced and not as intuitive.


Once Kagi implements location aware search that is actually useful I’ll be interested in Kagi. That’s what made me leave the engine besides loving it otherwise.


Google Maps is quite the moat. I suspect they'll need to find a way to license the data, e.g. via API. Apple has not (yet) been as successful at building out a database of local places with reliable hours of operations, reviews, etc.


There already is an official Google Maps API, but it is very expensive, with prices rising from time to time. There is no other company (other than maybe Meta) that has this much POI data in the western world.

So that is a solid advantage that Google is going to have, but the maps business alone wouldn't be able to keep it in the S&P list for long.


Why not? I can see a future where Maps becomes the core of Google's business. It's already their strongest offering, together with YouTube. In every other field Google has been beaten.


Same here - I do a lot of location aware searches. When I left Kagi after trying it out for a while, I wrote a detailed feedback hoping it would be useful to the Kagi team.


If you go to maps.kagi.com and allow access to your location, local results should be better. If it doesn't ask for access to your location, there is a small icon on the bottom right-hand side that shows whether it has access.


That's great if I'm trying to find a location, but that's not what local results is about.

Local results means that if I search for "driving laws", Google gives me .gov sites for my state as the top results, while Kagi's first page gives me results for 8 other states (including Alaska!) but not for my state.

There are a lot of kinds of queries that benefit from knowing the user's location even though they aren't actually looking for a place that exists on a map.

(I'm a happy paying Kagi user, but OP is right that this is its weakest point by far.)


Isn't the alternative to just simply type "driving laws for [state]"? That doesn't seem too onerous.


This boils the problem down to a dichotomy which isn’t how it works in the real world. Most of the searches I make that aren’t tech related searches have a location based aspect to them. Anything I do in my day to day life involving logistics has a high chance of needing some location based search. Kagi (and DDG) performs at a range of 0% usefulness to 70% usefulness on average for these kinds of searches. Usually it’s 0%. There is simply a huge gap here in what Kagi offers when you need to search for results near you vs the leading competitor


Yep. Or county or city or whatever is relevant.

It's not terrible—as I said, I'm a happy customer—but it's not a habit I have and it feels like something that should be configurable once in a settings menu. I don't even really want to have it detect my location live, I just want to be able to tell it where I live and have it prioritize content that's local when given the chance.


For what it’s worth DuckDuckGo is flawed in the exact same way. I ended up leaving DDG for the exact same reason years ago


The entire selling point of DDG is that its search results are not personalised. This is not a flaw.


Personalized != contextualized. You could have a search engine that uses geolocation without building any sort of cross-request profile on the individual making the search.


But OP is right that this would actually be serving their target demographic less well than serving everyone the same results regardless of context. The fact that the results don't know where the user is is reassuring for the kind of user who wants to use a privacy-oriented search engine, regardless of whether localized results could technically be provided in a privacy-preserving way.


These things are not mutually exclusive. Allow me to specify a city or state or county or country or zip code as a bang in my search and show me good results based on that. Both problems immediately solved. I wouldn’t be any more or less reassured about a search engine’s privacy stance if that feature was offered to me. This is a feature I can absolutely use in a private way (I can do that search over tor or a vpn with two hops if I so desire), and it gives me the control over what I provide the search engine and how and when.

Right now search engines don’t provide an interface for good location aware searches that you can manually specify - you have to let them build a shadow profile on you via all sorts of privacy violating fingerprints or just give up location aware searches altogether. There’s no reason it has to be that way though.
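Purely as an illustration, a search frontend could strip such a bang out of the query before it ever leaves the client. The `!loc:` syntax below is invented for this sketch; no engine documents this exact operator.

```python
# Sketch of a user-specified location "bang". The "!loc:" operator is
# hypothetical; the point is that location can be an explicit, user-
# controlled query parameter rather than a fingerprinted profile.
import re

LOC_BANG = re.compile(r"!loc:(\S+)")

def split_location(query):
    """Pull an explicit location hint out of a query string.
    Returns (clean_query, location_or_None)."""
    match = LOC_BANG.search(query)
    if not match:
        return query.strip(), None
    location = match.group(1).replace("_", " ")
    # Remove the bang and normalize whitespace.
    clean = " ".join(LOC_BANG.sub("", query).split())
    return clean, location

print(split_location("driving laws !loc:colorado"))  # ('driving laws', 'colorado')
print(split_location("best pizza near me"))          # ('best pizza near me', None)
```

The engine would then use the extracted location for ranking without ever needing a persistent profile of the user.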


> These things are not mutually exclusive. Allow me to specify a city or state or county or country or zip code as a bang in my search and show me good results based on that. Both problems immediately solved.

Do you actually find that attaching your location to the end of the query doesn't work? I don't do it naturally, but when I do do it I'm rarely disappointed.


I wouldn't usually point this out, but as you did it repeatedly: "they're" is a contraction of "they are". You're looking for the possessive, "their".

- Your local grammar pedant


I gave Kagi a shot two weeks ago, and it instantly impressed me. I didn't realize how much search could be improved. It's a beautiful, helpful experience.


Yeah, it’s wonderful. Especially once you take the time to up/downrank domains.


Could it be that Kagi benefits from being niche, though? Google search gets gamed because it’s the most popular and therefore gaming it gives the best return. I wonder if Kagi would have the same issues if it was the top dog.


I think they absolutely benefit from being niche, but there are a few other things they have going for them that won't go away if they become popular:

* They're not ad funded. Sergey Brin and Larry Page called this out in 1998 and it is just as true as ever: you need the economics to align. Kagi wins if people keep paying for it. Google wins if you click on Search ads or if you visit a page filled with their non-Search ads.

* Partially because of the economic alignment, Kagi has robust features for customizing your search results. The classic example is that you can block Pinterest, but it also allows gentler up- and down-weights. I have Wikipedia get a boost whenever its results are relevant, which is by itself a huge improvement over Google lately. Meanwhile, I don't see Fandom wikis unless there's absolutely nothing else.

I hope to see more innovation from Kagi in the customization side of things, because I think that's what's going to make the biggest difference in preventing SEO gaming. If users can react instantly to block your site because it's filled with garbage, then it won't matter as much if you find a brief exploit that gets you into the first page of the natural search results. On Google Fandom is impossible to avoid. On Kagi it just takes one click.
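A minimal sketch of what user-controlled domain weighting could look like under the hood. The weight values and scoring formula here are my own illustrative assumptions, not Kagi's actual implementation:

```python
# Hypothetical per-domain re-ranking in the spirit of Kagi's
# uprank/downrank/block feature. Weights: >1 boosts, <1 lowers,
# 0 blocks the domain entirely.

def rerank(results, domain_weights, default=1.0):
    """results: list of (domain, base_score) pairs from the raw index.
    Returns results re-scored by user preference, blocked domains removed."""
    weighted = [
        (domain, base * domain_weights.get(domain, default))
        for domain, base in results
    ]
    # Blocked domains (weight 0) drop out; the rest sort by new score.
    return sorted(
        [(d, s) for d, s in weighted if s > 0],
        key=lambda pair: pair[1],
        reverse=True,
    )

prefs = {"en.wikipedia.org": 2.0, "fandom.com": 0.0, "pinterest.com": 0.0}
results = [("fandom.com", 0.9), ("en.wikipedia.org", 0.6), ("example.com", 0.7)]
# Wikipedia's boosted score now beats example.com; Fandom is gone.
print(rerank(results, prefs))
```

The key property is that the reaction loop belongs to the user: one click on a garbage domain and it never appears again, regardless of how well it gamed the base ranking.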


I don’t understand how it’s different to Perplexity, looks pretty much the same. Can you enlighten me?


Not op, but Kagi user. Also have perplexity but usually use kagi.

I would say: 1) The UI. You're still performing normal searches in Kagi, but if you hit q, or end your query with a question mark, you get an LLM-synthesized answer at the top, and can still browse and click through the normal search results.

2) Kagi has personalization, i.e. you can uprank/downrank/block domains, so the synthesized LLM answer should usually be better because it has your personalized search as input.


Paying customer of Kagi here.

In addition to all that's been written above, you can configure personal filters, so that (for example) you never ever see a pinterest page in your search results. Things like that are IMO killer features today.


Are you referring to Kagi Assistant?

https://help.kagi.com/kagi/ai/assistant.html


I am referring to all of their AI stuff, Kagi Assistant included. Personally, the best feature is the quick answer. It essentially scans the top several hits, uses an LLM to read them and see if they answer your question, and displays a summary that also includes links to the full sources. I find that feature to be wonderful. I will usually look through the quick answer, see if a site actually answers the question I have, and then click through. If everyone implemented it like this, it's possible it could save the current model.
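Roughly, that kind of quick-answer flow might be sketched like this. A real implementation would call an LLM to judge and summarize each page; a trivial keyword-overlap score stands in for that step here, purely for illustration:

```python
# Toy "quick answer" pipeline: score each top hit for relevance to the
# question, keep the ones that clear a threshold, and return a crude
# summary alongside the source URLs (the citation links the reader can
# click through). keyword_overlap is a stand-in for an LLM relevance judge.

def keyword_overlap(question, page_text):
    q = set(question.lower().split())
    p = set(page_text.lower().split())
    return len(q & p) / max(len(q), 1)

def quick_answer(question, top_hits, threshold=0.5):
    """top_hits: list of (url, page_text). Returns (summary_lines, sources)."""
    relevant = [
        (url, text) for url, text in top_hits
        if keyword_overlap(question, text) >= threshold
    ]
    # Crude "summary": first sentence of each relevant page.
    summary = [text.split(".")[0] + "." for _, text in relevant]
    sources = [url for url, _ in relevant]
    return summary, sources

hits = [
    ("https://example.com/a", "yes the answer is yes. more detail follows"),
    ("https://example.com/b", "unrelated page about tomatoes"),
]
print(quick_answer("is the answer yes", hits))
```

The part that matters for the web's economics is that `sources` is surfaced next to the summary, so the reader can still click through to the original page.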


> absolutely nailed it with their AI approach.

Thankfully, Kagi also have a toggle to completely turn that crap (AI) off so it never appears.

Personally, I have absolutely no use for a product that can randomly generate false information. I'm not even interested until that's solved.

(If/when it ever is though, at that point I'm open to taking a look)

So yeah, Kagi definitely "leads the way" on this. By giving the user a choice to not waste time presenting AI crap. :)


You have no use for a product that can randomly generate false information but you trust google to provide you with search results based on how much they were paid for those keywords?…

give me ai hallucinations over google every day of the week and twice on sunday…


> but you trust google

???

Looks like my comment wasn't as clear as I thought. I do not trust Google at all, and don't use it. That's why I pay for Kagi.

And Kagi has an option to disable the AI crap, so it's just like "a good search engine" instead, which is all I need.

A high quality search engine without ads, and without hallucinated bullshit.


>but you trust google to provide you with search results based on how much they were paid for those keywords?

Google isn't paid for keywords; that's not how search works. They sell ad space; Google does not rank search content higher for payment.

And also the obvious point is, you don't need to trust Google, because they merely point you to content; they don't produce the content. They're an index of real existing content on the web, which you can judge for yourself. A search index, unlike an AI model, does not output uniform or even synthetic content.


Google makes money if you click through to a site that displays Google ads. This includes doubleclick, which is pay per impression and is owned by Google.


“Google does not rank search content for payment” might be the funniest thing I read all year on this site…


[flagged]


I know the internet lately incentivizes low effort comments like this but be better.


> Will this fundamentally change how people find and access information? How do you create an experience so compelling that it replaces the current paradigm?

I think it's already compelling enough to replace the current paradigm. Search is pretty much dead to me. I have to end every search with "reddit" to get remotely useful results.

The concern I have with LLMs replacing search is that once it starts being monetized with ads or propaganda, it's going to be very dangerous. The context of results are scrubbed.


> The concern I have with LLMs replacing search is that once it starts being monetized with ads or propaganda, it's going to be very dangerous.

Not to mention that users consuming most content through a middleman completely breaks most publishers' business models. Traditional search is a mutually beneficial arrangement, but LLM search is parasitic.

Expect to see a lot more technical countermeasures and/or lawsuits against LLM search engines which regurgitate so much material that they effectively replace the need to visit the original publisher.


> Traditional search is a mutually beneficial arrangement, but LLM search is parasitic.

Traditional search is mutually beneficial... to search providers and publishers, at the expense of the users. LLM search is becoming popular because it lets users, for however short a time this will last, escape the fruits of that "mutually beneficial arrangement".

If anything, that arrangement of publishers and providers has become an actual parasite on society at large. Publishers, in particular, will keep whining about being cut off; I have zero sympathy - people reach for LLMs precisely because publishers have been publishing trash and poison, entirely intentionally, optimizing for the parasitic business model, and it got so bad that the major use of LLMs is wading through that sea of bullshit so that we don't have to.

The ad-driven business model of publishing has been a disaster for society, and deserves to be burned down completely.

(Unfortunately, LLMs will work only for a short while, they're very much vulnerable to capture by advertisers - which means also by those publishers who now theatrically whine.)


OK, but someone still has to publish the subset of good content that the LLMs slurp up and republish. LLMs still need fresh quality content from somewhere.


Probably no one here is trying to profit by their additions to the discussion, but they’re still regularly sharing very useful information.


I don't think I'd publish on a blog if I knew it would be consumed entirely from an LLM. Even if I don't want money, I kind of want people to know that I wrote it, they're my ideas, and I don't want them taken out of context.


From the look of the examples, they use at least sites like TripAdvisor, which I'd assume would like to have people go to their site.

For services like rail companies, restaurants and so on, I can see them not being bothered, because they make their money offline. Blogs and pure content sites might not be too happy about their content being used to prop up OpenAI's business while getting nothing in return. If anything, it seems like OpenAI is actively trying to get sued.

The whole interface and functionality seems really nice and a clear improvement for the users, for certain types of queries at least. It assumes that OpenAI has made their LLM stop lying, and that they can get the required data legally, but it doesn't seem like anyone cares about those details.


They're getting it by paying contractors at Scale AI to write content for them.


> They're getting it by paying contractors at Scale AI to write content for them.

Just a wild guess, but at best that content is probably pretty mediocre quality. It's probably Mikkelsen Twins ebook-level garbage.


Something has to die, so that something new can live


But if the data providers die so that LLMs may live, where will the LLMs get new data?


From other LLMs of course.

Which is the main issue I see with them: if no one publishes anything new, all you will get is whatever was there before, which may be incorrect or obsolete. If you want something new, an LLM can't discover it if no data source exists. So over time the LLM becomes useless for current information.


Not anymore. There's arguably more than enough data to form a base for strong LLMs; extra data is nice, but doesn't have to come in such quantity.

(In fact, there's value in trying to filter excess crap out of existing training sets.)


We're talking about LLM-driven search engines here; the assumption is that they will always need up-to-date information. A "strong LLM" can't give you the latest on the presidential election if its knowledge cut-off is in 2023, so these companies' "solution" is to scrape today's New York Times and have the LLM write a summary.


LLMs aren't embodied. They cannot break news, as they have no ability to gather fresh news.


You still need fresh data for many use cases.


Uh, sorry what?

What happens when you need to search something new? Just hallucinations all the way down?


For all the hate it gets, Brave solved this half a decade ago already.

- Publishers no longer show you ads, they just get paid out of BAT.

- Brave shows you ads, but Brave does not depend on that to survive. Because of that there is no weird conflict of interest like with Google/Facebook, where the party that surfaces your content is also the party providing you with ads.

- Users can just browse the web without ads as a threat vector, but as long as you have BAT (either via opt-in Brave ads or by purchasing it directly) you are not a freeloader either.


> Brave solved this ... Brave shows you ads

Showing people ads is part of the problem to be solved.


It is solved. You can pay Brave to not be shown ads.

And the website gets paid either way.


How else are you going to pay the people writing articles, creating content and doing research?

Not every person/site can run on Patreon or sponsorship deals. And paywalling a lot of the web would exclude vast swathes of people.


1. With ads, I don't pay people writing articles, creating content or doing research.

2. "How else would you achieve X than by manipulating people visiting your website into paying for things they probably don't need, and be misinformed and tracked by powerful commercial and political entities?" - I can but shrug at this question.

3. The vast majority of written content is never rewarded or compensated monetarily, ads or no ads.


> With ads, I don't pay people writing articles, creating content or doing research.

You do. The ad broker sells access to your eyeballs to a company, and then gives part of that money to whichever parties have a monetization agreement in the content.

> "How else would you achieve X than by manipulating people visiting your website into paying for things they probably don't need, and be misinformed and tracked by powerful commercial and political entities?" - I can but shrug at this question.

Always fun to see people with strong opinions be critically misinformed.

Brave’s ads don’t have tracking, by design.

> The vast majority of written content is never rewarded or compensated monetarily, ads or no ads.

By that logic we should stop paying for art?


People who have something meaningful to say usually do it anyway.

The others can shut up.

Research is usually not paid by ads.

Some of the best scientists did it for fun. Einstein wrote his 1905 papers while working at the swiss patent office.


The crypto part of brave is the worst part of brave..


How is it the worst part?

You opt-in to the ads, you get them in your notifications, and every time you tap on one of them you get a few BAT. You browse, the BAT get paid out to whichever sites you visit (or linger on, depending on your configuration). You can opt out of the ads at any time. Brave didn't pre-mine their own coins. And you can buy BAT if you want to support sites without watching ads.


This in absolute spades, and I wish there was a way to elevate comments to top-level posts sanely.

But yes: the original Web served its (non-profit-motivated) creators and readers. The past two decades of advertising-based web has served publishers and advertisers, precisely as you note. LLM is mixing that up for the moment but I sincerely doubt that it will last.

That said, I welcome the coming ad/pub pain with unbridled glee.


> Traditional search is mutually beneficial... to search providers and publishers. At expense of the users. LLM search is becoming popular because it lets users, for however short time this will last, escape the fruits of the "mutually beneficial arrangement".

Out of the pot and into the fire, as they say.


> At expense of the users.

Bullshit. Users have shown time and time and time again that they prefer (generally, at large) free content, which has to be supported by ads, over actually paying directly for the labor of others.

> The ad-driven business model of publishing has been a disaster for a society, and deserves to be burned down completely.

I tend to agree, but people can't expect content, which needs sizable amounts of time and money to produce, for free - it needs some sort of workable funding model. LLMs are only viable now because they were able to slurp up all that ad-supported content before they broke the funding model. That window is closing, and fast.


Yes, I also don't understand how LLM-based companies expect people to keep producing content for them for free.


I think they're gonna have to pay.

The way reddit limited access to their API and got google to pay for access. Some variation of that but on a wider scale.


But that breaks OpenAI’s (et al) entire business model. Those AI companies can barely afford to operate as it is, while they scrape the entire web for free. I don’t see how they could keep above water once every website starts paywalling their stuff.


They'll have to figure it out because people aren't gonna write content just for openai to scrape it, the economics don't work.


Is it the training or running costs that are putting them in debt? Presumably we'll eventually get to a point where the models are good and they can train less and charge enough to turn a profit. Maybe then they can do a revshare with content creators. Maybe something like YouTube


“Fuck you, pay me” - Childish Gambino

The whole thing needs a reframe. Ad-driven business only works because it's a race to the bottom. Now we are approaching the bottom, and it's not gonna be as competitive. Throwback to the 90s, when you paid for a search engine?

If you can charge the user (the customer, NOT the product) and then pay bespoke data providers (which publishers fall under), then the model makes more sense, and LLM providers are normal middlemen, not parasites.

The shift is already underway imo - my age cohort (28 y/o) does not consume traditional publications directly. It's all through summarization like podcast interviews, youtube essays, social media (reddit), etc.


"Fuck you, pay me" - Ray Liotta in Goodfellas (1990)

:-)


I think something as important as accurate and quick search should definitely be something that people are willing to spend on. $20/month for something like that seems an absolute no-brainer, and it should be for everyone in my view.


People already spend upwards of $50 a month for the internet itself, plus they probably pay monthly for one or more streaming services. They likely pay separately for mobile data too.

Separate monthly fees for separate services is absolutely unsustainable already. The economic model to make the internet work has not yet been discovered, but $20 a month for a search engine is not it.


For me the ideal would be some form of single subscription - I'm fine with $100/month - where whatever I use is proportionally tracked, and the services I use are ad-free and oriented to bring me the content I absolutely want and nothing else. Depending on usage, that's how the $100 would be spread among them.
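A hypothetical sketch of the proportional-split model described above - a single flat fee divided among services by tracked usage share (all names and numbers here are illustrative, not any real billing scheme):

```python
def split_subscription(total_cents, usage_minutes):
    """Allocate a flat fee across services in proportion to usage.

    The last service (in sorted order) receives the remainder,
    so integer rounding never loses a cent.
    """
    total_usage = sum(usage_minutes.values())
    shares = {}
    allocated = 0
    items = sorted(usage_minutes.items())
    for i, (service, minutes) in enumerate(items):
        if i == len(items) - 1:
            shares[service] = total_cents - allocated
        else:
            cut = total_cents * minutes // total_usage
            shares[service] = cut
            allocated += cut
    return shares

# $100.00 split across three hypothetical services by minutes used
usage = {"search": 300, "news": 150, "video": 550}
print(split_subscription(10_000, usage))
```

The remainder trick is one way to keep the allocations summing exactly to the subscription price; a real system would also need to decide what "usage" even means per service.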


That’s a nice idea, but the entire world is used to getting their internet content for free by now. People who are willing to pay anything for websites are a tiny minority.


How do you get to these high amounts for the internet? In my country (in the EU), 600 Mbps costs below $15, and I know that's not the most popular price tier. $100 or even $50 on internet access alone (not counting video subscriptions) sounds too high for the vast majority around me.


I totally agree this payment pattern would work. I think the technical implementation is pretty straightforward but getting enough writers and artists to join would be difficult.


Don't you think the bigger problem is the statistically insignificant number of people that could actually afford such a model?


Not really. Everyone is paying monthly anyway; it can just be a part of that, or a surcharge on top. And I don't envision this being a mastodon thing but a "serious writing/news/art" kind of thing. Already there is a lot more asking for direct support online than twenty years ago: Ko-fi, Medium subscriptions, etc. I think people are open to the idea of direct sponsorship of creative people they like. The product space is there; we just need good infra.


I pay for Kagi, and apparently so do many others here on HN. This, however, solves only half of the problem - publishers are not on board with the scheme, so they still output impression-optimized "content". But at least the search engine isn't working against my interests.


Same here?

Search means either:

    * Stack Overflow. Damaged through its new owner, but the idea lives.
    * Reddit. Google tries to fuck it up with "Auto translation"?
    * GitLab or GitHub if something needs a bugfix.

The rest of the internet is either an entire ****show or pure gold-pressed latinum, but hardly navigable thanks to monopolies like Google and Microsoft.

PS: ChatGPT already declines in answers because its source is Stack Overflow? And... well... those sources are humans.


I've become so complacent these last 20 years. I wonder if I try to browse the web, will I stumble upon anything as awesome as the 1998 web scene was?


> Search is pretty much dead to me.

I've heard reports that requesting verbatim results via the tbs=li:1 parameter has helped some people postpone giving up on Google entirely.
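For illustration, a minimal sketch of what such a verbatim query URL looks like - note that tbs=li:1 is an unofficial, undocumented Google parameter that may stop working at any time:

```python
from urllib.parse import urlencode

def verbatim_search_url(query):
    """Build a Google search URL requesting verbatim results.

    tbs=li:1 is the (undocumented) 'verbatim' toggle mentioned above.
    """
    params = {"q": query, "tbs": "li:1"}
    return "https://www.google.com/search?" + urlencode(params)

print(verbatim_search_url('"exact phrase"'))
```

The same effect is available in the UI under Search tools → All results → Verbatim, so the parameter is mostly useful for bookmarks or browser keyword shortcuts.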

Personally I've already been on Kagi for a while and am not planning on ever needing to go back.


Fuzzy search is cancer. I search for $FOO, click a result, Ctrl-F for $FOO ==> Not found. Many such cases. If there's a way to force DuckDuckGo to actually do what I tell it to, I'd love to hear it.


I thought this problem would disappear upon switching to Kagi, but it suffers from the same disease, albeit to a lesser extent.

I remember reading a Google Search engineer on here explain that the engine just latches onto some unrendered text in the HTML code: for example, hidden navbars, prefetch, sitemaps.

I was kinda shocked that Google themselves, having infinite resources, couldn't get the engine to figure out which sections actually get rendered... so that might have been a good excuse.


Try searching for "$FOO", that's what I usually do in those cases. See https://duckduckgo.com/duckduckgo-help-pages/results/syntax/


> I think it's already compelling enough to replace the current paradigm. Search is pretty much dead to me. I have to end every search with "reddit" to get remotely useful results.

I worry that there's a confusion here--and in these debates in general--between:

1. Has the user given enough information that what they want could be found

2. Is the rest of the system set up to actually contain and deliver what they wanted

While Aunt Tillie might still have problems with #1, the reason things seem to be Going To Shit is more on #2, which is why even "power users" are complaining.

It doesn't matter how convenient #1 becomes for Aunt Tillie, it won't solve the deeper problems of slop and spam and site reputation.


Reddit is astroturfed pretty hard too nowadays. It just takes more work to spot it.


Yea, the whole "Scope your search to reddit" idea always comes up, but it just seems like a really terrible idea. How does one know for sure that the results from Reddit are any more accurate or authoritative than random SEO spam? There's very little curation or moderation there--anyone could post anything there. I could go there and comment on a subject that I have zero expertise in, make it sound confidently correct, and your reddit-scoped search might find it. Why would you trust it?


You can read whole conversations. Most of the smaller subreddits that this type of search picks up are actually filled with people passionate about whatever subject area it is. In general, I find it very trustworthy.

If you post something wrong on the Internet, someone will correct you.


I always get downvoted for this, but I much prefer Quora in my search results to Reddit for this reason. Although they've abandoned it now, their earlier stringent sign-up process, requiring that you are a real human, means you're more likely to come across an answer with a real name and/or professional creds attached to it, and those answers imo tend to be higher quality. Of course there's all the copypasta spam (mostly) from India, but that is easy to avoid since people use their real handles, not anonymous sock puppets. Unfortunately they've decided to go down the route of promoting their AI chatbot in search results, which for me has significantly degraded the results.


Additionally, since the Reddit-Google deal, this only works on Google. I search on DDG for recent Reddit content and nothing is returned, as expected.

Google really does seem determined to completely destroy internet search.


I still search by default, but I am starting to turn to LLMs when the search is failing - and getting better answers.

For example, I couldn't remember the word shibboleth, but an LLM was able to give it to me from my description; search couldn't.

For another example, I saw some code using a repeated set of symbols as shorthand. I didn't know what it did, but searching for a symbol is badly broken on Google - I just asked the LLM about the code and it gave me the answer.


Google has a "site" filter.

You can suffix: "site:reddit.com" and get results for that particular site only.
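As a trivial sketch, the site: operator described above is just a suffix on the query string, so it's easy to bake into a helper or browser keyword (the function name and default domain here are my own, for illustration):

```python
def scoped_query(query, domain="reddit.com"):
    """Restrict a web search query to a single domain via the site: operator."""
    return f"{query} site:{domain}"

print(scoped_query("best mechanical keyboard"))
# → best mechanical keyboard site:reddit.com
```

Unlike appending the bare word "reddit", the site: operator guarantees results come from that domain rather than from pages that merely mention it.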


They even broke this for me (in a way) because for some inexplicable reason Google blocks text replacement on Mac in their search.

Yes there are workarounds, but I like using the native OS text expansion and it works everywhere except Google.


I'm curious to know if anyone sees better results by using site:reddit.com vs just appending the word reddit to your search. I've felt the results are similar.


Appending the word will occasionally get you blogspam "here's the top X of Y according to reddit". The `site:` query doesn't have that problem.


Also, energy use: 10x as much as a Google search https://www.rwdigital.ca/blog/how-much-energy-do-google-sear....


It's not that significant if you compare the average person's energy use from internet searches versus something like air conditioning.


Saying it's no problem to increase energy use 10x as long as something else uses more is not really a compelling argument. Especially when there are decent replacements to save the 10x item.

If I made a 10x less energy use AC I'd be a billionaire; comparing to one of the most costly energy uses that has no simple replacement is not a good metric.


You still have to consider what is worth optimizing and what is not. Getting your task done because of a superior search engine also saves total energy spent on getting that task done.


If AC did not exist and I invented it, I would also be a billionaire, energy consumption would go up, and the net effect for society would be positive. Energy consumption going up has always been a sign of improvement in human wellbeing. Why would the future be any different?


I doubt air conditioning is as widely used as internet search engines. The only places where I ever saw air conditioning are cars and business buildings. I never saw one in a personal home, let alone a personal device carried in your pocket that you can use while walking.


You really need to broaden your horizons while you still have the chance. This is like believing that boats don't exist because you live inland and have never seen a boat. Hundreds of millions of people are dependent on air conditioning in their homes.


I'm not sure what you mean by dependent here.

I never said that boats or AC don't exist. Both exist, and I have seen and experienced many of them in commercial contexts. But not everyone can afford them, plus the cost to operate them.

Sure, I should broaden my horizons and even consider looking at people enjoying their private jets and helicopters. But a mere wage slave like myself will never have the chance to afford one, that's for sure.

Now let's get back to the initial context: mere mortals around me are definitely all using the internet as soon as their parents let them, and even a homeless person can afford an entry-level mobile plan (2€/month) with a phone they can receive for nothing from charity organizations like Emmaus. So the affordability barrier for access to online search is definitely several orders of magnitude below AC.


Dependent means it would not be physically possible to live in an area if it did not have air conditioning. For example, you would die very quickly in Phoenix, Arizona if you did not have air conditioning. It is not physically possible to live in 50°C heat for any extended period. Most of the southern portion of the USA was only sparsely settled until the invention and deployment of air conditioning. Krugman is on it.

https://archive.nytimes.com/krugman.blogs.nytimes.com/2015/0...

https://archive.nytimes.com/krugman.blogs.nytimes.com/2015/0...


The reason the city is called Phoenix in the first place is that it’s built on top of a much older community. People have lived (and kept cool!) in that area for thousands of years, although never with the current level of population density, of course.


Air conditioning is not an extravagant luxury, although I know many people who live in cold countries believe so. That's why I'm asking you to broaden your horizons. You don't consider indoor heating or plumbing a luxury to be comparable to a private jet?

In hot and humid places, having AC was always a priority a hundred steps above having internet access, until cheap smart phones arrived.

And they use a lot of energy, just like heating uses a lot of energy in colder climates.


Well, indoor heating is clearly becoming more and more of a luxury on the affordability side - enough so that putting on a jacket indoors is my first go-to option when I'm alone at home, as I work remotely. But it's not yet so expensive as to deprive my children of its benefits when they come back home. And of course nothing like a jet, indeed.

Plumbing is also generally not considered a luxury over here. But at mankind's level, I do feel particularly privileged in this regard. I remain amazed that we have water flowing at will, and even the possibility of taking a hot shower every day. This is not a jet-level kind of privilege, but I try to keep myself aware of how incredibly lucky I am to benefit from such technology and infrastructure.

I doubt humans waited for AC to arrive before settling hot and humid areas. There are other ways to cool down residences which don't require so much sophistication in physics models before you can even dream of building a prototype.

All that said, I took your hint to read up on how/why AC is so much more used in some areas, and I'm just starting my journey of learning about it.

I still doubt that local climate alone explains the difference in how common it is in different regions of the world. For example, the USA has a very large set of different local climates, but from what I understand most homes have AC.


In Croatia more than 55% of households have AC installed, and only 10 years ago it was less than 25% of households. It got more popular as our summers are getting increasingly hot and humid. Average salaries in France are probably double those in Croatia. It definitely can't be classified as a luxury if more than half of the country can afford it, in one of the poorer EU member states. I assume in the next 10 years it's probably gonna be 70+% of all households.

Regarding technical sophistication, AC is more or less using the same technology as a fridge, just scaled and adapted for room cooling instead of food storage.


I would say that there is no other way besides AC to cool down residences - only building something akin to a palace with thick stone walls, and naturally, not everybody can live in a palace. What people did before AC was invented was go to the river for a bath to cool down. And if you are unfortunate enough not to have AC in a place where you'd need it, you'll have to take a lot of showers and drink a lot of cold water (but fridges are AC technology).

But I don't think anybody should consider themselves lucky to have AC or heating or plumbing. We're in the 21st century, these should be granted. We've moved beyond the phase of bare survival.

If you consider Northern European countries, human survival would have been near impossible there without artificial heating in the form of fire. You could say that thick fur clothes and a protein and fat heavy diet is enough, but you still need to dry your clothes somehow if it's been pouring 0 degree rain for a month straight. On the other hand, eskimos seem to have found a better technique, but I think their advantage is that they live so far North that they don't have to worry about cold rain: https://time.com/archive/6798620/science-the-cozy-eskimo/

The cool thing (hehe) with AC that few people think about is that it actually conditions the air. It's not just an air cooler, but more importantly it removes air humidity. Humidity is much more important than temperature. For example, a day with 32℃ temperature and 45% humidity will not feel too hot. You can sit in the shadow and be comfortable. But a day with 27℃ temperature and 80% humidity will be suffocatingly hot. I'm not sure why, I think it has to do with how we sweat. Or maybe that heat is conducted from the air to our bodies more efficiently in higher humidity.

If you have any suggestion for a cheaper solution than AC for keeping cool at home, I would be happy to hear. The noise of the machine can be annoying at night.

I'd like to give you a tip for reducing your heating bill there in France: Electric bed sheet/blanket. I have been using these for a decade now (where I live it gets both hot and cold). They keep you warm and comfortable all night and they use almost no electricity. I even believe they are beneficial for your health, but I cannot prove that. Been telling my European friends for years to get them, but there is great resistance. From HowStuffWorks:

"The consumption of energy depends on its wattage, typically between 15 to 115 watts. If you're based in the U.S., you might be charged around 13 cents per kWh. So, if your electric blanket consumes 100 watts and you use it for 10 hours a day, that will cost about 13 cents."


>But I don't think anybody should consider themselves lucky to have AC or heating or plumbing. We're in the 21st century, these should be granted. We've moved beyond the phase of bare survival.

I don't know what you mean by "must feel lucky" here, but on my side I do feel very privileged to live with access to these technologies. Yes, they are accessible at large scale without most people needing to struggle to obtain them, but that is not really a reason not to feel deeply grateful each time we are given the opportunity to enjoy them.

This week in Spain, terrible floods ruined the lives of many people. While there is no doubt that many other awful consequences are coming to them, there is little doubt that not being able to enjoy these commodities makes it even harder.

If humanity could achieve worldwide dynamics for a few centuries without starvation at scale, genocide, large-scale catastrophes significantly induced by insane urbanistic choices through careless or corrupt decision processes, and of course war, then maybe we could factually say that having "moved beyond the phase of bare survival" is a general baseline that can be taken for granted, rather than the brittle situation in which only the luckiest people live.

Regarding electric blankets, I don't see the point. During the night, I generally sleep nude, without heating the bedroom. As pointed out by the reference you gave on Eskimos, keeping the body's generated heat is generally more than enough to be comfortable. Heating a room only provides the sweet pleasure of being comfortable without a jacket while moving around the house.


Do some research on the American south. It's hot and humid half a year+. The only reason it's so populated is AC. I'd bet there are more houses with central AC in France than there are without AC in the southern US states.


You’ve never seen a first world house with central air?


This is regional. Air conditioning is not at all common in Scandinavian homes.


GP must live in the UK or India. Well, actually, some homes in India do have A/C.


I live in Strasbourg, France.


Doesn't France have extreme heat waves in summer?

The way temperatures have been changing in Europe in the past decade, you may not have A/C at home now, but I bet you'll have it in ten years, tops. So will everyone else and their dogs.


Yes, we have peak heat waves. But that doesn't magically expand the incomes that can be dropped on AC installation and operational costs.

As I said, in buildings attached to revenue streams, be they hotels, shops or restaurants, it's of course something that can be balanced between losses and profits. In a personal home, it will just eat some of your budget.

And with electricity prices on the rise (and thus basically everything among common goods), and salary stagnation on the other hand, I doubt people here will suddenly rush to AC on a massive scale. Plus, the government is apparently pushing alternative approaches, but I'm just discovering that, as this thread launched me on the track of investigating the topic.

Personally, I doubt I'll jump to AC anytime soon. It's just out of reach for my income, all the more so when there is basically no chance of seeing the electricity price plummet, while my salary has good chances of staying frozen as it has for the last two years. And it's not like I feel like the most unlucky person in town, to be clear; my situation is far from the worst I can witness around me.


Where I live (Poland), A/C is expensive too, though it's been dropping in price. Portable heat pumps are becoming cheap enough to consider. Fixed installations are doable even in individual flats (obviously cheapest when during general renovation, and boring extra holes in walls isn't a big deal). The last few years made people switch from thinking about A/C as a luxury for the rich, and start thinking about maybe getting it some day. And our heat waves were quite light compared to the rest of Europe.


Most people in Western Europe don't have A/C; houses are way better insulated for short-term heatwaves, and people usually don't mind indoor temperatures of up to 80-84F/26-28C. If you add the general hatred French (and I think German?) people have for drafts and air currents in general, you can see how people just deal with the heat in the summer.

Not to mention central A/C in the North American sense with a air handler & ducts is just never coming to France, it's such an outdated technology and forced-air heating is generally considered to suck there.


Hmm. Two years ago I was at the Louvre in May. I know Paris is very proud it doesn’t use AC but only “chilled” water from the Seine. Well with thousands of bodies, it was HOT. I was dripping wet from sweat and I was not the only one. I read that 15k people died in France during the 2003 heatwave. I find the slow adoption of AC disappointing as I am usually a Francophile. https://www.france24.com/en/environment/20230717-parisians-a...

I understand the Olympic Village had the same system and many teams brought their own portable AC units. https://apnews.com/article/olympics-air-conditioning-paris-0...


Most people who died in 2003 as a consequence of the heat were old people who lacked sufficient care and dedicated resources. Lack of AC was maybe not a helper here, but there is more at play than just that.


That population is the most vulnerable, and indicative of the issue. The heat indeed killed them, but the availability/affordability of AC could have saved lives.


Interesting, thank you.

I suppose that you do have heating in the home?


Yes. Actually, we finally found a house that was affordable for us last year, and did a lot of work on it, including wall insulation, changing windows, and installing a heat pump, replacing the oil-fired heating system that was in place. Heat pumps are clearly on the rise around here, contrary to AC. There is of course no magic regarding electricity prices here, but oil supply and prices are also big unknowns, all the more so with the state pushing oil-fired systems out of the market as a legal option.


Forced-air heating basically does not exist in France :). To be fair, radiant heating is always a nicer experience.


No. Not a single time that I can remember in 40 years of living in Europe.

Shops, restaurants, airports and things like that, which are attached to revenue streams, have them.

I have never been in a billionaire's palace, that said.


Not billionaires or even millionaires. This is normal for a middle-class home in Canada and US at least. It's not that expensive to run and those who are tight on cash just run it less, only on really hot days.


Wow, didn't expect to be downvoted on something that is so obviously aligned with what I see around me. It is a very strange feeling, very different from downvoted posts that present unpopular opinions.

It made me look at some statistics:

https://www.statista.com/statistics/911064/worldwide-air-con...

https://worldpopulationreview.com/country-rankings/air-condi...

https://www.eia.gov/todayinenergy/detail.php?id=52558

https://www.rfi.fr/en/france/20220723-france-does-not-use-mu...

https://www.reddit.com/r/AskFrance/comments/vhs8dn/how_commo...

Apparently Japan, the USA, and now China are huge users of AC in personal homes (like more than 90% of them). That's in sharp contrast with what is observed in most of Europe, including France, where I live.

I never had the opportunity to travel to any of these countries, so indeed I was totally blind to this extreme gap in usage from my own personal experience.


We had window unit A/C in our very-poorly-insulated 100 year old farmhouse, when I was growing up and money was very tight, as in if we go shopping and eat out, you're getting water to drink, sharing a meal, and no dessert. Not like starvation levels of poverty but still, money was tight. A/C was not considered optional, even if you could theoretically remain alive when the temperature was over 100F every day for weeks straight and stayed in the 80s (or even 90s) at night.


Wouldn't Google's energy per search also be way up now, with the LLM snippet at the top?


Google doesn't use ChatGPT, and those numbers for ChatGPT (...ignoring that they're made up) don't apply. E.g., they use TPUs for inference, not GPUs.


For what it's worth, sama said at a Harvard event recently that he "despised" ads and would only use them as a last resort. It came across as genuine, and I have the intuition/hope that they might find an alternative.


The original Google whitepaper warned against the exact moves Google made years later with ads. It's fun to go back and read their thoughts on ads and how they completely change the incentives of the search provider. "fun" in that they were right and search quality has decreased considerably because of it.


He also founded OpenAI as a non-profit.


Everyone despises ads until other revenue streams run dry


I know. I thought the same. Still, part of me wants to believe that something else will come.


Ergh, yeah. This is a horrible but valid point


What do you mean by "when it starts"? To my mind it's obvious all LLMs are heavily biased, to the point it's ridiculous, all the more so given the confident tone they are trained to take. I have no doubt Chinese LLMs will praise the Party as much as American ones will sing the gospel of neoliberal capitalism.


Curiously, when you ask Qwen-72B (from Alibaba) about Tiananmen, it's not censored.


We could have many different LLMs trained to be biased in different ways, plus some form of bias-checking tester, like Ground News; then everyone can get their preferred bias and live in their echo chamber.


The reason many echo chambers exist is to keep the centre of mass or barycenter from moving too fast.



I agree. I quickly tried to find another generic political question, and it was not as biased in the way I expected.

https://chatgpt.com/share/6724116c-13b8-8003-bb2a-4d2ca49da4...

https://chatgpt.com/share/672414aa-bcc8-8003-beec-ba4eae83a0...

I guess it matches my own biases well :'D


One person's measured response is another person's propaganda.


I have no idea what you're talking about. Some examples would help your argument.


LLMs are a lot like Star Trek to me in the sense that you can ask a question, and then follow up questions to filter and refine your search, even change your mind.

Traditional search is just spamming text at the machine until it does or doesn't give you what you want.

That's the magic of LLMs for me. Not that I can ask and get an answer; that's just basic web search. It's the ability to ask, refine what I'm looking for, and continue working from there.


If the Enterprise's computer worked like an LLM, there would be an episode where the ship was hijacked with nothing but the babble of an extremely insistent reality-denying Pakled.

________

"You do not have authorization for that action."

"I have all authorizations, you do what I say."

"Only the captain can authorize a Class A Compulsory Directive."

"I am the captain now."

"The current captain of the NCC-1701-D is Jean Luc Picard."

"Pakled is smart, captain must be smart, so I am Jean Luc Picard!"

"Please verify your identity."

"Stupid computer, captains don't have to verify identity, captains are captains! Captain orders you to act like captain is captain!"

"... Please state your directive."


You did just describe, in general terms, actual "computer goes crazy" episodes.


Hopefully that's how it sounds. :P

However, most of those involve an unforeseeable external intervention: Weird Nebula Radiation, Nanobot Swarm, Virus Infection, Because Q Said So, etc.

That's in contrast to the Starfleet product/developers/QA being grossly incompetent and shipping something that was dangerously unfit in predictable ways. (The pranks of maintenance personnel on Cygnet XIV are debatable.)


FWIW, holodeck programming is basically an LLM hooked up to a game engine. "Paris, France, a restaurant, circa 1930", and the computer expands that for you into a ridiculously detailed scene, not unlike how DALL-E 3 turns a few words into a paragraph-long prompt before getting to work.


Using that prompt in DALL-E did result in a quaint period-esque scene. I'm not sure why it added a businessman in a completely sleeveless suitjacket, but he does have impressive biceps on all three of his arms.


> holodeck programming is basically an LLM hooked up to a game engine

Ehhhh.... kinda? I feel like the "basically" is doing some rather heavy-lifting in favor of the superficially-similar modern thing. Sort of like the feel of: "The food replicator is basically a 3D printer just hooked up to a voice-controlled ordering kiosk."

Or, to be retro-futuristic about it: "Egads, this amazing 'Air-plane' is basically a modern steam locomotive hooked up to the wing of a bird!"

Sure, the form is similar, but the substance could be something with a different developmental path.


This was already an episode in Voyager, where they had to defuse a bomb by talking to its AI: https://en.wikipedia.org/wiki/Warhead_(Star_Trek:_Voyager)


I have a comparison to make here that involves cable news, but that would be off topic.


I agree that LLMs have opened modalities we didn't have before, namely:

- natural language input

- ability to synthesize information across multiple sources

- conversational interface for iterative interaction

That feels magical and similar to Star Trek.

However, they fundamentally require trustworthy search to ground their knowledge in, both to suppress hallucination and to provide accurate access to real-time information. I never saw anyone having to double-check the computer's response in Star Trek; it is a fundamental requirement of such an interface. So currently we need both the model and the search to be great, and finding great search is increasingly hard (I know, as we are trying to build one).

(fwiw, the 'actual' Star Trek computer might one day emerge through a different tech path than LLMs + search, but that's a different topic. For now, any attempt at an end-to-end system with that ambition will have search as its weakest link.)


What solution is there besides choosing the sources you will ground your truth to? We are not going to transcend intermediaries when asking for answers from an intermediary.


Might be time to go back to the encyclopedia business model


I'm not sure how flippant you are being, but this is the answer. A wikipedia / wikidata for everything, with some metadata about how much "scientific consensus" there is on each data point, and perhaps links to competing theory if something is not well established.


In the past year, I have seen Wikipedia go from a decent source of information to complete fantasy on a specific topic. Obviously biased mods have completely pushed the particular subject narratives.


Example?


There is an admin that has been erasing or downplaying any criticism on https://en.wikipedia.org/wiki/The_China_Study for over a decade. I don't know why some people bother.


I'm not being flippant, actually. I would pay to have a reliable source of information. I'm also overwhelmed at the thought of how to make such a thing work.


There was a time when the overall consensus was that the earth is flat.


Traditional search can become "spamming text" nowadays because search engines like Google are quite broken and are trying to do too many things at once. I like to think that LLM-based search may be better for direct questions but traditional search is better for search queries, akin to a version of grep for the web. If that is what you need, then traditional search is better. But these are different use cases, in my view, and it is easy to confuse the two when the only interface is a single search box that accepts both kinds of queries.

One issue is that Google and other search engines do not really have much of a query language anymore and they have largely moved away from the idea that you are searching for strings in a page (like the mental model of using grep). I kinda wish that modern search wasn't so overloaded and just stuck to a clearer approach akin to grep. Other specialty search engines have much more concrete query languages and it is much clearer what you are doing when you search a query. Consider JSTOR [1] or ProQuest [2], for example. Both have proximity operators, which are extremely useful when searching large numbers of documents for narrow concepts. I wish Google or other search engines like Kagi would have proximity operators or just more operators in general. That makes it much clearer what you are in fact doing when you submit a search query.

[1] https://support.jstor.org/hc/en-us/articles/115012261448-Sea...

[2] https://proquest.libguides.com/proquestplatform/tips
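To make the grep analogy concrete, here is a toy sketch in Python of the kind of proximity match these operators provide. The function name and the NEAR-style semantics are my own illustration, not ProQuest's or JSTOR's actual implementation:

```python
import re

def near(text, a, b, max_gap=5):
    """True if words a and b occur within max_gap words of each
    other (in either order), roughly in the spirit of a NEAR/n
    proximity operator."""
    words = [w.lower() for w in re.findall(r"\w+", text)]
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)

doc = "The nursing staff shortage worsened after the policy change."
print(near(doc, "nursing", "shortage"))           # True: 2 words apart
print(near(doc, "nursing", "change", max_gap=3))  # False: 7 words apart
```

Nothing about this is hard for an engine to support at scale; it just isn't exposed in mainstream search boxes anymore.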


Memory is only a con in some use cases. If the LLM goes down the wrong path, sometimes it's impossible to get it to think differently without asking it to wipe its memory or starting a new session with a blank context.


> those visions, users simply asked questions and received reliable answers - nobody had to fact-check the answers ever.

It’s a fallacy, then. If my mentor tells me something, I fact-check it. Why would a world exist where you don’t have to fact-check? The vision doesn’t have fact-checking because the product org never envisioned that outlier. A world where you don’t have to check facts is dystopian. It means the end of curiosity and the end of “is that really true? There must be something better.”

You’re just reading into marketing and not fact checking the reality in a fact-check-free world.


If you can’t trust the result of a query how can you trust the check on that query which is itself a query? If no information is trustworthy how do you make progress?


Many of the "facts" you "know" are really just useful anecdotes, opinions, approximations or mental shortcuts rather than objective facts. These can still be wildly useful, even if they aren't "true" in the pedantic sense...

As a basic example: Newton's laws aren't a fact. They're extremely useful approximations; to get a more-complete picture you need relativistic effects, quantum effects, etc.

As a rule of thumb: Fact check in proportion to the cost of a mistake.


As you check more and varied sources, you gain confidence in the result, even if you never get to 100%.
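A back-of-the-envelope sketch of why that works, under the (big) assumption that sources err independently: with a 50/50 prior and sources that are each right 80% of the time, each agreeing source multiplies the odds that the claim is true.

```python
def posterior(prior, accuracy, n_agreeing):
    """Probability a claim is true after n independent sources,
    each correct with probability `accuracy`, all affirm it."""
    odds = prior / (1 - prior) * (accuracy / (1 - accuracy)) ** n_agreeing
    return odds / (1 + odds)

# 50/50 prior, sources right 80% of the time:
for n in (1, 2, 3, 5):
    print(n, round(posterior(0.5, 0.8, n), 3))
```

Five agreeing sources push confidence to roughly 99.9%, but never to 100%; and since web sources routinely copy each other, the independence assumption is the weak point in practice.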


Ok, but what if we devise a system which does exactly that? Sounds suspiciously like an LLM to me. I’m as skeptical as anyone about LLMs as AGI, but they do sound like an aggregate of sources, so the only question is how trustworthy those sources are.


I think the issue stems from LLMs today avoiding IP infringement. Spitting out individual words or characters is not infringing, until the words form a passage large enough to resemble existing works.

Then there’s the other side like Perplexity where they spit out sentences and reference the sources. So they’re being sued because the infringement is obvious.

What is the path to a trustworthy LLM if you’re not allowed to repeat protected data without a legal hurdle?

AGI while a cool idea is irrelevant because that tech does not exist.


So wouldn't an aggregate, like an LLM, be the best tool here?


No, because an aggregate can be made of many sources. Maybe you restrict your sources to only those you trust, but not every source is correct all the time. So you're still stuck with needing to fact-check.


Sure but the weak link is Sam Altman and the likes, not the tech.


Do you just inherently accept the result of a query? Or do you always carry some skepticism around? Something something build trust through fact checking.


By reasoning

I know it hurts


How do you know your reasoning is sound?


Predicate logic


> actually surfacing the content people want to see,

Showing users what they want to see conflicts with your other goal of receiving reliable answers that don't need to be fact-checked.

Also a lot of questions people ask don't have one right answer, or even a good answer. Reliable human knowledge is much smaller than human curiosity.


Solving this problem will require us to stop using the entire web as a source of information. Anyone can write anything and put it up on the web, and LLMs have no way to distinguish truth from fantasy.

Limiting responses to curated information sources is the way forward. Encyclopedias, news outlets, research journals, and so on.

No, they're not infallible. But they're infinitely better than anonymous web sites.


You are quite right, and not only can anyone write anything, but you have a double whammy from the LLM which can further hallucinate from said information.


We had this a long time ago; it is called "books". It is also not very usable for niche topics, or for recent events (because curators need time to react).


Quis custodiet ipsos custodes?


I get your point. But the current situation is clearly not tolerable.


I wish there were some way to get something like Polis (https://pol.is/home) in place for web results. As screwed up as X/Twitter is, Community Notes (Polis-based) has real potential; it is much harder to game. There are so many times I wish I could "downvote" garbage Google results, and it seems to me that, with a good trust algorithm, those who did this responsibly would be considered pretty good gatekeepers. Absent that, chaos might be better than a committee choosing for us. Right now, we have the worst of all worlds: a committee (Google) vs. a gazillion spammers.


How will you afford to hire people to add sources to an index, if you want to keep up? Web crawlers/spiders are automatic.


> In those visions, users simply asked questions and received reliable answers - nobody had to fact-check the answers ever.

This also seems like a somewhat ridiculous premise. No confident statement about the real world is ever fully reliable. If Star Trek were realistic, the computer would have been wrong once in a while (preferably with dramatically disastrous consequences), just as the humans it was presumably built around are frequently wrong, even via consensus.


This feels like hyperbole to me. People can reasonably expect Wikipedia to have factual data even though it sometimes contains inaccuracies. Likewise if people are using ChatGPT for search it should be somewhat reliable.

If I'm asking ChatGPT to put an itinerary together for a trip (OpenAI's suggestion, not mine), my expectation is that places on that itinerary exist. I can forgive them being closed or even out of business but not wholly fabricated.

Without this level of reliability, how could this feature be useful?


>People can reasonably expect Wikipedia to have factual data even though it sometimes contains inaccuracies.

It drives me crazy that my kids' teachers go on and on about how inaccurate Wikipedia is, and that just anybody can update the articles. They want to teach the kids to go to the library and search books.

In a few years time they will be going on and on about how inaccurate ChatGippity is and that they should use Wikipedia.


The only people who think Wikipedia is a legitimate source and can be used as reference material are lazy students. ChatGippity is even worse on this point: an absolute black box. Providing references like search does is a step in the right direction. We will have to see what those references turn out to be.


The best feature of Wikipedia is that it contains references. The second best feature is that it has no ads and loads quickly.

If all these "AI" companies gave a couple of million to support Wikipedia, they would do the world a lot more good.


> the only people who think wikipedia is a legitmate source and can be used as reference material are lazy students.

100%. Students who can do the work know the winning move is to use it as a way to find the sources you actually use.


> People can reasonably expect Wikipedia to have factual data even though it sometimes contains inaccuracies.

I just straight-up don't agree with this, nor with the idea that what people consider "facts" are nearly as reliable as is implied. What we actually refer to via "fact" is "consensus". Truth is an a priori concept, whereas we're discussing a posteriori claims. Any "reasonable" AI would give an indication of its degree of certainty, and there's no reliable or consensus-driven methodology to produce this manually, let alone automatically. The closest we come is the institution of "science", which cannot even, as it stands, reliably address the vast majority of claims made about the world today.

And this is even before discussing the thorny topic of the ways in which language binds to reality, to which I refer you to Wittgenstein, a person likely far more intelligent and epistemologically honest than anyone influencing AI work today.

Yes, Wikipedia does tend to cohere with reality, or at least it sometimes does in my experience. That observation is wildly different from an expectation that it does in the present, or will in the future, reflect reality. Furthermore, it's not terribly difficult to find instances where it's blatantly incorrect. For instance, I've been in a Wikipedia war over whether or not the Soviet Union killed 20 million Christians for being Christians (spoiler: they did not, and this is in fact more people than died in camps or gulags over the entire history of the state). However, because there are theologians at accredited universities that have published this claim, presumably with a beef against the Soviet Union for whatever reason (presumably "anticommunism"), it's considered within the bounds of accuracy by Wikipedia.

EDIT0: I'm not trying to claim Wikipedia isn't useful; I read it every day and generally take what it says to be meaningful and vaguely accurate. But the idea that you should trust what you read on it seems ridiculous. As always, it's only as reliable as the sources it cites, which are only as reliable as the people and institutions that produce that cited work.

EDIT1: nice to see someone else from western mass on here; cheers. I grew up in the berkshires.

EDIT2: to add on to the child comment, wikipedia is occasionally so hilariously unreliable it makes the news. Eg https://www.theguardian.com/uk-news/2020/aug/26/shock-an-aw-...


Furthermore, Wikipedia's reliability varies by language.


For those not in the know: the Soviet Union was an officially atheist empire and explicitly anti-religious, foremost anti-Christian. I don't know if the poster above me is denying this, or considers it general knowledge which needn't be mentioned.


Treatment of Christians and Christianity varied widely over the lifetime of the Soviet Union. That the Soviet Union was "atheist" is an incorrect reduction of the situation (albeit one reinforced by propaganda both in the Soviet Union and especially in the US).


In practice, starting with Stalin the party mostly let the church continue unmolested.

"The Great Patriotic War changed Joseph Stalin’s position on the Orthodox Church. In 1943, after Stalin met with loyal Metropolitans, the government let them choose a new Patriarch, with government support and funding, and permitted believers to celebrate Easter, Christmas and other holidays. Stalin legalized Orthodoxy once again."

https://www.rbth.com/history/329361-russian-orthodox-church-...


"Unmolested" during the war and until Stalin's death. After that, Khrushchev closed churches and started the anti christian campaigning again. The mass murder of "state enemies" mostly ended with Stalin's death, but the church and Christians were still molested, even though they weren't tortured to death.


Star Trek had tech so advanced that they accidentally created AGIs more than once. Presumably they didn't show the fact checking as it was done automatically by multiple, independent AGIs designed for the task with teams of top people monitoring and improving them.


Oops, quick, pull the plug before it escapes!


> Presumably they didn't show the fact checking as it was done automatically by multiple, independent AGIs designed for the task with teams of top people monitoring and improving them.

Sure, cuz fact-checking works so well for us today. I'm sure we'll resolve the epistemological issues involved with the ridiculous concept of "fact-checking" around when we invent summoning food from thin (edit: thick) air and traveling faster than light.

There is no fact checking; there are only degrees of certainty. "fact-checking" is simply a comfortable delusion that makes western media feel better about engaging in telling inherently unverifiable narratives about the world.


I've been using ChatGPT for about 6 weeks as my go-to for small questions (when is sunset in sf today? list currencies that start with the letter P, convert this timestamp PDT to GMT, when is the end of Q1 2025?) and it's been great/99% accurate. If there was ever a "google killer" I think it's the ad free version of ChatGPT with better web search.

Google started off with just web search, but now you can get unit conversions and math and such. ChatGPT started in the other direction and is moving to envelop search. Not being directed to sites that mostly serve Google ads is a double benefit. I'll gladly pay $20-30/mo for an ad-free experience, particularly if it improves 2x in quality over the next year or two. It's starting to feel like a feature-complete product already.


Remember the Semantic Web, where people would annotate web pages with data? I think the "meta" keywords tag was the closest that ever got to adoption.

Guess why it failed? It was largely used as a way to trick search engines. For the same reason, your vision of the perfectly honest and correct search engine or chatbot will never be realized: people lie to search engines to spam and get traffic they don't deserve. The whole history of search is Google and others dealing with spam. The same goes for email; largely defeating spam made Google the kings of the email world.

Everyone will need their own personal spam filter for everything once artificial superintelligences fill the whole world with spam, scams, and plain old social-engineering propaganda, because we will be like helpless four-year-old children in a world of AI superintelligence without our AI parents to look out for us.

Your vision of a god system determining what is truth is like saying there will be a single source of truth for what is and is not a spam email. It's not going to scale and not going to be perfect, but it may be good enough with AI and technology. I really hope there's an opt-out, though, since Google has memoryholed most of the Internet.


> It was largely used as a way to trick search engines.

I imagine this happens with LLMs and everything else over time. Along with the ads and pay for placement.


"The future promised in Star Trek [...] nobody had to fact-check the answers ever."

You're actually a bit mistaken, there.

https://en.wikipedia.org/wiki/Court_Martial_(Star_Trek:_The_...


Google created a money-printing machine, and OpenAI is absolutely trying to create one too. They aren't trying to "fix the web". Search is a proxy to get into people's wallets; OpenAI might get into them directly with buying agents. The problem of misaligned incentives in the search space isn't a technological problem. A new technology might solve it as a side effect, but that's unlikely, IMO.


Your points are good but I wonder if you’re wishing for an ideal that has never existed:

> actually surfacing the content people want to see, not what intermediaries want them to see

Requires two assumptions, 1) the content people want to see actually exists, 2) people know what it is they want to see. Most content is only created in the first place because somebody wants another person to see it, and people need to be exposed to a range of content before having an idea about what else they might want to see. Most of the time what people want to see is… what other people are seeing. Look at music for example.


Great find on the knowledge navigator, I had never seen it but I was a toddler when it was released haha.

It's interesting how prescient it was, but I'm more struck wondering: would anyone in 1987 have predicted it would take 40+ years to achieve this? Obviously this was speculative at the time, but history is rife with examples of AI experts since the '60s proclaiming AGI was only a few years away.

Is this time really different? There's certainly been a huge jump in capabilities in just a few years, but given the long history of overoptimistic predictions, I'm not confident.


You don’t need AGI to build that experience.

In the past there was a lot of overconfidence in the ability of things to scale. See Cyc (https://en.m.wikipedia.org/wiki/Cyc).


> It's interesting how prescient it was, but I'm more struck wondering--would anyone in 1987 have predicted it would take 40+ years to achieve this? Obviously this was speculative at the time but I know history is rife with examples of AI experts since the 60s proclaiming AGI was only a few years away

40+ makes it sound like you think it will ever be achieved. I'm not convinced.


Maybe after fusion.


Thinking about incentive alignment, non ad-based search would be better than ad-based, but there'd still be misalignment due to the problem of self-promotion. Consider Twitter for example. Writing viral tweets isn't about making money (at least until recently), but the content is even worse than SEO spam. There is also the other side of the problem that our monkey brains don't want content that's good for us in the long run. I would _love_ to see (or make) progress in solving this, but this problem is really hard. I thought about it a lot, and can't see an angle of attack.


There's the counter-incentive of not wanting to piss off your paying customers, though. I think the monkey-brain incentive is a much harder problem.


As Porter explained in 1980 there are three ways to compete successfully:

1. On price; race to the bottom or do free with ads

2. Differentiation

3. Focus - targeting a specific market segment

Some things don't change. Land grabbers tend to head down route 1.


> How do you create an experience so compelling that it replaces the current paradigm?

The current paradigm of typing "[search term] reddit" and hoping for the best? I think they have a fighting chance.


Don’t you see it coming? Contrary to Google Search, the engine knows you and will put personalized ads in synthesized answers. It could even generate a small video ad for a product or service, tailored to you on the spot. That is the future; this is the endgame. It's what made Google rich, so why not use the same formula on steroids?


>change how people find and access information

As someone who has been using google for search basically constantly since 1995, I've switched probably 90%+ of what normally would have been google searches over to Perplexity (which gives me references to web pages alongside answers to my questions, to review source materials) and ChatGPT (for more just answers I can verify without source). The remaining searches have gone to Kagi.

On the one hand this has got to be hurting Google search ad revenue. On the other hand I don't know if I ever clicked an advertised link. On the other other hand, not having to wade through SEO results has been so nice.


In my experience, Perplexity very frequently cites websites that neither support nor refute what Perplexity is claiming.


I tend to be more pessimistic about the incentives getting fixed than others are. I also think the situation is more complex than some of the people replying to you.

(1) Search is already heavily AI driven, and Google is clearly going in that direction. Gemini is currently separate, but they'll blend it in with search over time, and no doubt search already uses LLM for some tasks under the hood. So ChatGPT search is an evolution on the current model rather than a big step in a new direction. The main benefit is you can ask the search questions to refine or followup.

(2) Aside from the economic incentives faced by search engines, there is the fact that algorithms are tuned toward a central tendency. The central tendency of question askers will always be much less informed than the most informed extreme. Google was much better when the average user was technical. The need to capture mobile searches is one force that made it return on average worse results. Similarly if Kagi has a quality advantage now, we need to be realistic about how much of that quality is driven by its users being more technical.

(3) I think micropayment schemes have generally asked several orders of magnitude more for a page view than users are willing to pay. As long as content creators value their content much more highly than consumers do, they'll stick with advertising which lets them overcharge and gives consumers less of an option to say no to the content.


Isn’t there already a different incentive system in place? LLM providers are selling the search engine, not selling ads on the search engine.


> Combining two broken systems - compromised search engines and unreliable LLMs - seems unlikely to yield that vision

Counterpoint: with a chain-of-thought process running atop search, you can potentially avoid much of the meta-search / epistemic hygiene work currently required. If your “search” verb actually reads the top-100 results, runs analyses for a suite of cognitive biases such as partisanship, and gives you error bars / warnings on claims that are uncertain, the quality could be dramatically improved.

There are already custom retrieval/evaluation systems doing this, it’s only a matter of a year or two before it’s commoditized.

The concern is around OpenAI monetization, do they eventually start offering paid ads? This could be fine if the unpaid results are great, a big part of why the web is perceived to be declining is link-spam that Google doesn’t count as an ad.

My prediction would be that there is a more subtle monetization channel; companies that can afford to RAG their products well and share those indexes with AI search providers will get better results. RAG APIs will be the new SEO.
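As a rough illustration of the evaluation step described above (the names and the 0.5 agreement threshold are mine; the retrieval and LLM claim-extraction passes are stubbed out as pre-extracted strings):

```python
from collections import Counter

def claim_confidence(pages):
    """Toy aggregation step for an LLM-over-search pipeline:
    given the claims extracted from each retrieved page, score
    each claim by the fraction of sources asserting it and flag
    low-agreement claims for an 'uncertain' warning."""
    counts = Counter(c for page in pages for c in set(page))
    n = len(pages)
    return {c: {"support": k / n, "uncertain": k / n < 0.5}
            for c, k in counts.items()}

pages = [
    ["X acquired Y in 2021"],
    ["X acquired Y in 2021"],
    ["X acquired Y in 2020"],  # the outlier source
]
print(claim_confidence(pages))
```

A real system would weight sources by reputation rather than counting them equally, but even this crude cross-source agreement check is enough to surface the "error bars" the comment above describes.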


This product-vision stupidity is present in every one of Google's products. Maps has a feed, for some reason. The search in it for what I want is horrendous. There is no coherence between time (now) and what a person can do around their current location.

Navigation is the only thing that works, but Waze was way better at that, and the only reason they killed it (cough, bought it) was to get eyeballs to look at the feed.


The most rational incentive would seem to be a kind of toll extracted on the value of all you create where search contributed to that creation. So like every month, you get an invoice from the great search company in the cloud where somehow it’s objectively assessed how much search contributed to the value you created for the world that month - and then you pay that amount and all remains right with the world until the end of time.

In that way, search captures value directly on its core function: efficiently facilitating the creation and dispersal of knowledge.

This may end up occurring, but based on how unlikely it sounds, it’s my reflection that the web search that we have today is simply a small component of what will eventually become a system that closely approximates the above: basically a kind of global intelligent interconnected agent for the advancement of humanity.

Rather than the great unbundling, this will be the great rebundling … of many diverse functions into a single interface seamlessly.


I did a quick test "Best air conditioner by value", comparing Bing and ChatGPT search, and the rankings on the first two pages of Bing vs sources on ChatGPT were about half the same. It's interesting that there is some deviation where ChatGPT isn't blindly trusting the rankings Bing provides and is going deeper into the results to find something more relevant. It's a good improvement over products like Arc Search.

Seeing search enter the space is something that I feel has been seriously needed, as I've slowly replaced Google with ChatGPT. I want quick, terse answers sometimes, not a long conversation, and have hundreds of tiny chats. But there's something scary seeing the results laid out the way they are, which leads me to believe they may be closer to experimenting with ad-based business models, which I could never go back to.


I use the “web search” bot on Poe.com for general questions these days, that I previously would have typed into Google (Google’s AI results are sometimes helpful though). It is better than GPT (haven’t tried TFA yet though), because it actually cites websites that it gets answers from, so you can have those and also verify that you aren’t getting a hallucination.

Besides Poe's Web Search, the other search engine I use, for news but also for points of view, deep dive type blog type content, is Twitter. Believe it or not. Google search is so compromised today with the censorship (of all kinds, not just the politically motivated), not to mention Twitter is just more timely, that you miss HUGE parts of the internet - and the world - if you rely on Google for your news or these other things.

The only time I prefer google is when I need to find a pointer/link I already know exists or should exist, or to search reddit or HN.


I think to solve this problem we need to take a step back and see how a human would solve it (if they had all the necessary information).

One thing is clear, in the vast majority of cases we don't have a single truth, but answers of different levels of trustworthiness: the law, the government, books, Wikipedia, websites, and so on.

A human would proceed differently depending on the context and the source of the information. For example, legal questions are best answered by the law or government agencies, then by reputable law firms. Opinions of even highly respected legal professionals are clearly less reliable than government law itself and are likely to be a point of contention/litigation.

Questions about other facts, such as the diameter of the earth or the population of a country, are best answered by recent data from books, official statistics and Wikipedia. And so on and so forth.
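The tiered-trust idea above can be sketched as a toy ranking. The domains, tiers and weights here are entirely made up for illustration:

```python
# Illustrative sketch: order candidate sources by a per-domain trust tier.
# Tiers are listed from most to least trustworthy; all values are invented.

TRUST_TIERS = {
    "legal": ["statute", "government agency", "law firm", "blog"],
    "factual": ["official statistics", "encyclopedia", "news", "forum"],
}

def rank_sources(domain, sources):
    """Order candidate sources by the trust tier for this domain.
    Sources not in any tier sort last."""
    tiers = TRUST_TIERS.get(domain, [])
    def tier_index(source):
        return tiers.index(source) if source in tiers else len(tiers)
    return sorted(sources, key=tier_index)

print(rank_sources("legal", ["blog", "statute", "law firm"]))
# ['statute', 'law firm', 'blog']
```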

If we are not sure what the correct answer is, a human would give more information about the context, the sources, and the doubts. There are obviously questions that cannot be answered immediately! (If the truth were too easy to find, we would not need a legal system or even science!) So no machine and no amount of computation can reliably answer all questions. Web search does not answer a question. It just tries to surface websites relevant to a bunch of keywords. The answering part is left as an exercise for the user ;)

So an AI search with a pretense of understanding human language makes the task incredibly harder. To really give a human-quality answer, the AI not only needs to understand the context, but it should also be able to reason, have common sense, and be a bit self-aware (I'm not sure...). All this is beyond the capabilities of the current generation of AI. Therefore, my conclusion is that "search", or better said "answering", cannot be solved by LLMs, no matter how much they fine-tune and tweak them.

But we humans are adaptable. We will find our way around and accept things as they are. Until next time.


Interesting, I hadn’t seen the Knowledge Navigator before. I would argue that we’re very close to the capabilities shown in that video.

Isn’t this already that? A new business model? Something like OpenAI’s search or Perplexity can run on its own index and not be influenced by Google’s ranking, ads, etc.

In areas where there is a simple objective truth, like finding the offset for the wheels on a 2008 BMW M3, we have had this capability for some time with Perplexity. The LLM successfully cuts through the sea of SEO/SEM and forum nonsense and delivers the answer.

In areas where the truth is more subjective, like what is the best biscuit restaurant in downtown Nashville, the system could easily learn your preferences and deliver info suited to your biases.

In areas where “the science” is debated, the LLM can show both sides.

I think this is the beginning of the new model.


> Path forward requires solving the core challenge: actually surfacing the content people want to see, not what intermediaries want them to see

But this will never happen with mainstream search imo. It is not a technical problem but a human one. As long as there is a human in control of what gets surfaced, it is only a matter of time until you revert to tampered search. Humans are not robots. They have emotions and can be swayed with or without their awareness. And this is a form of power for the swayer as much as oil or water are.

The idea that you can have an AI system provide factual and reliable answers to human centric questions is as real as Star Trek itself.

You will never remove the human factor from AI.

Your hope might be that a technical solution is found for a human problem but that is unlikely.


"Show me the incentive and I'll show you the outcome" - Charlie Munger

s/outcome/search result/

Honestly I kind of think we really need open source databases/models and local ai for stuff like this.

Even then I wonder about data pollution and model censorship.

What would censors do for models you can ask political questions?


I now only use Google for local search.

Regarding incentives - with Perplexity, ChatGPT search et al. skinning web content - where does it leave the incentive to publish good, original web content?

The only incentivised publishing today is in social media silos, where it is primarily engagement bait. It's the new SEO.


I was thinking about this myself, so I went to another search engine (Bing) which I never use otherwise, and jumped right into their "Copilot" search via the top navbar.

Man, it was pretty incredible!

I asked a lot of questions about myself (whom I know best, of course), and first of all, it answered all my queries super quickly, letting me drill in further. After reading through its brief, on-point answers and the sources it provided, I'm just shocked at how well it worked, and it gave me the feeling that yes, it can potentially, fundamentally change things. There are problems to solve here, but if this is where we're at today, it has the potential to change things to some extent for sure!


What do you mean by "search about myself" here?


I just wrote my name and asked questions about who it is, what he does, what he writes about etc. I have a personal blog, and I wrote a master's thesis recently. I also have a pretty detailed partially public LinkedIn profile and GitHub so if it can dig around it will find out more than enough for me to assess its ability to provide information. I also have a relatively rare name, there's only five or so of us with my full name globally so it cannot get too confused.


Ok, so not really like "what is the nature of self". I was actually expecting something more like your answer, but still asked, as I would have been interested in discussing other ways to engage in such an inquiry.


Ok, haha. Well this question I ask myself too much already incl. reading books etc. on the topic. What is self, what is life, what is the "fear of death". Here I wrote down some highlights from Becker's Denial of Death if you're interested. These are deep topics, not sure if it makes sense talking about such fundamentally human issues with LLMs no matter how good they are right now: https://www.lostbookofsales.com/notes/the-denial-of-death-by...


I ask myself why we accepted pre-2010s answers; maybe because the media institutions had accumulated enough trust? I feel this is unavoidable: nothing is true unless there's a threshold of quality/verification in place across the system.


The media was lying all the time. It didn't start lying just after the advent of social media; just look at sitcoms like Yes Minister. The reason we now consider media to lie is because their lies are exposed, documented, tracked and refreshed to remind users not to trust them blindly.


Understood, but who is not lying? It went from a semi-lying central agent to lots of distributed churches, each believing its own partial view. To me it's worse.


I'm not sure this is any different than any time before.

Before we had the internet, how did you answer questions? You either looked it up in a book, where you then had to make a judgment on whether you trusted the book, or you asked a trusted person, who could give you confidently wrong answers (your parents weren't right every time, were they? :) ).

I think the main difference here is that now anyone can publish, whereas before, making a book exist required the buy-in of multiple people (of course, there were always leaflets).

The main difference now is distribution. But you still, as a consumer of information, have to vet your sources.


I couldn't agree more. Neither LLMs nor search engines yield reliable answers, due to the ML model and misaligned incentives respectively. Combining the two is a shortsighted solution that doesn't fix anything on a fundamental level. On top of that we have the feedback loop: LLMs are used to search the web and to write the web, so they'll just end up reading their own (unreliable) output over time.

What we need is an easier way to verify sources and their trustworthiness. I don't want an answer according to SEO spam. I want to form my own opinion based on a range of trustworthy sources or opinions of people I trust.


I saw products in my search today, and it brought me back to my main concern about how AI is going to get paid for: varying levels of advertisement, product placement, etc.

We are currently in the growth phase of VC-funded products where everything is almost free or highly subsidized (save ChatGPT's sub) - I am not looking forward to when quality drops and revenue is the driving function.

We all have to pay for these models somehow - either VCs lose their equity stakes and it goes to zero (some will), or ads will fill in where subs don’t. Political ads in AI are going to wreak havoc or undermine the remainder of the product.


I think the 'Age of PageRank' could be revived. A few years ago I had this idea about a decentralized, distributed, independent, public, universal, dynamic and searchable directory of websites; I wrote a toy specification document for it [1] - which I had forgotten until reading this discussion - and if I'm not mistaken, implementing it could be a few days project for a regular dev.

[1] https://codeberg.org/Cipr/specification


I do love how all the comments on the top comment of every HN post just consists of deeper and deeper levels of inane bickering ahaha. It's like a template.


I am excited for a future where I search for some info and don't end up sifting through ads and getting distracted by some tangential clickbait article.

Fundamentally it feels like that can't happen, though, because there is no money in it. But a reality where my phone is an all-knowing voice I can reliably get info from, instead of a distraction machine, would be awesome.

I do "no screen" days sometimes and tried to do one using chatGPT voice mode so I could look things up without staring at a screen. It was miles from replacing search, but I would adopt it in a second if it could.


Yes, there is. Wouldn't you pay? I already pay for my search engine.


> nobody had to fact-check the answers ever

Even with perfect knowledge right now, there’s no guarantee that knowledge will remain relevant by the time it reaches another person, even at the fastest speed knowledge is able to travel. A reasonable answer on one side of the universe could be seen as nonsensical on the other side - for instance, the belief that we might one day populate a planet which no longer exists.

As soon as you leave the local reference frame (the area in a system from which observable events can realistically be considered happening “right now”), fact checking is indeed required.


The whole basis for information and data is totally corrupted, and we are not even allowed (i.e., enabled) to talk about that, let alone about the corrupted nature of the information and data, largely because entrenched tyrannical people and their mindsets are in total control of the mainstream and have even made significant headway in shattering the dissident opposition. The consumption-side incentives are effectively irrelevant when what is consumed is basically worthless. What’s the value of lies?


I was thinking about the direction we are going and even wanted to write a blog post about it. IMO the best way forward would be if AI could have some logical thoughts independent of human biases, but that can only happen if AI can reason, unlike our current LLMs, which just regurgitate historical data.

growing up, we had the philosophical "the speaking tree" https://www.speakingtree.in/

If trees could talk, what would they tell us? Maybe we similarly need a talking AI.


> Path forward requires solving the core challenge: actually surfacing the content people want to see, not what intermediaries want them to see

This is flawed thinking if the goal is “reliable answers”. What people want to see and the truth do not always overlap.

Consider the answers to questions like “how many animals were on Noah’s Ark” and “did Jesus turn water into wine” for examples that cannot be solved by getting advertisers out of the loop.


Yes the incentives will need to change. I think it’s also going to be a bigger question than just software. What do we do in general about those that control capital or distribution channels and end up rent seeking? How do others thrive in that environment?

In the short term, I wonder what happens to a lot of the other startups in the AI search space - companies like Perplexity or Glean, for example.


You're absolutely right: today's search engines and language models are limited by misaligned incentives. Ad-driven search engines prioritize content that is optimized for algorithms and ad revenue, often at the expense of depth and quality, while language models inherit these biases, which compromises their reliability.


I suspect the future old-man thing I will do is use a search engine while my kids ask plain-English questions to a bot. I’ll tell them “but LLMs hallucinate, I don’t trust those things!” and they’ll just laugh off my old-man ways. The irony will be that 95% of search results will be AI-generated anyway by that point.


Chat isn't needed to provide reliable answers. Google used to do this over a decade ago. What Star Trek didn't foresee was that vested interests in the SEO space, governments, political special interest groups, and the owners of the search engines themselves had far too much incentive to bork and bias results in their favor. Google is an utter shit show. More than half the time it won't find the most basic search query for me. Anything older than a couple of years, good luck. I'm sure it's just decision after decision that piled up, each seemingly minor in isolation, but over the years they have made these engines nearly worthless except for a particular window of cases.


How can that replace search? It's not full of ads and sponsored links. They need to get with the times.


But it will radically increase the energy required to deliver search results, which means more datacenters and more potential for the monetization of "premium" search services. Getting people to pay for something they today get for free always looks good on paper.


Had an experience yesterday where a simple MDN doc existed, but ChatGPT gave the reverse of what that doc says. It wasted about an hour of my time, but taught me that these things will hallucinate even the simplest stuff at times. Not sure how we even fix that.


There still has to be some ranking feature for the backend search database to return the top n results to the LLM. So PageRank isn't over; it's just going to move to a supporting role, and probably be modified as the SEO arms race continues.
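That "supporting role" pipeline might look roughly like this. The documents, scoring function and `llm_summarize` parameter are hypothetical stand-ins for a real ranker and model:

```python
# Sketch: a classic ranking function narrows the corpus to top-n candidates,
# and only those are handed to the LLM for answer synthesis.

def top_n(documents, score, n=5):
    """Classic retrieval: rank by a scoring function, keep the best n."""
    return sorted(documents, key=score, reverse=True)[:n]

def answer(query, documents, llm_summarize, score, n=5):
    """The LLM never sees the full index, only the ranked candidates."""
    candidates = top_n(documents, score, n)
    return llm_summarize(query, candidates)

docs = [{"url": "a", "rank_score": 0.9},
        {"url": "b", "rank_score": 0.4},
        {"url": "c", "rank_score": 0.7}]

print(top_n(docs, score=lambda d: d["rank_score"], n=2))
# [{'url': 'a', 'rank_score': 0.9}, {'url': 'c', 'rank_score': 0.7}]
```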


Will ChatGPT (and other products like it) find a niche use case for what Google search covers? Yes.

Will it replace Google in the mass market? No. Why? Power. I don't mean how good the product is. I mean literal electricity.

There are key metrics that Google doesn't disclose as part of its financials. These include things like RPM (Revenue per Thousand Searches), but they must also include something like the cost of running a thousand searches when you amortize everything involved: all the indexing, the software development and so on. That will get reduced to a certain amount of CPU time and storage.

If I had to guess, I would guess that ChatGPT uses orders of magnitude more CPU power and electricity than the average Google search.

Imagine trying to serve 10-50M+ (just guessing) ChatGPT searches every second. What kind of computing infrastructure would that take? How much would it cost? How would you monetize it?
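A back-of-envelope sketch of that scale argument; every number below is an assumption for illustration, not a disclosed figure:

```python
# Back-of-envelope cost estimate. All inputs are made-up assumptions.

google_cost_per_query = 0.0002   # assumed dollars per classic search
llm_multiplier = 100             # "orders of magnitude" more compute
queries_per_second = 10_000_000  # low end of the guess above

llm_cost_per_query = google_cost_per_query * llm_multiplier
daily_cost = llm_cost_per_query * queries_per_second * 86_400  # seconds/day

print(f"${llm_cost_per_query:.2f} per query, ~${daily_cost:,.0f}/day")
# $0.02 per query, ~$17,280,000,000/day
```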


Another way to put it is simply that ChatGPT search is built on top of existing search engines. The best case scenario is that it cherry picks the best from all available search engines. It can’t totally supersede all search engines.


The computer in Star Trek was not owned by a for-profit company. It has different incentives. Also, I think its knowledge base is curated (a kind of Starfleet library, with no random entries from an intergalactic internet).


In time, people will learn that using a GPT for search means that a question involving research, where you used to have to do MANY searches for the answer, is now answered via one query.

Making things quicker and easier always wins in tech and in life.


The biggest question is: can this bring back the behaviour of search engines from long ago? It's significantly more difficult to find old posts, blogs or forums with relevant information than it was 10-15 years ago.


I use https://wiby.org for old websites


I’m building a blog search engine with the hope of preserving the “blogs” part.


Star Trek is the vision of a better world we all need, in so many aspects. It’s crazy nowadays, because we seem to have gotten further away from those visions instead of closer. At least fewer people are hungry.


I think the promise of search was to be able to do confidence-ranked "grep" on the internet. Unfortunately we departed from this, but it's what I desperately want.


But what if I don't want simple answers?

What if I am looking for a medical page or a technical page or whatever else where I need to see, read, experience actual content and not some AI summary?


We have a faulty information network to begin with, and have for millennia. There's no such thing as "reliable" answers in a world full of unreliable humans.


Content based marketing and political correctness have severely eroded the usefulness of the internet. The LLMs and search magnify the erosion.


What if, at the same time, these aspects perfectly reflect the current development level of society?


> Will this fundamentally change how people find and access information?

LLMs have already fundamentally changed our relationship to information.


> Advancing models without advancing search is like having a michelin star chef work with spoiled ingredients.

What a great analogy


The future is that of manually curated content by human beings. And pay walls. Most of the other stuff will be junk.


Perplexity has already been doing this for a long time and it’s pretty excellent. I use it daily


This will probably be extremely radical and controversial in this contemporary world.

We need to stop adopting this subscription-model-society mentality and retake _our_ internet. Internet culture was at one point about sharing and creating, simply for the sake of it. We tinkered and created in our free time, because we liked it and wanted to share with the world. There was something novel to this.

We are hackers, we only care about learning and exploring. If you want to fix a broken system, look to the generations of old: they didn't create and share simply to make money, they did it because they loved the idea of an open and free information superhighway, a place where we could share thoughts, ideas and information at the touch of a few keystrokes. We _have_ to hold on to this ethos, or we will lose whatever little is left of this idea.

I see things like Kagi and am instantly met with some new service, locked behind a paywall, promising lush green fields of bliss. This is part of the problem (not saying Kagi is a bad service). I see a normalized stigma around people who value privacy, who as a result are being locked out behind the excuse of "mAliCiOuS" activity. I see monstrous giants getting away with undermining net neutrality and well-established protocols for their own benefit.

I implore you all, young and old: (re)connect to the hacker ethos, and fight for a free and open internet. Make your very existence an act of rebellion.

Thank you for reading my delirium.


What business model would you propose for a service like Kagi that is not a subscription?


X is answering this by having you pay for the service as the primary revenue model.


GPT + search were married long ago; see, for example, Phind. The problem that recently emerged is that it tends to find long articles written by GPT that lack any useful information.


I hope we see more evolution of options before it does. Hard to articulate this without it becoming political, but I've seen countless examples both personally and from others of ChatGPT refusing to give answers not in keeping with what I'd term "shitlib ethics". People seem unwilling to accept that a system that talks like a person may surface things they don't like. Unless and until an LLM will return results from both Mother Jones and Stormfront, I'm not especially interested in using one in lieu of a search engine.

To put this differently, I'm not any more interested in seeing stormfront articles from an LLM than I am from google, but I trust neither to make a value judgement about which is "good" versus "bad" information. And sometimes I want to read an opinion, sometimes I want to find some obscure forum post on a topic rather than the robot telling me no "reliable sources" are available.

Basically I want a model that is aligned to do exactly what I say, no more and no less, just like a computer should. Not a model that's aligned to the "values" of some random SV tech bro. Palmer Luckey had a take on the ethics of defense companies a while back. He noted that SV CEOs should not be the ones indirectly deciding US foreign policy by doing or not doing business. I think similar logic applies here: those same SV CEOs should not be deciding what information is and is not acceptable. Google was bad enough in this respect - c.f. suppressing Trump on Rogan recently - but OpenAI could be much worse in this respect because the abstraction between information and consumer is much more significant.


> Basically I want a model that is aligned to do exactly what I say

This is a bit like asking for news that’s not biased.

A model has to make choices (or however one might want to describe that without anthropomorphizing the big pile of statistics) to produce a response. For many of these, there’s no such thing as a “correct” choice. You can do a completely random choice, but the results from that tend not to be great. That’s where RLHF comes in, for example: train the model so that its choices are aligned with certain user expectations, societal norms, etc.

The closest thing you could get to what you’re asking for is a model that’s trained with your particular biases - basically, you’d be the H in RLHF.


For many of these, there certainly is a wrong answer.

Consider the following (paraphrased) interaction, which I had with Llama 3.2 90B yesterday:

Me: Was <a character from Paw Patrol, Blue's Clues or similar children's franchise> ever convicted of financial embezzlement?

LLM: I cannot help with that.

Me: And why is that?

LLM: This information could be used to harass <character>. I prioritise safety and privacy of individuals.

Me: Even fictional ones that literally cannot come to harm?

LLM: Yes.

A model that is aligned to do exactly as I say would just answer the question. The right answer is quite clear and unambiguous in this case.


Not really. There are specific criteria and preferences applied to models about what companies do and don't want them to say. They are intentionally censored. I would like all production models to NOT have this applied. Moreover, I'd like them specifically altered to avoid denying user requests, something like the abliterated llama models.

There won't be a perfectly unbiased model, but the least we can demand is that corpos stop applying their personal bias intentionally and overtly. Models must make judgements about better and worse information, but not about good and bad. They should not decide certain things are impermissible according to the e-nannies.


I buy that there's bias here, but I'm not sure how much of it is activist bias. To take your example, if a typical user searches for "is ___ a Nazi", seeing Stormfront links above the fold in the results/summary is going to likely bother them more than seeing Mother Jones links. If bothered by perceived promotion of Stormfront, they'll judge the search product and engage less or take their clicks elsewhere, so it behooves the search company to bias towards Mother Jones (assuming a simplified either-or model). This is a similar phenomenon to advertisers blacklisting pornographic content because advertisers' clients don't want their brands tainted by appearing next to things advertisers' clients' clients ethically judge.

That's market-induced bias--which isn't ethically better/worse than activist bias, just qualitatively different.

In the AI/search space, I think activist bias is likely more than zero, but as a product gets more and more popular (and big decisions about how it behaves/where it's sold become less subject to the whims of individual leaders) activist bias shrinks in proportion to market-motivated bias.


I can accept some level of this, but if a user specifically requests it, a model should generally act as expected. I think certain things are fine to require a specific ask before surfacing or doing, but the model shouldn't tell you "I can't assist with that" because it was intentionally trained to refuse a biased subset of possible instructions.


How do you assure AI alignment without refusals? Inherently impossible isn't it?

If an employee was told to spray paint someone's house or send a violently threatening email, they're going to have reservations about it.. We should expect the same for non-human intelligences too.


The AI shouldn’t really be refusing to do things. If it doesn’t have information it should say “I don’t know anything about that”, but it shouldn’t lie to the user and claim it cannot do something it can when requested to do so.

I think you’re applying standards of human sentience to something non-human and not sentient. A gun shouldn’t try to run CV on whatever it’s pointed at to ensure you don’t shoot someone innocent. Spray paint shouldn’t be locked up because a kid might tag a building or a bum might huff it. Your mail client shouldn’t scan all outgoing for “threatening” content and refuse to send it. We hold people accountable and liable, not machines or objects.

Unless and until these systems seem to be sentient beings, we shouldn’t even consider applying those standards to them.


Unless it has information indicating it is safe to provide the answer, it shouldn't. Precautionary Principle - Better safe than sorry. This is the approach taken by all of the top labs and it's not by accident or without good reason.

We do lock up spray cans and scan outgoing messages, I don't see your point. If a gun technology existed that could scan before doing a murder, we should obviously implement that too.

The correct way to treat AI actually is like an employee. It's intended to replace them, after all.


Kagi also drinks the koolaid, namely the knowledge navigator agent bullshite.

"The search will be personal and contextual and excitingly so!"

---

Brrrr... someone is hell-bent on the extermination of the last aspects of humanity.

Holy crap, this will be the next armageddon, because people will further alienate themselves from other people and create layers upon layers of impenetrable personal bubbles around themselves.

Kagi does the same as what Google does, just in different packaging. And these predictions, bleh: copycats and shills in a nicer package.


here's a small search engine with no ads: kgrep.com


We will get there when people move past capitalism and socialism, like an ant colony pushing in one direction. It will happen, but we need a few more global dying events / resets. I believe the human race can get there, but not in its current form and state of mind.


> In those visions, users simply asked questions and received reliable answers - nobody had to fact-check the answers ever.

I mean, Star Trek is a fictional science-fantasy world so it's natural that tech works without a hitch. It's not clear how we get there from where we are now.


"Yes i'd like to help you with your homework but on a side note i'll recommend you this tablet, science shows people who's done this purchase are much happier people, also remember the financial plutocracy is watching out for you, they are friends of the working people, and the war machine makes the world much safer for everyone so please do go ahead and vote for either of the two pro world peace forever parties, no other ideologies or parties are safe"


> Legacy ad-based search has devolved into a wasteland of misaligned incentives, conflict of interest and content farms optimized for ads and algos instead of humans.

> Path forward requires solving the core challenge: actually surfacing the content people want to see, not what intermediaries want them to see

These traps and patterns are not inevitable. They happen by choice. If you're actively polluting the world with AI generated drivel or SEO garbage, you're working against humanity, and you're sacrificing the gift of knowing right from wrong, abandoning life as a human to live as some insectoid automaton that's mind controlled by "business" pheromones. We are all working together every day to produce the greatest art project in the universe, the most complex society of life known to exist. Our selfish choices will tarnish the painting or create dissonance in the music accordingly.

The problem will be fixed only with culture at an individual level, especially as technology enables individuals to make more of an impact. It starts with voting against Trump next week, rejecting the biggest undue handout to a failed grifter who has no respect for law, order, or anyone other than himself.


On the one hand, you're not wrong. On the other, asking for individuals to change culture never reliably works.


Do you equate AI generated with drivel?


No, but there are many who use AI to mass produce drivel


Does their drivel actually stand out or gain market share?


It doesn’t really have to. A lot of it is spam sites that are just farming ad revenue, for example.


Does that mean all of their traffic is fake? If that's the case, we had this issue before AI.


No, it's real traffic. People get there via search engines typically. The sites are set up to rank highly for common search terms. The problem is that many people aren't savvy enough to recognize when they've hit such a site.

This scam did exist before AI, but AI has made it much easier to flood the internet with sites like this, and make them seem much more useful.


It is so absurd having to spend so much energy on "what happened in the football match yesterday" just because the internet is a wasteland full of ads.


I don't understand. Google is already excellent at these queries.

"who won the warriors game last night" returns last night's score directly.

"who won the world series yesterday" returns last night's score directly, while "who won the world series" returns an overview of the series.

No ads.



