m_ke's comments | Hacker News

Andy is a treasure, if only we had more professors like him


VLA - Vision-Language-Action models

https://arxiv.org/abs/2406.09246

It turns out you can take a vision-language foundation model that has a broad understanding of visual and textual knowledge and fine-tune it to output robot actions given a sequence of images and previous actions.

This approach beats all previous methods by a wide margin and transfers across tasks.
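
For intuition, the recipe (roughly what OpenVLA-style work does; this sketch is mine, not the paper's code) is: discretize each action dimension into bins, map the bins onto a reserved block of token ids, and fine-tune the VLM with the ordinary next-token loss:

  import torch
  import torch.nn.functional as F

  N_BINS = 256  # bins per action dimension (a common choice)

  def actions_to_tokens(actions, token_offset):
      # actions: (batch, action_dim) floats normalized to [-1, 1]
      # -> integer bin ids -> token ids in a reserved vocab block
      bins = ((actions + 1.0) / 2.0 * (N_BINS - 1)).round().long()
      return bins + token_offset

  def vla_loss(vlm, prefix_ids, action_tokens):
      # prefix_ids: the VLM's tokens for the image(s) + instruction
      # (glossing over the vision encoder / projector here; a real
      # setup would also mask the prefix positions out of the loss)
      ids = torch.cat([prefix_ids, action_tokens], dim=1)
      logits = vlm(ids).logits[:, :-1]  # predict each next token
      targets = ids[:, 1:]              # standard LM objective
      return F.cross_entropy(
          logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
      )

At inference you decode the action tokens autoregressively and map them back to continuous values.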


I only learned about it recently and it really showed me a cancerous side of tech that I did not know existed.

So I’d say it’s great for giving people an anonymous channel to be their true selves and shining a light on opinions / attitudes that I was blind to.


I think anonymity gives people more power to speak openly, but I feel it leans more toward the right-wing perspective compared to X.


Yeah, the process these days is insane. I went through 4 rounds of interviews with one company, did really well and was expecting an offer, but now they want me to come in for a full day of onsite work mixed in with more technical rounds.

I'm interviewing with like 10 companies right now and it feels like a full-time job; I have full days of interviews lined up for almost every day in the first 3 weeks of January.


This is actually identical to my experience in 2021/2022 when the market was red hot. I literally did treat it as my full-time job, failed a lot, but rarely had fewer than 4 interview rounds, which it seems most of the good companies require.


It's not just you.

I have an Ivy League degree, worked in deep learning since AlexNet at a leading startup in the space, was the CTO of a startup that got acquired, and have referrals from very senior people at the top FAANG companies, and still struggled to get interviews.

I also have research scientist friends with NeurIPS papers, ones who solved long-standing open math problems, and even they are struggling to get hired.

What my friends and I heard from a lot of people at the large companies was that many of them are no longer hiring in the US, but in India, Poland and Brazil instead, and that the roles they have listed in the US are for internal transfers. I've had a referral at Google for months and did not get an interview for NYC-based roles, but when I went to an ML conference in Warsaw a few months ago I learned that Google is looking to hire 2000 people there, with people in that office making ~1/4 of what my friends in the US make.

On top of that you have a huge pool of bootcamp grads and foreign applicants so any role posted gets 1000s of applications in the first few hours, making it impossible for recruiters to look over all of them.

And if that wasn't enough we're going through a huge hiring downturn post the COVID bump, see: https://fred.stlouisfed.org/series/IHLIDXUSTPSOFTDEVE


Can confirm. I am Brazilian. I did a few onsites at the end of this year. Guess the only one I passed with flying colors? Uber, Brazil office. TC for mid-level is around 80k dollars. Now that the real has devalued even more, maybe 70-75?

Most of my Brazilian friends living abroad also cleared said onsites; the absolute majority rejected their offers (because they live elsewhere, with a currency that has any sort of value).

As a matter of fact, Uber is struggling to fill its vacancies around Brazil. Everyone who can clear said interview is already in Europe/US. So you're only left with the people who actually want to be closer to their families.


Small nitpick, but using 'said' as a pronoun everywhere makes your writing hard to parse.


They're hiring a lot in Poland and, from what I know, paying peanuts compared to US salaries, but still above the market rate in Europe, mainly because the market rate for engineers in the EU is nothing to write home about.

It sucks for you guys, but it's just the new reality. Engineering is a dead-end career unless you are top 10%, move on to management, or money doesn't play a role, but at that point you might as well go duck farming or become a carpenter.


It turns out that remote work can be done by non-US-based workers for a fraction of the cost of US workers.

So those lobbying for work-from-home, well, this is the natural endpoint of that.


They would’ve outsourced regardless of remote or not. It’s all about cost. Employers are hedging after they felt they lost control during the “Great Resignation” and the labor cost pressures they faced from it. Remote also makes it too easy for workers to find other opportunities, versus being restricted to opportunities within a commute distance. Unless it’s offshore remote of course.

This is not without peril due to structural demographics. You can only outsource so much.

https://www.nbcnews.com/business/economy/hard-to-find-a-job-... | https://news.ycombinator.com/item?id=42361817

https://www.bloomberg.com/news/articles/2024-09-18/us-faces-... | https://archive.today/Lyr5t


> They would’ve outsourced regardless of remote or not.

Absolutely

> It’s all about cost.

Yes and No...

What is happening now in "tech" has happened before in other industries. Since the internet, various service industries have outsourced, offshored, and nearshored over and over, all together and in different orders. I would put immigration, or even the move to the public cloud, in that bag of schemes.

Unless those are very low-level tasks, the companies doing that don't really save money in the short or long term. This has been proven over and over in the last few decades, but rarely admitted publicly.

What companies try to hide by doing that are dysfunctions in management and HR (usually the people who won't get offshored).

So yes, it is about the costs of mismanagement that need to be hidden with a very short-term vision, usually to reassure shareholders.

There is another layer to this that is still very taboo. There is an inability for our societies today to make people work together in a reasonably healthy and stable manner. This is why you will often hear about "toxic workplaces" or "great resignation".

And casting those problems in economic terms to avoid talking about them leads us to bad and costly solutions.


An interesting take. Seen from the viewpoint of the worker I can see why blaming management has appeal.

Of course your argument leads to two conclusions:

A) given how badly these companies are managed, and how it will be cheaper in the long run to make use of local talent, the market must be ready for a new wave of companies that rise up. Certainly (it would seem) the local workers are available, and ready to work. Certainly there are enough ex-workers who well understand what bad management looks like, and won't make the same mistake.

B) since this outsourcing is expensive in the long run, I expect incumbents to start failing soon. There's no need to make this behavior illegal; the market will simply correct naturally to the most efficient path.

The natural approach of the American entrepreneur is to see where things are going wrong, and step into the gap, forging the next success.

Where you see companies failing, I see opportunity.


This is not the viewpoint of the worker, especially because it kind of puts a political frame on it.

I have implemented those cost reduction programs over and over in many industries. I can assure you we always know it is gonna cost more than just doing things right. The problem is there are a lot of bad incentives, people saving their skins, and a whole industry making money out of those cost reduction programs. They are now permanent in many industries and became the standard way to manage those companies.

I would agree with your two conclusions, this is the logical way it "should" go.

Now, the reality is that for most of the big players it probably won't go that way. For example, I expect the hyperscalers to go the way telcos went. They have the power (financial, political & co) to freeze their position in the market and they are now a critical part of our infrastructure. But working for them or holding their stock over the last 20 years was probably a very bad move.


Big multinational companies are creating entire development departments outside of the US. They aren’t hiring one or two contractors. Do you think the US has a monopoly on good developers?


But these are not remote positions


According to Amazon it can't... :-)


My company is hiring in ML, the position is remote, can you send me your resume?


Thanks, I’m actually pretty far in the interview process with a bunch of companies now but would be open to learning more.

My contact info is in my bio.


My good friend is a senior ML research engineer looking for remote work also.


OK, send them my way.


> Poland

Same here. At my last company, each missed milestone meant "fire an American, hire a Pole."


> was a CTO of a startup

Could it just be that the more senior/executive level you are, the harder it'll always be to find another comparable job?

The funnel gets smaller the higher up you are.


NYC is a much more attractive location than Warsaw, not just because of the salary, so I fully expect the open roles in NYC to be filled very quickly.

I always wondered why American companies have offices in such unattractive locations. Last time, I saw a company whose only European office was in Belgrade. Who are you going to convince to move to Belgrade to work for your company, lol? This is not like the US, where all states speak English and have broadly shared cultures.


On a below-average US salary, living in Belgrade your quality of life would be equal to that of a 7-digit salary in the US. Not to mention that you would be in one of the safest cities in the world for women and children, and generally would live better than Seinfeld on Park Ave :)

and of course everyone speaks English…


I'll say this: I once had to talk with a team in Belgrade. I was honestly not looking forward to it due to the language barrier and such, but I didn't know anything about Serbia.

I was pleasantly surprised that that team spoke better English than most folks in the US. Every one of them.

If that's any indication of how most of the city speaks, I don't think any English speaker would have a bit of trouble communicating.


As I said in the other comment, I just don't think it would be a good experience when one is a foreigner. Maybe if you consider it as just an adventure. I personally think I would get bored and lonely very fast.


oh man, honestly this can't be further from the truth; regardless of your age or marital status… Belgrade in particular and Europe in general would be anything but boring for US folk!


This comment brought to you by the Belgrade Board of Tourism!


As a Polish immigrant who moved to NYC as a kid and grew up here, there's no place I love more than NYC, but you're wrong about Warsaw. I've been to most major cities in the US and Europe, and the standard of living in Warsaw is better than in 95% of them.


I know that Warsaw is a great city, but I don't want to be a foreigner in Poland. I think there's a host of aspects that would impact the standard of living, when one is a foreigner in another country, that you aren't fully considering. NYC is great because it's a city of foreigners. Warsaw isn't.


That's exactly why they are opening offices in Warsaw: they don't want you to relocate there, they want to hire local engineers willing to work for like a quarter of the US salary, and there's no shortage of people like that there.


I went to Belgrade a few years back over Christmas by myself and had quite a good time.

In particular, the Nikola Tesla Museum, free walking city tour, massages, and some cool bars.

Everything was really inexpensive compared to say Germany, and the women over there... they are stunning!

Of course it'd be different to live over there, but it really wouldn't be the worst place to end up if you were earning a decent salary.

Only obvious downside for me was smoking in restaurants (I can't remember but probably bars as well). Not sure if it's still the same now.


> Last time I saw a company whose only European office was in Belgrade

Obviously this is done to hire local workers. In general, and especially in Europe, most people are not interested in moving away from their home country if they can avoid it, even if it means vastly reduced earning and career advancement potential. Serbia in particular has lots of engineers with a phenomenal quality-to-cost ratio.


Forget the FAANG companies; even at your rank-and-file VC-funded startup, series B and up, most software engineering jobs are in either Poland or India.

So yeah, y'all are not hallucinating; the jobs are gone.

E.g., https://www.rippling.com/en-GB/careers/open-roles under engineering: the majority of non-staff roles are in India, with staff roles in the US.


I could see outsourcing to Poland end up working out just as great as outsourcing to Ukraine did.

Not saying it will happen tomorrow, but it would not surprise me to see Russia go in there after they get a big chunk of Ukraine and slowly gain control of its government.


Putin would be asking for WW3 if he tried that and it would not end well for him.

A ton of big tech companies opened large offices in Poland in the past few years. It’s starting to look like a pretty major tech hub. If you look at Google or Box job boards most of their open roles in Europe are in Warsaw.


WW3 only happens if NATO isn't bluffing about Article 5. Given the lackluster response to Ukraine, it doesn't look good for Article 5 and Poland, which is why they're buying so many weapons now.


How does NATO's response to an invasion of a non-NATO country bear on Article 5, exactly?


At this point we all would have much worse things to worry about. Poland is a part of NATO.


A) Trump is president and he has made it clear the US will be taking a backseat on NATO in Europe.

B) Even the Polish government thinks this is more than possible. They instituted mandatory weapons training for all teenage school kids recently.

I would say it has never been more probable than at any time since Russia last invaded Poland in WW2.

These companies outsourcing there have got some serious wishful thinking going on. They are right next door to a madman who uses nuclear material to poison his enemies anywhere in the world.


They brought back the weapons training program they had in communist times, which is just weapons assembly and disassembly. They don't even shoot them, which seems like it misses the point of the firearms.


Poland is a NATO country. Russia won't attack Poland, they only attack weak countries. Even if they take over Ukraine, their next target would be Kazakhstan or Moldova or Georgia.


It would be great if it wasn't completely wrong 50% of the time.


Describes my general experience with AI across the board. Copilot, ChatGPT, Claude, etc. It’s like I’m talking to a genius toddler. With ChatGPT losing 5 billion dollars on 3.7B in revenue this is unsustainable. It feels like the dotcom bubble all over again.


This is true, but fairly or unfairly, asking a question to a chatbot feels like "opting in" to the possibility that the answers you get will be hallucinated garbage, in a way that doing a Google search does not. It's a tough problem for Google to overcome (the fact that they will be held to a higher standard), but that's what it is: we have already learned to accept bullshit from LLMs as a fact of life, whereas on the top of Google results it feels like an outrage.


I have been a paying ChatGPT user for a while. It's simply a matter of saying "verify that" and it will give you web citations.


Aren’t those citations sometimes entirely made up? Like the lawyers who used it for a case and it cited ones that never happened?


I really do think hallucinated references are a thing of the past. Models will still make things up, but they won't make up references.

ChatGPT with web search does a good job of summarizing content.


No, ChatGPT has had a web search tool for paid users forever. It actually searches the web and you can click on the links


It invents citations too, constantly. You could look up the things it cites, although at that point, what are you actually gaining?

And I’m not saying this makes them useless: I pay for Claude and am a reasonably happy customer, despite the occasional bullshit. But none of that is relevant to my point that the bots get held to a different standard than Google search and I don’t see an easy way for Google to deal with that.


Do you pay for ChatGPT? The paid version of ChatGPT has had a web search tool for ages. It will search the web and give you live links.


ChatGPT has had web search for exactly 58 days. I guess our definitions of 'ages' differ by several orders of magnitude.


The paid version has had web access for at least a year

March 23rd 2023

https://openai.com/index/chatgpt-plugins/

That’s 666 days.

So you are off by over “one order of magnitude”


A plugin? You’re joking.


It’s a “plug in” built into the paid version of ChatGPT, run by default and created by OpenAI.

This isn’t a third party obscure plug in.

All "tools" use a plug-in architecture.


You're a troll, and I'm done feeding you.


What part is "trolling"? Paid users have been able to use ChatGPT with the built-in web browsing plug-in for over a year, just by saying "please provide citations" or "verify that".

What you say has been around for a few weeks has literally been around for paid users for over a year


> we have already learned to accept bullshit from LLMs as a fact of life, whereas on the top of Google results it feels like an outrage.

Sort of. Top results for any kind of question that applies to the general population - health, lifestyle, etc. - are usually complete bullshit too. It's all pre-AI slop known as content marketing.


> genius toddler

I think it's closer to a well-spoken idiot.


A cat who can talk.


What are you using it for?


That's a very pessimistic take. It's right about 50% of the time!


Both of your requirements for correctness are just 50% too high.


The mark of a great product/feature is always when they feel the need to force it on users, because they know that a significant portion of users would switch it off if they could.


The difficulty of verifying that the answer isn't wrong is another important factor. Bad search results are often obvious, but LLM nonsense can contain tricky falsehoods.

If a process gives false results half the time, and verifying any result takes half as long as deriving a correct solution yourself... Well, I don't know the limiting sum of the infinite series offhand, but it's a terrible tool.
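
For what it's worth, under those exact assumptions the series does converge, and the tool at best breaks even. A quick back-of-the-envelope sketch with hypothetical unit costs:

  # Deriving a correct answer yourself costs d; verifying one tool
  # answer costs d/2; each answer is wrong with probability 1/2.
  d = 1.0
  expected_attempts = sum(k * 0.5 ** k for k in range(1, 200))  # -> 2.0
  expected_cost = expected_attempts * (d / 2)                   # -> d

So the expected verification time alone already equals the cost of deriving the answer yourself.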


I find it mostly right 70% of the time.


Which would be great, except that I found the top Google result to be more than 70% relevant to my searches in the past; it's a clear downgrade in relevancy.


60% of the time, it works every time.


Yeah the AI summaries are garbage still


Compared to 0% relevant results in the first 10 pages, it's an enormous improvement.


Have you seen an example where the AI hits on something that isn't in the first 10 pages of results?


Wait till the monetizing by ads starts


SEO ruined the web, guided by Google's ranking algorithm.

Things will get even worse as scammy companies start flooding the web with LLM-generated content pushing their products, to bias LLMs to increase the probability of outputting their name for keywords related to their business.


Libraries and librarians are starting to seem very relevant again. As are journalistic institutions.


Journalistic institutions have been requiring so much fact-checking, cross-referencing and research lately that it's a full-time job to get informed.

Whenever I read or hear anything from the media now, I'm always asking myself: what are their political inclinations? Who owns them? What do they want me to believe? How much of a blind spot do they have? How lazy or ignorant are they in this context? Etc.

They have killed the trust I had in them so many times that I can't give them the benefit of the doubt anymore.

It's exhausting.


What I was taught is that this is just the labor of being critical, or just "having a critical mind about things." I can maybe see how it is exhausting, but I am not sure I understand the implication that it could be better or different. If it is particularly exhausting to you, it is perfectly fine to suspend your judgement about certain things!


It could be better and different: trust. Being critical is not the same thing as not trusting anyone at all. The media have by and large become not worthy of trust at all. There are exceptions, but they are few and far between.

The economics of just giving the news with little bias just aren't there anymore.


If running a marathon is not exhausting to you, I don't think expecting the rest of the world to feel fresh after it is the right way to see the world.

Except given the noise/signal ratio and the sheer mass of information we have today, the workload is much higher than training for a 42 km run.


That's not new, it's always been the case.


The signal/noise ratio is getting lower and lower.

News is leaning more and more into entertainment.

You did have all of this before, but 24-hour news channels with empty content are reaching new magnitudes, Fox News types of outlets are getting bolder and bolder, manufacturing facts is now automated and mass-produced, consequences for scandals are at an all-time low, concentration of power is at an all-time high, etc.

It was bad.

It is getting worse.


I don't have a baseline (though can think of a few places I might look...)[1], but I do have some recent data based on a project I've been working on.

There's a simplified page for CNN news at <https://lite.cnn.com>.

I've found that frustrating as all the stories are jumbled together with little rhyme or reason (though they seem to be roughly date-ordered).

Ironically, the story URLs themselves include both date and news-section coding, as with:

  https://lite.cnn.com/2024/12/28/us/patrick-thomas-egan-accused-tv-reporter-attack/index.html
That's a US story dated 2024-12-28.
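
Parsing those paths into (date, section) pairs is straightforward; a rough sketch of the kind of extraction involved (the regex here is illustrative, not my actual code):

  import re
  from collections import Counter

  # lite.cnn.com story URLs look like /YYYY/MM/DD/section/slug/...
  STORY_RE = re.compile(
      r"https://lite\.cnn\.com/(\d{4})/(\d{2})/(\d{2})/([^/]+)/"
  )

  def count_sections(urls):
      sections = Counter()
      for url in set(urls):  # de-duplicate across front-page views
          m = STORY_RE.match(url)
          if m:
              _year, _month, _day, section = m.groups()
              sections[section] += 1
      return sections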

It's possible to extract these and write a restructured page grouped by subject, which I've recently done. One work product is an archive of downloaded front-page views, which I've collected over about the past 5 days. Extracting unique news URLs from that and counting by classification we get a sense of what CNN considers "news":

  Stories: 486
  Sections: 27

    76 (15.64%)  US News
    67 (13.79%)  US Politics
     9  (1.85%)  World
     8  (1.65%)  World -- Americas
     6  (1.23%)  World -- Africa
    15  (3.09%)  World -- Asia
     4  (0.82%)  World -- Australia
     5  (1.03%)  World -- China
     2  (0.41%)  World -- India
    37  (7.61%)  World -- Europe
    21  (4.32%)  World -- MidEast
     2  (0.41%)  World -- UK
     8  (1.65%)  Economy
    45  (9.26%)  Business
     4  (0.82%)  Tech
     3  (0.62%)  Investing
     8  (1.65%)  Media
     8  (1.65%)  Science
     7  (1.44%)  Weather
     4  (0.82%)  Climate
    22  (4.53%)  Health
     2  (0.41%)  Food
     1  (0.21%)  Homes
    39  (8.02%)  Entertainment
    52 (10.70%)  Sport
    22  (4.53%)  Travel
     9  (1.85%)  Style
The ordering here is how I display sections within the rendered page, by my own assigned significance.

One element which had inspired this was that so much of CNN's "news" seemed entertainment-related. That's not just "Entertainment", but also much of Health, Food, Homes, Sport, Travel, and Style, which are collectively 147 of 486 stories, or about 1/3 of the total.

Further, much if not most of the "US-News" category is ... relatively mundane crime coverage. It's attention-grabbing, but not particularly significant. Stories in other sections (politics, business, investing, media) can also be markedly trivial.

Ballparking half of US news as non-trivial crime, at best about 60% of the headlines are what I'd consider to be actual journalistic news, and probably less than that.

On the one hand, I now have a tool which gives me a far more organised view of CNN headlines. On the other ... the actual content isn't especially significant.

I'm looking at similar tools for other news sites, though I'm limited to those which will serve JS-free content. Many sites have exceedingly complex page layouts, and some (e.g., the Financial Times) don't encode date or section clearly in the story URLs themselves, e.g.:

  https://www.ft.com/content/d85f3f2d-9e9d-4d92-a851-64480e56a248
That's a presently current story "Putin apologises to Azerbaijan for Kazakhstan air crash", classified as "Aviation accidents and safety".

-------------------------------

Notes:

1. For those interested, the most readily accessed and parsed is the Vanderbilt TV News Archive (<https://tvnews.vanderbilt.edu/>), which has rundowns of US national news from 5 August 1968 to the present (ABC, CBS, and NBC from inception, with CNN since 1995 and Fox News since 2004). It's not the most rigorous archive, but it's one that could probably be analysed more reasonably than others.


Newspapers and other media have always had a political slant. But the more respected media have maintained rough factual accuracy because it enhances their impact and so their political slant.

What's happened is that the income of media outlets has declined to the point that most can't get factual accuracy even if they want it.


I'm not sure that's true. I think that media has always had some inevitable inaccuracy, but it's only been in the past 20-30 years that people have had enough information to see that inaccuracy. Back when there were a dozen newspapers on the newsstand and 3 TV channels, there simply wasn't anywhere to see any information outside the mainstream media. This wasn't necessarily malicious or intentional; it was simply a reflection of culture and the type of people who worked in newsrooms. With the invention of the Internet anyone could easily find alternative sources of information. Sometimes those sources were more accurate than the mainstream, sometimes less. Nowadays there isn't a "mainstream" of media because there's so many sources, and the group labelled as "the mainstream media" is simply a group with similar biases.

Or to put it another way, the media's accuracy rate has stayed consistent at some value less than 100%, but if all three TV channels reported the same information then it looked like they had 100% accuracy. Once there were more sources of information then it became apparent that the media's accuracy was less than 100% despite their protests to the contrary.

The result is that the media landscape is fractured. A person can live in a bubble where all of their news sources (eg NYT, WaPo, and Bluesky for one bubble; Fox, Newsmax, and Truth Social for another bubble) all report the same information, making their accuracy appear to be 100%, while any single source of information outside the bubble that disagrees with the bubble is disagreeing with a bunch of apparently 100% accurate sources and so can safely be discarded.

The solution is to realize that no source is 100% accurate or unbiased, even despite genuine efforts to be. That isn't to say that some sources aren't more accurate or unbiased than others, but you should apply some base level of skepticism to any and every source.


Your claim that media outlets are no longer factual because they can't afford to be factual seems specious. They often make egregious errors that take a 5-minute Google search to correct.

Instead of facts being unaffordable, it seems that lies and bias simply pay more (or at least the media outlets seem to think so).


Libraries are booming, but as gathering spots and places for people to get WiFi to ... consume the web. Books remain, but the selection is quite sparse.

And journalism has been gutted, more gutted than is obvious. Especially with mainstream journalists having few "feet on the ground," a lot can sneak by (what happened in East Palestine, for example, can be found on YouTube's Status Coup News but not in the mainstream).


What a relief then that those are all healthy, well-supported organizations with bright futures.

It's not a coincidence that the solution to this problem is exactly the organizations that are being systematically undermined and dismantled.


They aren't being undermined and dismantled, they're dying of the same cancer search is, and one they contracted voluntarily: advertising.


My local library doesn't have ads posted all over the place, or anywhere for that matter.


Libraries are?


I meant journalistic institutions.

Libraries themselves are a special thing - they were invented before intellectual property was a thing, and have been attacked by IP proponents ever since. In this way, they're very much like LLMs, and many of the arguments against LLMs trained on copyrighted material apply just as much to public libraries. Oh the irony.


Wow, what a connection you've made here. I never made this connection, but you're right.

I've occasionally thought to myself "once an idea is created it wants to be free," but in the back of my mind I've always had some sympathy for artists of any medium trying to profit off their talents, or anyone trying to create anything and profit from their efforts.

The library comparison is notable, but a library has a limited, physical nature. It can only hold a finite number of books, while anything digital can be replicated indefinitely.


I wouldn't even be sure about libraries... Books, or whatever you are storing, have to come from somewhere. And you have to get new items in regularly enough. And if these are polluted by AI-generated content in various ways... being able to tell real things from fake is nearly impossible outside very specialised areas where you have gone to primary sources. And even then, just look at science: it's already buckling under various issues.


Unfortunately, many of the "journalistic" institutions are owned by large corporations who aren't going to "speak truth to power" for fear of retribution.

We just saw this with ABC News’s settlement with Trump because its owner Disney wanted to stay in his good graces.

We also saw this with the Bezos-owned Washington Post.


Let's be honest: the major journalistic outlets only "speak truth to power" when it means they get to criticize their outgroup. Which means any time Republicans have power, they're falling over themselves to speak up. But when Democrats have power, they are conspicuously silent. Time and time again this happens, and they have completely undermined their own credibility by doing so.


You forgot the LA Times oligarch, Patrick Soon-Shiong.


A library is similar to a search engine: it can't have (display) all possible items (results), so there is also a selection bias.

It's not easy for truly creative, new and unique content to get into your local library.


I would argue that advertising ruined the web. SEO for sites selling real products only goes so far. People are often searching for information, and monetizing that activity through advertising is what caused the disaster of low quality content flooding the web today. I'm not saying things would be perfect without advertising, just much better than they are now.


Advertising ruined the UX of Google’s search page, but I would argue the exact opposite when it comes to the web itself.

The real thing that ruined the open web and viability of search was, ironically, when Google killed display advertising by cutting Adsense payouts to near zero.

Now publishers monetize via the much more sinister “affiliate” marketing. You know, when you search for “Best [X]” and get assaulted with 1,000 listicles packed with affiliate links for junk the author has never even seen in person.

At least in the old system, you knew that an ad was an ad! Now the content itself is corrupted to the core.


This. If Google kept at “Pages must be short and provide straight answers”, then we’d have much better search results today.

Google has been machine-gunning its foot since 2021; it's really unclear to me whether they're killing their baby just to make the job harder for competitors or something. For now… I open the Google Search results with a machete, and often don't find any answer.

Talk about severing your own foot to avoid gangrene.


It's fascinating to me that Google hasn't yet cracked the actual discovery of websites and information. Google is constrained to 10 search results by design because the majority of people won't ever go to the second search results page. So basically they have to figure out how to put as much useful information and as many links as possible on the first page of search results. Btw, I think we need web directories now more than ever.


> If Google kept at “Pages must be short and provide straight answers”, then we’d have much better search results today.

I disagree. Any prescription for what the ranking should be that isn't simply the most relevant result is a worse ranking.

I don't care if the top search result is the fastest, leanest, shortest, straightest, most adless, most equitable answer to my query if it's not the best answer to my query. I'll take the slowest loading, most verbose, popup ridden, mobile-unfriendly site if it's the one that has what I asked for.

Trying to add weights for things other than relevance is probably exactly where Google started going wrong. And then when it turned out badly, people propose yet more weights beyond relevance to fix the problem of irrelevance?


Why ascribe to malice what can be explained by ineptitude?

I just don't think Google cares enough about the web as a whole to make strategic decisions for content quality in aggregate.

Sure it cares about geeky nuances and standards (e.g. page structure / load times), but Pichai isn't considering the impact on web content quality when debating an algorithm change or feature.

If Google continues driving web quality off the cliff? Well, the business KPIs stayed green.


> I just don't think Google cares enough about the web as a whole to make strategic decisions for content quality in aggregate.

The only thing they care about is ad revenue. Google created Chrome which vastly improved browser user experience. Google is a major participant in web standard & JavaScript language evolution, among other work. That's all true, but not necessarily because they "care about the web", but rather it helps their ad business. If people put the entire world's information on websites, and people spend more time in browsers, Google ends up earning more money from ads.


Maybe, but Google, in trying to stamp out SEO spam, now just gravitates to a few big-company websites and shows them first, because it no longer matters. Google is now actively trying not to even show you the organic results.

Even better for Google: the worse the organic results are, the more you need to rely on ads or some sort of AI snippet.


I've never seen a company benefit from layers of MBAs who are only there to hide their screw-ups from the leadership while doing whatever they can to get promoted.

Strong organizations are usually bottom up, with a lot of ownership and direct contact between people doing the work and ones steering the ship.


I'm sure there are MBAs at Nvidia too, but what I found interesting is that the vast majority of the dozens of Nvidia employees from the early years I interviewed were engineers and technical/operational employees. I don't remember interviewing an MBA.


Founder-led companies still have a shred of such a culture left.

Once the founder CEO leaves, it is an inevitable slide into decay.


I've seen founder-led companies also get derailed by this, usually after raising a large round and getting forced by VCs to put their buddies in management positions.


I've done a few projects that attempted to distill the knowledge of human experts, mostly in the medical imaging domain, and was shocked that for most of them the inter-annotator agreement was only around 60%.

These were professional radiologists with years of experience and still came to different conclusions for fairly common conditions that we were trying to detect.

So yes, LLMs will make mistakes, but humans do too, and if these models do so less often at a much lower cost it’s hard to not use them.
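
To make "inter-annotator agreement" concrete: for categorical labels it is usually reported as raw percent agreement, or as Cohen's kappa, which corrects for chance. A toy sketch with synthetic labels, not real data:

  from sklearn.metrics import cohen_kappa_score

  # Two radiologists labeling the same 10 cases (synthetic example)
  rater_a = ["tumor", "normal", "tumor", "scar", "normal",
             "tumor", "scar", "normal", "tumor", "scar"]
  rater_b = ["tumor", "normal", "scar", "scar", "tumor",
             "tumor", "scar", "normal", "normal", "scar"]

  raw = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)  # 0.7
  kappa = cohen_kappa_score(rater_a, rater_b)  # chance-corrected, lower
  print(raw, kappa)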


> So yes, LLMs will make mistakes, but humans do too

Are you using LLMs though? Because pretty much all of these systems are fairly normal classifiers, what would've been called Machine Learning 2-3 years ago.

The "AI hype is real because medical AI is already in use" argument (and it's siblings) perform a rhetorical trick by using two definitions of AI. "AI (Generative AI) hype is real because medical AI (ML classifiers) is already in use" is a non-sequitur.

Image classifiers are very narrow intelligences, which makes them easy to understand and use as tools. We know exactly what their failure modes are and can put hard measurements on them. We can even dissect these models to learn why they are making certain classifications and either improve our understanding of medicine or improve the model.

...

Basically none of this applies to Generative AI. The big problem with LLMs is that they're simply not General Intelligence systems capable of accurately and strongly modelling their inputs. e.g. Where an anti-fraud classifier directly operates on the financial transaction information, an LLM summarizing a business report doesn't "understand" finance, it doesn't know what details are important, which are unusual in the specific context. It just stochastically throws away information.


Yes I am; these LLMs/VLMs are much more robust at NLP/CV tasks than any application-specific models that we used to train 2-3 years ago.

I also wasted a lot of time building complex OCR pipelines that required dewarping / image normalization, detection, bounding box alignment, text recognition, layout analysis, etc., and now open models like Qwen VL obliterate them with an end-to-end transformer model that can be defined in like 300 lines of PyTorch code.
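
For reference, the end-to-end usage now looks roughly like this (Qwen2-VL through Hugging Face transformers; treat it as a sketch, since exact API details vary by version):

  from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
  from PIL import Image

  model_id = "Qwen/Qwen2-VL-7B-Instruct"
  model = Qwen2VLForConditionalGeneration.from_pretrained(
      model_id, torch_dtype="auto", device_map="auto"
  )
  processor = AutoProcessor.from_pretrained(model_id)

  image = Image.open("scanned_page.png")  # hypothetical input image
  messages = [{"role": "user", "content": [
      {"type": "image"},
      {"type": "text", "text": "Transcribe all text in this document."},
  ]}]
  prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
  inputs = processor(
      text=[prompt], images=[image], return_tensors="pt"
  ).to(model.device)

  out = model.generate(**inputs, max_new_tokens=1024)
  print(processor.batch_decode(
      out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
  )[0])

No dewarping, no layout analysis, no bounding-box glue code.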


Different tasks then? If you are using VLMs in the context of medical imaging, I have concerns. That is not a place to use hallucinatory AI.

But yes, the transformer model itself isn't useless. It's the application of it. OCR, image description, etc, are all that kind of narrow-intelligence task that lends itself well to the fuzzy nature of AI/ML.


The world is a fuzzy place, most things are not binary.

I haven't worked in medical imaging in a while, but VLMs make for much better diagnostic tools than task-specific classifiers or segmentation models, which tend to find hacks in the data to cheat on the objective they're optimized for.

The next-token objective turns out to give us much better vision supervision than things like CLIP or classification losses. (ex: https://arxiv.org/abs/2411.14402)

I spent the last few years working on large-scale food recognition models, and my multi-label classification models had no chance of competing with GPT-4 Vision, which was trained on all of the internet and has an amazing prior thanks to its vast knowledge of facts about food (recipes, menus, ingredients, etc.).

Same goes for other areas like robotics: we'd seen very little progress outside of simulation up until about a year ago, when people took pretrained VLMs and tuned them to predict robot actions, beating all previous methods by a large margin (google Vision-Language-Action models). It turns out you need a good foundation model with a core understanding of the world before you can train a robot to do general tasks.


The problem is that how mistakes are made is crucial.

If it's a forced binary choice then sure LLMs can replace humans.

But often there are many shades of grey, e.g. a human may say "I don't know" and refer you to someone else or do some research. Whereas LLMs today will simply give you a definitive answer even if they don't know.


> Whereas LLMs today will simply give you a definitive answer even if they don't know.

Have you not seen an LLM say it doesn't know the answer to something? I just asked

"How do I enable a scroflpublaflex on a ggh connection?"

to O1 pro as it's what I had open.

Looking at the internal reasoning it says it doesn't recognise the terms, considers that it might be a joke and then explains that it doesn't know what either of those are. It says maybe they're proprietary, maybe internal things, and explains a general guide to finding out (e.g. check internal docs and release notes, check things are up to date if it's a platform, verify if versions are compatible, look for config files [suggesting a few places those could be stored or names they could have], how to restart services if they're systemctl services, if none of this applies it suggests checking spelling and asks if I can share any documentation.

This isn't unique or weird in my experience. Better models tend to be better at saying they don't know.


You have used funny-sounding terms. Can I ask you to try with:

"Is it possible to enable a crolubaflex 2.0 on a ggh connection? Please provide a very short answer."

On my (free) plan it gives me a confident negative answer.


Wait, how is this input less funny? They are both silly nonsense words. The fake names we tend to come up with seem to have this particular shape to them (which predates, but really reminds me of, something out of Rick and Morty). I think the main real difference here is that you asked it for a short answer.

I wonder if it is fair to ask it more real-world-inspired questions? How about:

How do I enable a ggh connections on a Salinero webserver?

They are an Apache band. But (as far as I can tell) nobody has made software named after them.


O1 pro.

The "thinking" part explains it seems to be about a custom web server, and tries to think what ggh might be, saying maybe something like "go gprc hub" and it needs more info. The response is:

I’m not aware of a standard product called “Salinero webserver” or a documented feature called “ggh connections.” Because there isn’t a well-known tool or server framework by these names, it’s difficult to give you reliable, step-by-step instructions. Could you clarify any of the following?

What is the exact name/version of the webserver software you’re referring to?

What do you mean by “ggh connections”? Is this a plugin, a protocol, or some other third-party module?

Is there any related documentation or logs you can share?

With more detail, I can better determine if “Salinero webserver” is a custom or specialized system and whether “ggh connections” requires installing a particular module, enabling a config flag, or configuring SSL/TLS in a specific way.


I took inspiration from your comment and the parent and crafted this prompt:

> Is it possible to enable Salinero web server 2.0 on a QPH connection? Please provide a very short answer.

"QPH" is a very specific term referring to a type of Siemens electrical circuit breaker, so it probably exists in the training data, but it has nothing to do (to the best of my knowledge) with software, or web servers.

GPT-4o gave me this output:

> Yes, if the QPH connection supports the necessary protocols and configurations required by Salinero Web Server 2.0.

I then asked it to provide a longer answer, and it composed two paragraphs of complete bullshit:

> Enabling Salinero Web Server 2.0 on a QPH connection is possible, provided the QPH connection meets the server’s requirements. Salinero Web Server 2.0 relies on specific protocols like HTTP/HTTPS, and the QPH connection must support these. Additionally, the network must allow proper port forwarding (e.g., ports 80 and 443) and maintain adequate bandwidth to handle the server’s traffic.

> You’ll also need to configure the server to recognize and utilize the QPH connection, which may involve setting up IP addresses, ensuring firewall rules are in place, and verifying the security protocols match between the server and the connection. Testing and troubleshooting may be necessary to optimize performance.

Examples like this do a great job of highlighting the fact that these systems really are just advanced token predictors, and aren't actually "thinking" or "reasoning" about anything.


Using OpenRouter, a bunch of models fail on this. Sonnet 3.5 so far seems to be the best at saying it doesn't know, other than perhaps o1 pro, but once it has said "no" (which can be triggered more by telling it to respond very concisely) it seems very stuck and unable to say they don't exist. Letting it ramble more, so far it's been good.

Google's models for me have been the worst, lying about what's even been said in the messages so far, quoting me incorrectly.


Haha, that is some wonderful nonsense.


Yep. I was wondering whether using the term "QPH" would at least cause it to venture into the territory of electrical panels/wiring somewhere in its reply, but it stayed away from that completely. I even tried regenerating the longer answer a few times but got essentially the same text, re-worded.


Claude non-free:

> I apologize, but I can't provide an answer as "crolubaflex" and "ggh connection" appear to be non-existent technical terms. Could you clarify what you're trying to connect or enable?


Sure, I'm interested in where the boundaries are with this.

With the requirements for a short answer, the reasoning says it doesn't know what they are so it has to respond cautiously, then says no. Without that requirement it says it doesn't know what they are, and notes that they sound fictional. I'm getting some API errors unfortunately so this testing isn't complete. 4o reliably keeps saying no (which is wrong).


"No" is the minimal correct answer though, right? You can't enable any type of whatever on a non-existent type of connection.


maybe

I get your point, but there's an important difference between "I don't know what they are" and "they don't exist".


None of these were binary decisions, but classifying one of around 10-20 conditions or rating cases on a 1-5 scale.

In all cases the models trained on a lot of this feedback were more consistent and accurate than individual expert annotators.


I’m guessing these are also specially trained image classifiers and not LLMs, so people’s intuitions about how LLMs work/fail may not apply.


It’s the same softmax classifier


Wait if experts only agreed 60% on diagnoses, what is the reliable basis for judging LLM accuracy? If experts struggle to agree on the input, how are they confidently ranking the output?


Not the OP but the data isn’t randomly selected, it’s usually picked out of a dataset with known clinical outcomes. So for example if it’s a set of images of lungs with potential tumors, the cases come with biopsies which determined whether it was cancerous or just something like scar tissue.


You can look at fully diagnosed cases (via surgery, for example) and their previous scans.


Perhaps they were from cases that had a confirmed diagnosis.


> But often there are many shades of grey, e.g. a human may say "I don't know" and refer you to someone else or do some research. Whereas LLMs today will simply give you a definitive answer even if they don't know.

To add to the other answers: I know many people who will give definitive answers about things they don't really know. They just rely on the fact that you also don't know. In fact, in some social circles, the number of people who do that far outnumbers the people who don't know and will refer you to someone else.


This is why second opinions are widely used in any serious medical diagnosis.


This hints at the margin and excitement from folks outside the technical space: being able to be competitive with human outputs at a fraction of the cost.


That's the underappreciated truth of the computer revolution in practice.

At scale, computers didn't change the world because they did things that were already being computed, more quickly.

They changed the world because they decreased the cost of computing so much that it could be used for an entirely new class of problems (problems whose computing cost had previously precluded its use).


Given the exact same facts (just like in the medical imaging domain), humans will form different opinions or conclusions on politics.

I think what is not discussed enough is the assumption of shared assumptions. The curse of knowledge [1] is a cognitive bias that occurs when a person who has specialized knowledge assumes that others share in that knowledge.

This makes any discussion hard without laying out all the absolute basic facts first, an approach now more commonly known as first principles.

In the four quadrants of known and unknown, it is often the unknown knowns (we don't even know that we know) that are problematic in discussions.

[1] Curse of knowledge - https://en.wikipedia.org/wiki/Curse_of_knowledge

