
Long-horizon problems are completely unsolved in AI.

See the GAIA benchmark. While it will surely be beaten soon enough, the point is that we perform exponentially longer-horizon tasks than that benchmark every single day.

It's very possible we will move away from raw code implementation, but the core concepts of solving long-horizon problems via multiple interconnected steps are exponentially far away. If AI can achieve that, then we are all out of a job, not just some of us.

Take 2 competing companies that have a duopoly in a market.

Company 1 uses AI and fires 80% of its workforce.

Company 2 uses AI and keeps its workforce.

AI in its current form is a multiplier, so we will see Company 2 massively outcompete Company 1 as each employee now performs the work of 3-10 people. Company 2's per-person output is dramatically higher, which significantly weakens the first company. Standard market forces haven't changed.

The reality, as I see it, is that interns will now perform at senior-SWE level, senior SWEs will perform at VP-of-engineering level, and VPs of engineering will perform at nation-state levels of output.

We will enter an age where goliath companies are commonplace: hundreds or even thousands of multi-trillion-dollar companies, with billion-dollar startups expected almost at launch.

Again, all of this holds unless we magically find a solution to long-horizon problems (which we haven't even slightly found). That technology could be 1 year or 100 years away. We're waiting on our generation's Einstein to discover it.


Companies that have no interest in growth and are already heavily entrenched have no use for increased output, though. Won't they just fire everyone and behave the same?

On the other hand, that leaves them weaker if competition comes along, since consumers and businesses will come to demand significantly more once they have a point of comparison.


AI doesn't have institutional knowledge and culture. Company 2 would be leveraging LLMs while retaining its culture and knowledge. I imagine the lack of culture is appealing to some managers, but that is also one of AI's biggest weaknesses.


Pythia is stupidly easy to use.

Then hook up a simple test harness. This is a grand total of three commands: git pull, install, then point at a model and run it.
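For anyone who wants the concrete version, here is a minimal sketch using the Hugging Face transformers library (assuming the published EleutherAI/pythia-160m checkpoint; any Pythia size works the same way):

    # pip install transformers torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Pythia checkpoints live under the EleutherAI org on Hugging Face;
    # pythia-160m is one of the smaller sizes, fine for a smoke test.
    name = "EleutherAI/pythia-160m"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    # "Point and run": tokenize a prompt, generate, decode.
    inputs = tokenizer("The capital of France is", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))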


I love this, do tell the direction to be nudged in.

I wish to experience this new level of understanding.


Nutrition is one place where you can start, because it is possible to do some practical experiments.

But, be warned, you will easily spend a decade investigating it.

I picked this subject because I'm sure that in the end you will have achieved something of great value: your own relatively better health. (I say relatively because diet is only one of the factors that influence health, although an important one.)

Expanding on the sibling comment by 'mistercheph':

1) Start from a blank slate: have no preconceptions about the subject. This is easier said than done. Un-learning is far more difficult than learning, even for the simplest of topics.

2) Read really old literature on the subject, the likes of Aristotle; the alternative is to listen to people on the internet who have read the old literature. Initially, do some cross-checking to make sure these people are actually telling the truth by checking the sources they cite. Progressively read the literature up to relatively modern times, say until about 100 years ago. Many research papers that are over 100 years old are very readable compared to current ones, even for lay people.

3) Experiment on yourself. Come up with your own observations on what is good and bad.

4) Now explore the conventional knowledge on the subject. For every x amount of time spent gleaning conventional knowledge, spend an equal x listening to the heretic(1) who takes the opposite stand.

5) Do several iterations of steps 1 to 4.

6) Form your own opinion after a decade.

(1) The older, credentialed heretic is a good bet. By speaking against the guild that he/she is affiliated with, he/she has a lot to lose: his/her livelihood.

Good luck!


> Experiment on yourself

No control group?

> Form your own opinion after a decade

After a decade, differences in your body will mostly be because you are 10 years older.

Such experiments are largely pointless because the only people doing them are people who care about their health. You are more likely to be healthy because of the totality of your life choices, not the specific things you do in the diet experiment.


>No control group?

Nope. Start simple. (If you can do a full-blown experiment with multiple people, by all means go for it.)

>After a decade, differences in your body will mostly be because you are 10 years older.

Correct. (Did you think I did not know that?)

>Such experiments are largely pointless

I leave you to decide that. Given there are no well-controlled experiments in mainstream nutrition, we are left with imperfect choices. (There are small experiments conducted by small groups or individuals that are pretty high quality, IMO.)


Let me guess. All of this is to say that you leaned into eating saturated fat, got high cholesterol, and because nothing happened in 10 years, the converging lines of evidence that constantly replicate over a half century are wrong.

Or maybe you started smoking and because there were no RCTs on smoking, then nobody can actually know if it's bad for you, but your N=1 has more epistemic value because you feel okay.

Just getting flashbacks from those hokey "carnivore diet" videos that YouTube keeps wanting me to watch.


lol 'smolder' I saw your comment before you deleted it.


There are two approaches here:

Pick one phenomenon in the world that you observe but cannot account for, and try to come up with an account of its causes, assuming nothing except your own observation and experimentation. Once you're done, follow the trail of the history of human descriptions of the phenomenon, reading only original documents (or translations of them), and do the "science" of "science" by observing the phenomenon of observing and describing phenomena.

Or: go into the woods and read Plato and Aristotle and Sophocles for a year.

:D


There is little to no research showing that modern AI can perform even the simplest long-running task without training data on that exact problem.

To my knowledge, there is no current AI system that can replace a white-collar worker in any multi-step task. The only thing they can do is support the worker.

Most jobs are safe for the foreseeable future. If your job is highly repetitive and a company can produce a perfect dataset of it, I'd worry.

Jobs like factory work and call-center support are in danger, because that work is perfectly monitorable.

Watch the GAIA benchmark. It's nowhere near the complexity of a real-world job, but beating it would signal the start of an actual agentic system being possible.


I’d argue the foreseeable future got a lot shorter in the last couple years.


This is true if you don't know what you're doing, so it is good advice for the vast majority.

Fine-tuning is just training. You can completely change the model and make it learn anything you want.

But there are MANY challenges in doing so.


This isn't true either, because if you don't have access to the original dataset, the model will overfit on your fine-tuning dataset and (in extreme cases) lose its ability to even do basic reasoning.


Yes. It's called "catastrophic forgetting". These models were trained on trillions of tokens and then underwent a significant RLHF process. Fine-tuning them on your tiny dataset (relative to the original training data) almost always results in the model performing worse at everything else. There's also the issue of updating changed information. This is easy with RAG: replace the document in the repository with a new version and it just works. Not so easy with fine-tuning, since you can't identify and update just the weights that changed (there's research in this area, but it's early days).
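To make the RAG point concrete, here is a toy sketch (the bag-of-words embed function and the store layout are illustrative stand-ins, not any particular library's API):

    import math
    from collections import Counter

    def embed(text):
        # Stand-in for a real embedding model: bag-of-words counts.
        return Counter(text.lower().split())

    def similarity(a, b):
        # Cosine similarity between two bag-of-words "embeddings".
        num = sum(a[w] * b[w] for w in set(a) & set(b))
        den = math.sqrt(sum(v * v for v in a.values())) * \
              math.sqrt(sum(v * v for v in b.values()))
        return num / den if den else 0.0

    store = {}  # doc_id -> (text, embedding)

    def upsert(doc_id, text):
        # Updating changed information = replacing one entry. No retraining.
        store[doc_id] = (text, embed(text))

    def retrieve(query, k=1):
        q = embed(query)
        ranked = sorted(store.values(), key=lambda d: similarity(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    upsert("policy", "Refunds are allowed within 30 days.")
    upsert("policy", "Refunds are allowed within 14 days.")  # new version replaces the old
    print(retrieve("refund policy"))  # -> only the 14-day version remains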


Again, that's why I said it is challenging.

I regularly fine-tune models with good results and little damage to the base functionality.

It is possible, but it's too complex for the majority of users. It requires a lot of work per dataset you want to train on.
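For what it's worth, a minimal sketch of one common mitigation, assuming the peft library's LoRA adapters (the model name is just the Pythia checkpoint from earlier; the right target_modules depend on the architecture):

    # pip install transformers peft torch
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")

    # Train small low-rank adapters instead of all weights; the frozen base
    # model limits how much general capability the fine-tune can destroy.
    config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["query_key_value"],  # attention projection in GPT-NeoX-style models
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% of the base weights

    # From here: a normal training loop over your dataset, ideally with some
    # general-purpose data mixed in to further reduce catastrophic forgetting.

This is a sketch of the shape of the work, not a recipe; getting the data mix, learning rate, and evaluation right per dataset is where the real effort goes.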


A CL (continual learning) agent is next-generation AI.

When CL is properly implemented in an LLM agent format, most of these systems vanish.


Already a big thing. See the constellation architecture used here:

https://arxiv.org/html/2403.13313v1


I looked at the website and have no idea how ARC is supposed to be AGI.

Can someone explain?


It is necessary but not sufficient.

If you can't do ARC, you aren't general enough. But even if you can do ARC, you still might not be general enough.


It's also possible that you are an AGI and simply cannot pass ARC.


How so? If there is a task that humans can do but the AI cannot, I would not call it AGI. But that's just my definition.


Yeah but if my brother can't pass it, that doesn't mean he is NOT human.


Could he pass it if he had been educated for the task from birth? Human-level intelligence includes the ability to be educated; the ML models we have built so far can't be educated, so they have to match the level of educated humans for the comparison to hold.

General intelligence as we know it requires the ability to receive education.


I said AGI. I did not say human.


Isn't ChatGPT already proven to be smarter than many of us in many ways?


It's not a test of AGI. It tests whether you possess innate human capacities: rudimentary arithmetic and geometry, etc. Most of the problems were created manually. The original paper states that they limited the test to innate human priors to make the scope well-defined.


The biggest issue the author does not seem aware of is how much compute this requires. The article is the equivalent of saying that a monkey, given enough time, will write Shakespeare. Of course that's true, but the search space is intractable, and you would never find your answer in that mess even if the search did stumble on it.

I've been building branching and evolving LLM systems full-time for well over a year now.

I have built multiple "search" or "exploring" algorithms. The issue is that after multiple steps, your original agent, who was tasked with researching or doing biology, is now talking about battleships (an actual example from my previous work).

Single-step is the only situation in which search functions really work. Multi-step agents explode to near-infinite possibilities very, very quickly.

Single-step has its own issues, though. While a zero-shot question run 1,000 times (e.g., "solve this code problem") may help find a better solution, it's a limited search space (which is a good thing).

I recently ran a test of 10k inferences on a single input prompt across multiple LLM models, varying the input configurations. What you find is that an individual prompt does not have infinite response possibilities; the set is limited. This is why LLMs can actually function at all.
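A sketch of that kind of measurement, assuming a hypothetical generate_once(prompt) wrapper around whatever model you're sampling (everything here is illustrative, not any specific API):

    from collections import Counter

    def normalize(text):
        # Collapse whitespace/case so trivially different completions collide.
        return " ".join(text.lower().split())

    def measure_diversity(generate_once, prompt, n=10_000):
        # Sample the same prompt n times and count distinct normalized responses.
        counts = Counter(normalize(generate_once(prompt)) for _ in range(n))
        return len(counts), counts.most_common(5)

    # distinct, top5 = measure_diversity(generate_once, "Solve this code problem: ...")
    # In practice `distinct` plateaus far below n: the response set is limited.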

Agents not working is an example of this problem: while a single-step search space is merely massive, it grows exponentially with every step the agent takes.
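The arithmetic behind that explosion is worth spelling out; the branching factor below is an assumed illustrative number, not a measurement:

    # Even if one prompt only ever yields ~50 distinct responses,
    # an agent compounds that choice at every step:
    BRANCHING = 50
    for steps in (1, 3, 5, 10):
        print(steps, "steps ->", BRANCHING ** steps, "possible trajectories")
    # 1 step   -> 50
    # 3 steps  -> 125000
    # 10 steps -> ~9.8e16, far beyond any tractable search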

I'm building tools and systems around solving this problem, and to me, brute-force massive search is as far off as saying that all we need is 100x the model size.

Autonomy ≠ intelligence or reasoning.



