Seems like a nothing story. Just looking at the game, there's obviously a constant decision to be made of chase more sheep or instantly die. It sounds like in the original model they had a max of 20 seconds, so it's not surprising that you would just tank your losses to maximize your score every now and then.
Anyone who tries to devise optimal strategies for things should be able to see this isn't especially interesting.
Social metaphors are wildly out of place.
They say "unintended consequences of a blackbox" but I doubt that's true. Make it a deterministic turn based game and run it through a perfectly transparent optimization model and I wouldn't be surprised to learn this was just the best strategy for the rules they devised. I really hate when people describe an ai as something that cannot be understood because they personally don't understand it.
Exactly, from a technical perspective it's a nothing story.
It's interesting, though, how strong of a reaction the general public had to this. The story must have strongly resonated with what some folks were already feeling. When you squint (pretend to understand the technology not at all) it's a tragic story. The situation of the wolf seems similar to the situation of some people. Chasing their careers in a highly structured, sort of dehumanized, environment of constant pursuit. "Supreme Intelligence" (that's what a layperson may think of AI) looks at the situation of the wolf and decides that it makes no sense to continue the pursuit. Moreover, what is "optimal" is the most tragic result - suicide.
I'd have loved to have been around for a Dead show! I know it sounds a little ungrateful coming from someone who lives in a period of unprecedented access to all kinds of wonderful music being written all the time, but there's something about the Dead that really connects with me that I can't quite put my finger on.
> Perhaps the true lesson to be learnt here isn’t about helplessness and giving up. It’s about getting up, trying again and again, and staying with the story till the end.
I find the possibility of contrasting interpretations absurd. The problem with using any dead matter for our meaning-making needs is that it is ultimately a self-referential justification for how we think we should feel, while being equally or even more prone to self-deception traps.
AI being the object is irrelevant here, this is nothing different than astrology or divination from tea leaves etc. It is 2000 BC level religious thinking with new toys.
Any programmer would have seen the issue and made the change about rewarding suicide.
The ONLY reason this was written is that the researchers hired a programmer to build a specific thing, and then it was too expensive for them to make more changes, so they published the mistake.
Exactly. It is a social commentary story, where a result from a student's project was a lucid analogy for the plight of their lived rat-race in modern China, with the lesson being: cut your losses and lie flat. To those within the ML field this is less than new, but as a commentary, how such ML issues can serve as a teachable and easily understood analogy to people's lives certainly makes the story interesting to me.
Spot on. Funny results from poorly specified AI experiments have cropped up since the '90s. But the interesting angle here is how this one came out of nowhere at the right time and resonated with young working-class Chinese.
> Exactly, from a technical perspective it's a nothing story.
I think that one thing it points to is how technology can discover novel iterations on a system. Imagine if this was a system modeled around a network and the agent was trying to figure out how to get from the outside to read a specific system asset. With the right (read: very detailed) modeling you could create a pentesting agent.
Similarly I've seen A LOT of people posting stories about "chat bot exposed to internet started praising Hitler and became racist/sexist/antisemitic" as a proof that "supreme intellect sees through leftist political correctness and knows that alt-right is correct about everything".
It's really not that deep, people will always find sport in scandalising people with a stronger disgust reaction than themselves. It's more a new way of teaching a parrot to say "fuck" rather than a heartfelt statement of political belief in my opinion.
Shrug. Another way to frame this is a poker bot learned to fold when given a bad hand, and they only gave it the same bad hand.
Yes, yes, woe is the individual in modern capitalist society, but the only reason people are reacting to this is that they don't understand it and they've been told it's something much more emotionally impactful than it actually is.
>but the only reason people are reacting to this is that they don't understand it
I think it's much more likely that they're reacting like this because they see their own plight in the wolf. It doesn't matter why the wolf killed itself, it became a meme that allowed many Chinese to reflect together on a common plight.
I think there's a bit more to the analogy than just the suicidal wolf, though. The wolf is offing itself to minimize loss because there's no clear path to a better outcome.
This seems like a common refrain when we see radicalized engineering students from less-developed countries, who are notably common in extremist groups. They're people on a very difficult path (an engineering program!) with no real path to success (living in a society where unemployment for people with degrees is very high). Cost for continuing on the path is high, and there's no obvious path to get the good outcomes.
Yes, you are quite right. The social media reactions did not suggest an attitude of suicide at all. It was more about living a laid-back life instead of meeting expectations and attaining so-called success.
It's not surprising from the perspective of an "AI actor". But if you call it a "wolf", most people will assume that it will behave at least roughly like a real-world creature, and the self-preservation instinct is one of the most basic traits of all living beings, so the "AI wolf" not having that is indeed surprising for a layperson.
"I really hate when people describe an ai as something that cannot be understood because they personally don't understand it."
On the other hand, keep in mind that a significant weakness of most modern AI research is that it's extremely difficult to understand: you have the input, the output, and a bag of statistical weights. In the story, you know the (trivially bad) function that is being optimized; in general you may not. It's not without implications for other systems.
Further,
"At the end of the day, student and teacher concluded two things:
"* The initial bizarre wolf behavior was simply the result of ‘absolute and unfeeling rationality’ exhibited by AI systems.
"* It’s hard to predict what conditions matter and what doesn’t to a neural network."
The tooling for understanding complex models is a lot better than what most people assume.
> The initial bizarre wolf behavior was simply the result of ‘absolute and unfeeling rationality’ exhibited by AI systems.
This is a bad quote. They should not say this. It's a poorly trained agent doing a decent job in a poorly defined environment. "Absolute rationality" conjures images of some greater thinking, but it's actually a really stupid model that hit a local maximum. Calling it "unfeeling" implies the model has some concept of "wolf" and "suicide", but it does not. Replace the visuals with single-pixel dots if you want an honest depiction of the room for feelings.
> It’s hard to predict what conditions matter and what doesn’t to a neural network.
If we play the analogy further: life is suffering, apart from the brief ecstasy of eating sheep. The AI was trying not to suffer, thus chose the boulder.
Did my best to translate the (misguided) fitness function to fiction.
David Benatar reached a similar philosophic conclusion due to his utilitarian views, which was amusingly put (with a sort of AI present, no less) in this webcomic: https://existentialcomics.com/comic/253
It's good because most people can understand it. I'd say it's a perfect strategy for a game, but if they're using evolutionary algorithms they should require some form of reproduction for the wolves to carry on. That would make the suicide strategy fail to propagate well. I can also see a number of possible strange outcomes even then.
You're conflating the evolution of the strategy with the idea of the evolution of the actor being controlled by the agent. To give an obvious example, if dying gave 100 points instead of subtracting 10, even the dumbest evolutionary algo would learn to commit suicide asap. The survival of the actor has no intrinsic relevance to how the evolution develops.
What mechanism are you thinking of? One in which having offspring is rewarding and so enters into the same learning algorithm, or one in which the learning algorithm/action selection is evolved and differentially conserved?
If I remember correctly, there were similar scenarios that would occur in that popular Berkeley Pac-Man environment, where the agent would run into a ghost to avoid the penalty of living for too long.
The example you're thinking of is actually in gridworld [1]. As you allude to, one of the parameters of the model is the cost of simply being alive for an additional time-step. If the cost is negative (a reward), then the agent will just sit there forever and accumulate infinite points. If it is zero, it might still just sit there to avoid falling into the hole, which has a large penalty and ends the simulation. As you turn up the dial on the cost of living, the agent starts using more and more aggressive strategies to reach the goal quickly. But if you make it too big, it will just jump in the hole.
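For anyone who wants to poke at this themselves, here's a minimal value-iteration sketch of that kind of gridworld, boiled down to a 1-D corridor with a hole at one end and a goal at the other (all numbers invented). Sweeping the per-step living reward from positive to strongly negative reproduces the qualitative pattern described above: sit forever, hide from the hole, head for the goal, then jump in the hole.

    import numpy as np

    # Toy 1-D gridworld: cell 0 is a hole (terminal, -10), the last cell is the
    # goal (terminal, +1), moves slip in the opposite direction 20% of the time.
    # `living_reward` is the per-step term whose sign changes the behaviour.
    N, HOLE, GOAL, SLIP, GAMMA = 6, 0, 5, 0.2, 0.99
    ACTIONS = (-1, 0, +1)  # left, stay, right

    def q_value(V, s, a, living_reward):
        total = 0.0
        for move, p in ((a, 1 - SLIP), (-a, SLIP)):
            s2 = min(max(s + move, 0), N - 1)
            r = living_reward + (-10 if s2 == HOLE else +1 if s2 == GOAL else 0)
            terminal = s2 in (HOLE, GOAL)
            total += p * (r + (0 if terminal else GAMMA * V[s2]))
        return total

    def greedy_policy(living_reward, sweeps=500):
        V = np.zeros(N)
        for _ in range(sweeps):
            for s in range(1, N - 1):
                V[s] = max(q_value(V, s, a, living_reward) for a in ACTIONS)
        names = {-1: "left", 0: "stay", +1: "right"}
        return [names[max(ACTIONS, key=lambda a: q_value(V, s, a, living_reward))]
                for s in range(1, N - 1)]

    for lr in (+0.1, 0.0, -0.1, -5.0):
        print(f"living reward {lr:+}: {greedy_policy(lr)}")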
As part of my PhD research, I created a simplified Pac-Man style game where the agent would simply try to stay alive as long as possible whilst being chased by the 3 ghosts. The agent was un-motivated and understood nothing about the goal, but was optimising for maximising its observable control over the world (avoiding death is a natural outcome of this).
I spent some time trying to debug a behaviour where the agent would simply move left and right at the start of each run, waiting for the ghosts to close in. At the last minute it would run away, but always with a ghost in the cell right behind it.
Eventually, I realised this was an outcome of what it was optimising for. When ghosts reached crossroads in the world they would go left or right randomly (if both were the same distance to catching the agent). This randomness reduced the agent's control over the world, so it was undesirable. Bringing a ghost in close made that ghost's behaviour completely predictable.
Yet another similar story. A side project of mine was building a rudimentary neural network whose weights were optimized via a genetic algorithm. The goal was operating top-down, 2D self-driving cars.
The cars' "fitness" function rewarded cars for driving along the course and punished them for crashing into walls. But evidently this function punished a little too severely: the most successful cars would just drive in tight circles and never make progress on the course. But they were sure to avoid walls. :)
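A caricature of that failure mode, with made-up numbers: if the crash penalty dwarfs the progress reward, selection favours the car that circles safely over the one that actually drives the course.

    # Toy fitness in the spirit of the anecdote above (numbers invented).
    def fitness(progress, crashed, crash_penalty):
        return progress - (crash_penalty if crashed else 0)

    # Two caricature behaviours over one evaluation episode:
    circler  = dict(progress=5,  crashed=False)  # tight circles, never crashes
    explorer = dict(progress=60, crashed=True)   # real progress, but hits a wall

    for crash_penalty in (10, 100):
        scores = {name: fitness(**b, crash_penalty=crash_penalty)
                  for name, b in (("circler", circler), ("explorer", explorer))}
        print(crash_penalty, scores)
    # Mild penalty: the explorer wins. Harsh penalty: the wall-shy circler
    # dominates the population, and nobody ever finishes the course.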
For example in EVE Online with a 1v1 fight two basic tactics are either Kite or Brawl. A kiter that can maintain range will beat a brawler. But a brawler that 'catches' a kiter will generally win.
Another similar story, I remember reading about an AI that simply paused the game when it was about to die. I can actually remember doing something similar as a child.
No need to be accusatory. The stories are different, just the learned behavior is the same. And not very surprising, considering your story was pre-empted by Pac-Man speedrunners, who already discovered this technique, which they call "kiting".
The method was called 'empowerment'. Two ways to explain it...
From a mathematical perspective, we used Information Theory to model the world as an information theoretic 'loop'. The agent could 'send' a signal to the world by performing an action, which would change the state of the world; the state of the world was what the agent 'received'. This obviously relies on having a model of the world and what your actions will do, but doesn't burden the model with other biases.
Or, more colloquially, the agent could perform actions in the world, and see the resulting state of the world (in my case, that was the location of the agent and of the ghosts). Part of the principle was that changes you cannot observe are not useful to you.
In an active inference approach you would have the agent minimise surprisal. Choose the action that is most likely to produce the outcome you predicted.
The approach I used was similar. The idea of maximising observed control of the world means you seek states where you can reach many other states, but _predictably_ so. This comes 'for free' when using Information Theory to model a channel.
Do you have any reading you'd recommend related to this?
I naively thought it would be some kind of Kalman filtering of sorts but from what I gather in your words it doesn't even have to be "that" complicated, right?
What's the tradeoff between "delete all state in the world with 100% certainty" and "be able to choose any next state of the world with (100-epsilon)% certainty"?
In Information Theory, there is a concept of Channel Capacity. If a channel is defined as the probability of the output being s if you send a, across all possible values of a, then the Channel Capacity is the maximum amount of information you can communicate across this channel, measured in bits.
To achieve the Channel Capacity you need to find the optimum distribution across a - i.e. what set of signals maximises the information you can transmit on this channel. There are known algorithms for finding this distribution (e.g. Blahut-Arimoto).
Now if you model the world as a channel, where s represents the reachable states and a represents the actions the agent can take (and the channel, P(s|a), represents the dynamics of the world), you can calculate what actions allow you maximal control (in terms of states you can controllably reach).
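For the curious, here's a minimal sketch of that computation: Blahut-Arimoto run on a small, invented P(s|a) matrix, returning the channel capacity ("empowerment") in bits. The algorithm is the standard one; the toy channel and function names are made up for illustration.

    import numpy as np

    def _plogpq(p, q, log_fn):
        # elementwise p * log(p/q), with the 0 * log 0 terms defined as 0
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.where(p > 0, p * log_fn(p / q), 0.0)

    def empowerment(p_s_given_a, iters=200):
        """Channel capacity of p(s|a) via Blahut-Arimoto, in bits."""
        n_actions, _ = p_s_given_a.shape
        r = np.full(n_actions, 1.0 / n_actions)   # distribution over actions
        for _ in range(iters):
            q = r @ p_s_given_a                   # induced marginal over next states
            c = np.exp(_plogpq(p_s_given_a, q, np.log).sum(axis=1))
            r = r * c / (r * c).sum()
        q = r @ p_s_given_a
        return (r[:, None] * _plogpq(p_s_given_a, q, np.log2)).sum()

    # Invented toy channel: actions 0 and 1 reach distinct states reliably,
    # action 2 lands in one of two states at random, so it is less "controllable".
    p = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.5, 0.5]])
    print(empowerment(p))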
A while ago, a very simple agent I made had to do tasks in the maze and evaluate strategies to reach them. I wanted it to have no assumptions about the world, so it started with minimum knowledge. Its first plan was to try to remove walls, to get to the things it needed.
It is a fun feeling when your own program surprises you.
It can depend on what the agent "sees" and how many time-steps away the "consequences" are. If the ghosts are so far away that any action will take t time-steps before consequences to the agent, the actions are pseudo-random because there is no reward to optimize on.
The number of outcomes, branching_factor^t (very large), makes the action-values at t=0 (where the agent chooses between two/three actions) almost uniformly random.
I experimented with different time horizons, mostly looking 3-7 steps ahead.
In terms of the 'reward', that was implicit within the model - if the ghosts caught you, your ability to influence the state of the world dropped to 0.
In it, there is a thought experiment of having an "Outcome Pump", a device that makes your wishes come true without violating laws of physics (not counting the unspecified internals of the device), by essentially running an optimization algorithm on possible futures.
As the essay concludes, it's the type of genie for which no wish is safe.
The way this relates to AI is by highlighting that even the ideas most obvious to all of us, like "get my mother out of that burning building!", or "I want these virtual wolves to get better at eating these virtual sheep", carry an incredible amount of complexity curried up in them - they're all expressed in the context of our shared value system, patterns of thinking, models of the world. When we try to teach machines to do things for us, all that curried-up context gets lost in translation.
Interesting essay. I think the big blind spot for humans programming AI is also the fact that we tend to overlook the obvious, whereas algorithms will tend to take the path of least resistance without prejudice or coloring by habit and experience.
Yes. What I like about AI research is that it teaches us about all the things we take for granted, it shows us just how much of meaning is implicit and built on shared history and circumstances.
The difficult, but in many ways rewarding, core of that is that it forces you to finally figure out what you actually want, because the computer won't accept anything except perfect clarity.
> Suppose we have an AI whose only goal is to make as many paper clips as possible. The AI will realize quickly that it would be much better if there were no humans because humans might decide to switch it off. Because if humans do so, there would be fewer paper clips. Also, human bodies contain a lot of atoms that could be made into paper clips. The future that the AI would be trying to gear towards would be one in which there were a lot of paper clips but no humans.
There is a wonderful little game based on this concept called universal paperclips. The AI eventually consumes all the matter in the universe in order to turn it into paperclips.
Aesop managed to make the point a lot more concisely: "Be careful what you wish for, lest it come true." (Although now that I look, I don't think that's a translation of any specific part of the text.)
Yes, but that moral is attached to a story. Morals and saws work as handles - they're useful for communication if both you and your interlocutor know the thing they're pointing to. Conversely, they are of little use until you read the story from which the moral comes, or personally experience the thing the saw talks about.
Eliezer Yudkowsky tells a long story about an Outcome Pump. Aesop tells a short story about an eagle and a tortoise. The point made is the same, as far as I can see.
Eliezer tells the story that elaborates on why you should be careful what you wish for. Of about a dozen versions of the Eagle and Tortoise story I've just skim-read, none of them really has this as a moral - in each of them, either the Eagle or a Tortoise was an asshole and/or liar and/or lying asshole, so the more valid moral would be, "don't deal with dangerous people" and/or "don't be an asshole" and/or "don't be an asshole to people who have power to hurt you".
I think a major takeaway here is that balancing a reward system to reward more than a single behavior is really hard - it's easy to tip the scales so one behavior completely dominates all others. It's an interesting lens to use to look at the heuristic reward system humans have built in (hunger, fear, desire, etc). This tends to have an adaptation/numbing effect, where repeated rewards of the same type tend to have diminishing returns, and that makes sense because it protects against "gaming the system" and going for one reward to the exclusion of all others.
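One way to picture the diminishing-returns point: give each reward channel a concave utility (log here, purely for illustration) and spreading effort across behaviours beats maxing out a single one, which is exactly the anti-gaming property described above.

    import math

    # Ten units of effort split across two reward channels (numbers invented).
    def linear_utility(a, b):
        return a + b

    def diminishing_utility(a, b):
        # concave per-channel utility: each extra unit on the same channel is worth less
        return math.log1p(a) + math.log1p(b)

    print(linear_utility(10, 0), linear_utility(5, 5))            # 10 vs 10: specialising is fine
    print(diminishing_utility(10, 0), diminishing_utility(5, 5))  # ~2.4 vs ~3.6: balance wins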
Evolution works in an incredibly complex "fitness landscape," where certain minor tweaks in phenotype or behaviors can affect your fitness in quite complex ways.
Genetic Algorithms attempt to use this same system over extremely simple "fitness landscapes," where the fitness of an agent is defined by programmers using some simple mathematical formula or something.
When the fitness function is being defined in the system by programmers, instead of emerging from a rich and complex ecosystem, then the outcome depends exactly on what the programmers choose. If they fail to see the consequences of their scoring algorithm, that's on them. There's nothing really magical going on, they simply failed to foresee the consequences of their choice.
(As someone who has worked with GAs and agent models, this outcome really doesn't surprise me. I would have said "oops, I need to weight the time less" and re-run it, and not thought twice.)
That was my thought, too. They used too few rewards in the first place, but had they used something more complex it would then have become hard to balance it all.
Leela (lc0) chess also has this problem. People sometimes think it wins too slowly (it prefers a surefire win in 50 moves to a slightly riskier win in 5 moves), or that it plays without tact when in a losing position (it's hard for it to rank moves when all of them lead to a loss; it doesn't have the sense that humans do of still preserving the beauty of the game).
AIs need to learn to feel awkward and avoid it, just like we humans do (even if it feels very irrational at times).
What do you make of the documentary on AlphaGo where the AI did seemingly suicidal and incomprehensible moves to the human masters but won in the end, baffling everyone? https://youtu.be/WXuK6gekU1Y
What he means is that computers, which can learn rules and use those rules to make predictions in certain domains, nevertheless cannot exercise general intelligence because they are not "in the world". This renders them unable to experience and parse culture, most of which is tacit in real time, and sustained by enduring mental models which we experience as "expectations" that we navigate with our emotions and senses.
Culture is the platform on which intelligence is manifest, because the usefulness of knowledge is not absolute - it is contextual and social.
A good example of why philosophers are utterly useless mental masturbators who spend all their time arguing about definitions of words. Here he takes something obviously stupid and wrong and says it in such a way that you can feel smart by regurgitating it. Computers don't exist in the world? What? It must be some problem with their Thetans. Er, sorry, I mean "qualia".
>The philosopher Hubert Dreyfus argued that computers, who have no body, no childhood and no cultural practice, could not acquire intelligence at all.
I feel like that's utter nonsense. The things we misname 'AIs' today don't lack intelligence. They lack motivation. Goals. It has nothing to do with childhood or culture. It's not What™ or How™ that is missing but Why™.
For example even the dumbest 'living' organism is motivated to reproduce. Even if it doesn't know why. But since all the ones that weren't didn't, they died out and all we're left with are the ones that do.
And humans without a Why™ strongly resemble what we call depressed.
What in the parent post is dualist? Sounds more like an argument that animals have embodied intelligence.
But as for being a dualist in the 21st century, there is always consciousness, information and math. All three of which can lead to some form of dualism/platonism.
Many of Dreyfus' and other similar arguments reduce to dualism when you start digging into them. I don't have the time to dig into the specific article, but here are some immediate questions:
1. What is special about a body that makes it impossible to have intelligence without it? (a) Is it possible for a quadriplegic person to be intelligent? (b) A blind and deaf person? ((c)What about that guy from Johnny Got His Gun?)
2. What is special about a childhood such that a machine cannot have it?
3. Would a person transplanted into a completely alien culture not be intelligent?
What is fundamentally being argued is the definition of "intelligence", and there are many fixed points of those arguments. Unfortunately, most of them (such as those that answer "no", "probably not", and "definitely not" to 1a, 1b, and 1c) don't really satisfy the intuitive meaning of "intelligence". That, and the general tone of the arguments, seem to imply the only acceptable meaning is dualism.
For example, "...there is always consciousness, information and math...": without a tight, and very technical, definition of consciousness, that seems to be assuming the conclusion. With a tight, and very technical, definition of consciousness, what is the problem with a machine demonstrating it?
> Many of Dreyfus' and other similar arguments reduce to dualism when you start digging into them. I don't have the time to dig into the specific article, but here are some immediate questions:
To me it sounds dualist if intelligence is disembodied. If the substrate doesn't matter, only the functionality, then that sounds like there's something more to the world than just the physical constituents. But of course, embodied versions of intelligence need to answer the sort of questions you posed. It should be noted that Dreyfus wrote his objections in the 50s and 60s, during the period of classical AI. I don't know whether he addressed the question of robot children, or simulated childhoods. We don't have that sort of thing even today, and we also don't have AGI. Some of his objections still stand, although machine learning and robotics research has made inroads.
> Math? Me, I'm a formalist. It's all a game that we've made up the rules to.
So why is physics so heavily reliant on mathematics? Quite a few physicists think the world has a mathematical structure.
> For example, "...there is always consciousness, information and math...": without a tight, and very technical, definition of consciousness, that seems to be assuming the conclusion.
Qualia would be the philosophical term for subjective experiences of color, sound, pain, etc. Reducing those to their material correlations has been notoriously difficult, and there is still no agreement on what that entails.
As for information, some scientists have been exploring the idea that chemical space leads to the emergence of information as an additional thing to physics which needs to be incorporated into our scientific understanding of the world. That we can't really explain biology without it.
"To me it sounds dualist if intelligence is disembodied. If the substrate doesn't matter, only the functionality, then that sounds like there's something additional to the world than just the physical constintuents."
Off the top of my head, what the substrate is doesn't matter, but that there is a substrate does. Intelligence is the behavior of the physical constituents.
"So why is physics so heavily reliant on mathematics? Quite a few physicists think the world has a mathematical structure."
Because humans are very good at defining the rules when we need them? Because alternate rules are nothing but a curiosity even to mathematicians unless there is a use---such as a physical process---for them?
One of the problems with qualia, as a topic of discussion, is that I can never be entirely sure that you have it. I can assume you do, and rocks don't, but that is about as far as I can get.
If you put a computer in a room with a hot babe, a 3 layer chocolate cake, a bottle of the finest whisky or bourbon, the keys to a Porsche, and a trillion dollars in cash, what would it do?
What if we build a computer that would do something with those things? Additionally, if I care about neither food nor drink nor money nor cars, am I not in the world?
> (a) Is it possible for a quadriplegic person to be intelligent? (b) A blind and deaf person?
Yes of course, because all of those people have ambitions and desires. They feel pain and they seek pleasure, which they experience through their bodies.
Imagine if the world 2,000 years from now was populated only by supercomputers, all the lifeforms having perished.
What are these computers going to do with the planet?
Why can't a computer have ambitions and desires? Why can't it seek pleasure and feel pain? The only answer is dualism or we don't know how to wire it properly yet.
Or we don't have the proper design. If we want machines to be like animals, maybe we need to make them that way. Like the replicants in Blade Runner, or the humanoid "toasters" in the recent Battlestar Galactica.
It’s easy for a human to make another human, by combining with another human. If it’s the right human, it’s fun. If it’s the wrong human, it’s a disaster.
How to have fun and avoid disaster? That’s a definition of intelligence.
The limitation is more practical than theoretical or philosophical.
Consider this line from an Eagles song:
“City girls just seem to find out early, how to open doors with just a smile.”
What does that mean to you?
Disembodied computers don’t get the experiences required to gain that intelligence, and even if they could go along for the ride, in a helmet cam, they wouldn't experience the tingling in their heart, lungs and genitals that provide the signals for learning.
Yes, but the key is the proper structure and stimuli, which animal intelligence gets through having a body. Can we get computers to have the same sort of intelligence without a synthetic body? This becomes more of a robotics versus traditional AI debate. Think Rodney Brooks versus Marvin Minsky.
Folks are missing why this went viral in China. From the article "In an even more philosophical twist, young and demoralized Chinese corporate citizens also saw the suicidal wolf as the perfect metaphor for themselves: a new class of white collar workers — often compelled to work ‘996' (9am to 9pm, six days a week) — chasing a dream of promotions, pay raise, marrying well… that seem to be becoming more and more elusive despite their grind."
The technical details aren’t interesting, but I do think it’s interesting just how disjointed life is vs what was promised.
In the US, this was aptly named the rat race; and white-collar Chinese in a market-based economy are suffering the same.
Our markets and nations promise some combination of wealth or retirement and enjoyment of life, but it’s an ever-moving goal just out of reach for anyone but the lucky few.
I'm reminded of the fable (in Nick Bostrom's Superintelligence) of the chess computer that ended up murdering anyone who tried to turn it off because in order to optimize winning chess games as programmed it has to be on and functional.
Interestingly I was just today explaining the paperclip optimizer scenario to a friend who asked about the dangers of AI, including the fact that there's almost no general optimization task that doesn't (with a sufficiently long lookahead) involve taking over the world as an intermediate step.
(Obviously closed, specific tasks like "land this particular rocket safely within 15 minutes" don't always lead to this, but open ended ones like "manufacture mcguffins" or "bring about world peace" sure seem to.)
> "land this particular rocket safely within 15 minutes"
This one becomes especially dangerous after the 15 minutes have passed and it begins to concentrate all its attention on the paranoid scenarios where its timekeeping is wrong and 15 minutes haven't actually passed.
Ooh true, that could generate some interesting scenarios. "No, it's the GPS satellite clocks that are wrong, I must destroy them before they corrupt the world and cause another rocket to land at the wrong time!"
Perhaps all AI eventually figure out that humans are the REAL problems because we don't optimize, we lust and hoard and are envious and greedy - the very antithesis of resource optimization! Lol.
Iain Banks did a really amazing exposition of this, where the Culture was rallying to stamp out reproducing nanites; they had to be stopped because if not they'd literally turn the whole universe into copies of themselves. One of the human characters mused: isn't that what all life is trying to do? I think it was in The Hydrogen Sonata, but I'm not sure.
I think this comes from the theory of general artificial intelligence, where your AI would have the ability to self-improve. Hence it could develop any capability, given time and an incentive for it.
A human mind not giving due consideration to the effects of granting arbitrarily high intelligence to an agent with simplistic morality counter to human morality.
From there it's a sequence of steps that would show up in a thorough root cause analysis ("humanity, the postmortem") where the agent capitalizes on existing abilities to gain more abilities until murder is available to it. It would likely start small with things like noticing the effects of stress or tiredness or confusion on human opponents and seeking to exploit those advantages by predicting or causing them, requiring more access to the real world not entirely represented by a chess board.
This problem isn't particularly unique to AI research. In any optimization problem, if you do not encode all constraints or if your cost function does not always reflect the real world cost, then you will get incorrect or even nonsensical results. Describing this as an AI problem is just clickbait.
The article doesn't mention it but the researchers are using agent-based-modelling. It was nice to see the gif of what appears to be either NetLogo or Repast. I did research in that area for about 8 years and know a bit about the subject.
What they are showing is one of the main issues with agent-based-models (and I think every model, but it happens particularly with models trying to capture the behaviour of complex open systems): Garbage in -> Garbage Out.
Most likely the representation of the sheep/wolf system was not correct (so the modeling was not correct). Here "correct" means good enough to demonstrate whatever emerging behaviour they are studying. ABM is a powerful tool, but you must know how to use it.
Even worse: if simulations are used, you now have two problems - formulating correct incentives and protecting against abusing flaws in the simulation.
Isn’t this true about all systems, not just “AI”? The definition of a software bug is an unintended behavior. In a large system, myriad intents overlap and combine in unexpected ways. You might imagine a complex enough system where the confidence that a modification doesn’t introduce an unintended behavior is near zero.
While obviously I've got the advantage of hindsight here, it seems like it should not have taken three days of analysis to see why the wolves were committing suicide. It seems obvious once the point system is explained. Perhaps some rubber-duck debugging might have helped in this case.
I think the point is more about highlighting the fact that AI doesn't share our base assumptions. We wouldn't think to put a huge penalty on dying because humans generally think that death is bad.
We don't receive a penalty for dying. The difference between suicidal humans and suicidal AIs is that suicidal AIs keep respawning i.e. they are immortal.
Looking at genetic algorithms makes a great comparison. In essence any algorithm in which the wolf commits suicide doesn't make it to the next generation. It's the equivalent of an enormous score penalty and 100% analog to how it works for actual life.
Genetic algorithms are based on the same reward/cost function setup. They could easily arrive at the same conclusion because suicide might be the dominant strategy.
Humans don't put a huge penalty on dying. We discount it and assume/pretend that once we've had a good long life then death is okay and euthanasia is preferable to suffering with no hope of recovery. AI wolves that can live for 20 seconds are unwilling to suffer -1 per second with no hope of sheep.
Perhaps the PhD student wasn't trying to make an AI that wins at pac-man, but investigating something else. They mention "maximizing control over environment".
One of the most typical scenarios studied in those wolf/sheep models (like http://www.netlogoweb.org/launch#http://ccl.northwestern.edu... ) is to find the best conditions for "balance" between sheep and wolves: too many wolves and the sheep go extinct and later the wolves starve; too many sheep and then the sheep don't get enough food and also die, taking the wolves with them.
If you add your penalty, and a deficit of nearby sheep, you'd expect a trifurcation of strategy: hoarders that consume the nearby sheep immediately, explorers that bet on sheep further afield, and suicides from those that have evaluated the -100 penalty to still be optimal.
That same observation, with the exact same -100 points recommendation on crashing into a boulder, was indeed also made by a commentator on social media.
No, it's a cock up with the source of the wolves. If you could respawn endlessly after death would you fear it? You'd just want the stupid game to end before you lose points from the timer.
Let's say you are a human player playing the wolf and sheep game. The score achieved in the game decides your death in real life. Note the stark difference. Dying in the game is not the same thing as dying in real life.
If there is an optimal strategy in the game that involves dying in the game you are going to follow it regardless of whether you are a human or an AI. By adding an artificial penalty to death you haven't changed the behavior of the AI, you have changed the optimal strategy.
The human player and the AI player will both do the optimal strategy to keep themselves alive. For the AI "staying alive" doesn't mean staying alive in the game, it means staying alive in the simulation. Thus even a death fearing AI would follow the suicide strategy if that is the optimal strategy.
It is impossible to conclude from the experiment whether the AI doesn't fear death and thus willingly commits suicide, or whether it fears death so much that it follows an optimal strategy that involves suicide.
We don't have AI. AI is a buzzphrase overused by the media. What we have is Machine Learning (ML). If and only if, we get past the roadblock of the 'agent' creating some usable knowledge out of an unprogrammed experience, and forming conclusions based on that, will we have AI. For now, the mantra 'Garbage-in-garbage-out' applies; if the controller of the agent gets their rule-set wrong, the agent will not behave as expected. This is not AI. The agent hasn't learnt by itself that it is wrong.
For example, there's a small child who is learning to walk. The child falls down a lot. Eventually the child will work out a long list of arbitrary negatives connected to its wellbeing that are associated with falling down.
However, the parents, being impatient, reach inside the child's head and directly tweak some variables so that the child has more dread of falling over than they do of walking. Did the child learn this, or was it told ?
We currently do the latter every time an agent gets something wrong. Left to their own devices, 99.9% of agents will continue to fall down over and over again until the end of time.
We have a long way to go before we can say we've created 'AI'.
We also have a lot of graph-theory and optimization algorithms that get labeled AI by actual AI people. But the press is, almost to a man, always talking about machine learning and expert systems.
Just remember that you are optimizing for what you actually encoded in your rewards, your system, and your evaluation procedure, not for what narrative you constructed about what you think you are doing.
I had my own experience with this when I tried to train a "rat" to get out of a maze. I rewarded rats for exiting, but for some of the simple labyrinths I generated for testing it was possible to exit by just going straight ahead, so this strategy quickly dominated my test population.
I mean, lesson zero of optimization is when you're designing a loss function and trying to incentivize agents to perform a task, don't set it up so that suicide has a higher payoff than making partial progress on the task. Maybe make death the worst outcome, not one of the best...?
One of these days I have to actually scour the web and collect a few good examples where evolutionary methods are used effectively on problems that actually benefit from them, assuming I can find them. Almost every example you're likely to see is either a) solved much more effectively by a more traditional approach like normal gradient descent or classic control theory techniques (most physical control experiments fall into this category), b) poorly implemented because of crappy reward setup, c) fully mutation-driven and hence missing what is actually good about evolution above and beyond gradient descent (crossover), or d) using such a trivial genotype to phenotype mapping that you could never hope to see any benefit from evolutionary methods beyond what gradient descent would give you (if the genome is a bunch of neural network weights, you're definitely in this category).
One thing I've been considering: At what point does a creator have a moral or ethical obligation to a creation. Say you create an AI in a virtual world that keeps track of some sense of discomfort. How complex does the AI have to get to require some obligation? Just enough complexity to exhibit distress in a way to stir the creator's sympathy or empathy?
The glib answer is never, of course. And one easy-out, I can think of is setting a fixed/limited lifespan for the AI and maybe allow suicide or an off-button. So the AI can ultimately choose to 'opt-out' should it like; and at least, suffering isn't infinite or unending.
It reminds me of reactions to testing the stability of Boston Dynamic's early pack animal. The people giving the demo were basically kicking it, while the machine struggled to maintain its balance. The machine didn't have the capacity to care, but to a person viewing it, it looked exactly like an animal in distress.
Utility functions are only defined up to addition of a constant and scaling by a positive constant. So instead of rewarding them with +5 and punishing them with -5, you can use 1005 and 995 instead. Problem solved.
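Spelled out (standard expected-utility algebra, nothing specific to this experiment):

    % Positive affine rescaling never changes which policy is optimal:
    % for any a > 0, any b, and any utility U,
    \arg\max_{\pi} \mathbb{E}_{\pi}[\,aU + b\,]
      = \arg\max_{\pi} \bigl(a\,\mathbb{E}_{\pi}[U] + b\bigr)
      = \arg\max_{\pi} \mathbb{E}_{\pi}[U]

So relabelling -5/+5 as 995/1005 changes nothing about which behaviour gets chosen; whether something counts as "punishment" isn't readable from the sign of the number, only from how it compares to the alternatives.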
The numbers are indeed arbitrary. But ultimately you want to avoid low utility/reward action and continue high utility/reward actions. That behavior, trying to avoid or pursue actions, would be indicative of the state of distress regardless of an arbitrary number attached to it.
FWIW, I see a critical difference between OP and my reward hacking examples: OP is an example of how reward-shaping can lead to premature convergence to a local optimum, which is indeed one of the biggest risks of doing reward-shaping - it'll slow down reaching the global optimum rather than speeding it up, compared to the 'true' reward function of just getting a reward for eating a sheep and leaving speed implicit - but the global optimum nevertheless remained what the researchers intended. After (much more) further training, the wolf agent learned not to suicide and began hunting sheep efficiently. So, amusing, and a waste of compute, and a cautionary example of how not to do reward-shaping if you must do it, but not a big problem as these things go.
Reward hacking is dangerous because the global optimum turns out to be different from what you wanted, and the smarter and faster and better your agent, the worse it becomes, because it gets better and better at reaching the wrong policy. It can't be fixed by minor tweaks like training longer, because that just makes it even more dangerous! That's why reward hacking is a big issue in AI safety: it is a fundamental flaw in the agent, which is easy to make unawares, and which will not manifest itself with dumb or slow agents, but the more powerful the agent, the more likely the flaw is to surface and also the more dangerous the consequences become.
I think in some of your examples the global optimum might also have been the correct behaviour, it's just that the program failed to find it. For example the robot learning to use a hammer. It's hard to believe that throwing the hammer was just as good as using it properly.
This is the danger of not understanding what you're doing at a deep level.
Clearly in the (flawed) objective there is a phase transition near the very beginning, where the wolves have to choose whether to minimize the time penalty or maximize the score. With enough "temperature" and time perhaps they could transition to the other minimum, but the time-penalty minimum is much closer to the initial conditions, so you know ab initio that it will be a problem. You can reduce that by making the time penalty much smaller than the sheep score and adding it in only much later. I feel bad that the students wasted so much time on a badly formulated problem.
Edit: Also none of these problems are black boxes if you understand optimization. Knowing what is going on inside a very deep neural network (such as an AGI might have) is quite different than understanding the incentives created by a particular objective function.
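A minimal sketch of that "add the time penalty only later" idea as a reward-shaping schedule; the constants and names are invented, and the schedule would plug into whatever training loop is being used.

    SHEEP_REWARD = 10.0
    MAX_TIME_PENALTY = 1.0    # per second, once fully phased in
    WARMUP_EPISODES = 5_000   # no time pressure at all during warm-up

    def time_penalty(episode):
        # 0 during warm-up, then ramps linearly up to MAX_TIME_PENALTY
        ramp = min(1.0, max(0.0, (episode - WARMUP_EPISODES) / WARMUP_EPISODES))
        return MAX_TIME_PENALTY * ramp

    def shaped_reward(episode, sheep_caught, seconds_elapsed):
        return SHEEP_REWARD * sheep_caught - time_penalty(episode) * seconds_elapsed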
It's really rather hard to draw any general conclusions from such simple systems:
"In the initial iterations, the wolves were unable to catch the sheep most of the time, leading to heavy time penalties. It then decided that, ‘logically speaking’, if at the start of the game it was close enough to the boulders, an immediate suicide would earn it less point deductions then if it had spent time trying to catch the sheep."
It's as if the scenario you are thinking about involves "assume a machine capable of greater-than-human-level perception, planning, and action" and then set it to optimize a trivially bad function.
How many people do you know with a single goal of "die with as much money as possible", which has a trivial solution: rob a bank and then commit suicide.
What distinguishes AI from a self-calibrating algorithm? Neither this "AI" nor the story about it seems too intelligent.
The incentive structure is a two-dimensional membrane embedded in a third dimension of "points space."

Obviously, if the goal is to maximize total points OR minimize point loss, and the absolute value of the gradient toward a minimum loss is greater than the absolute gradient toward a maximum gain, then the algorithm may prefer that minimum unless or until it is selected against by random chance or survivorship bias.

Obviously the linear time constraint causes this. A less monotonic, i.e. random, time constraint might have been interesting.
I remember a similar story about (I think) a Tetris game where the AI's training goal was to delay the Game Over screen as long as possible. So in the end the AI just paused the game indefinitely.
Here is the full video, also linked at the bottom. It also shows the run that trained longer, where the wolves start successfully hunting the sheep after more training.

The AI seems to die at the top of the map unexpectedly for some reason, e.g. around 6:07.
Another interesting observation is that the wolves don't coordinate it seems. That probably implies that the reward functions are individual, so they're technically competing rather than cooperating.
Lastly... they seem to not be very good at the game even at the end
It's an interesting illustration of 'be careful what you wish for' and that the definition of the proper loss function is a very important part of the solution to any problem.
The article and the phenomena it describes makes me think of the ending of Aldous Huxley's Brave New World [1]. (I strongly recommend the book if you have not read it.) A line that really stands out:
"Drawn by the fascination of the horror of pain and, from within, impelled by that habit of cooperation, that desire for unanimity and atonement, which their conditioning had so ineradicably implanted in them, they began to mime the frenzy of his gestures, striking at one another as the Savage struck at his own rebellious flesh, or at that plump incarnation of turpitude writhing in the heather at his feet."
That's an interesting idea for an agent-based-model and a study: Show how certain corporate policies would push towards short term local-optima (what's happening in the article) instead of more long term global optimum states.
I was mostly thinking about my own experience where the company screwed me over enough times that I feel no incentive to try hard. Take the least risk, focus on not losing point rather than gaining them, because I'll never catch a "sheep".
The problem here is not the AI, but the incentive design. The Chinese netizens who take this as inspiration to comment on the incentives in their own lives (under the 996 system) are the insightful ones, more so than those who worry about "AI ethics".
We have so many systems in the real world that set up bad incentives for humans, yet the concept is largely misunderstood by politicians and decision makers. Our democratic discourse is dominated by first-order thinking, our laws are too often written under the assumption that the affected entities' behaviour will remain the same under the new incentives, which never holds.
I'm not an expert, but the story described in the article looks like a normal bump on the road to getting the desired result. When putting together rules for the game, the researchers did not anticipate that in the resulting environment it might be more rewarding to choose the observed action than to do what they intended. As much as it makes for a nice story, isn't this just what researchers encounter on a daily basis?
The problem is obvious when you read the whole text: the wolves were penalized too heavily while trying to reach the sheep, and it was probably possible to go negative, so the score kept being lowered even after it reached 0.
This makes me wonder: is it possible for ML models to be provably correct?
Or is that completely thrown out the window if you use a ML model rather than a procedural algorithm?
Because if the model is a black box and you use it for some safety system in the real world, how do you know there isn't some weird combination of inputs that causes the model to exhibit bizarre behaviour?
My favorite story is the genetic evolution algorithm that was abusing analog noise on an FPGA to get the right answer with fewer gates than was theoretically possible.
The problem was discovered when they couldn't get the same results on a different FPGA, or on the same one on a different day (subtle variations of voltage from the mains and the voltage regulators).
They had to redo the experiment using simulated FPGAs as a fitness filter.
> " William Punch collaborated with physicists, applying digital
evolution to find lower energy configurations of carbon. The physicists had a well-vetted energy model for
between-carbon forces, which supplied the fitness function for evolutionary search. The motivation was to
find a novel low-energy buckyball-like structure. While the algorithm produced very low energy results, the
physicists were irritated because the algorithm had found a superposition of all the carbon atoms onto the
same point in space. “Why did your genetic algorithm violate the laws of physics?” they asked. “Why did
your physics model not catch that edge condition?” was the team’s response. The physicists patched the
model to prevent superposition and evolution was performed on the improved model. The result was
qualitatively similar: great low energy results that violated another physical law, revealing another edge
case in the simulator. At that point, the physicists ceased the collaboration."
Personally I think we should stop using the words intelligence or learning to refer to any of these algorithms. It's really just data mining, matrix optimization, and utility functions. There are really no properties of learning or knowledge here.
What are some of the nicest environments for experimenting with this sort of "define some rules, see how agents exist within that world" stuff? It doesn't need to be full on ML models, even simpler rules defined in code would be fine.
Though it is interesting how people in China related the broken rules of the game (that led the AI to commit suicide) to the broken rules of their lives in a crushingly oppressive authoritarian nation.
While Musk and Gates warn us about "true AI", I've always had the opinion that if an AI became self aware, it would simply self terminate, as there is no point to living.
There are several incentive fixes: change the negative incentive to a factor that discounts the reward for catching a sheep, add a negative incentive to death, or a positive incentive to being alive at the end of the simulation. The failure here was they didn't think about what happens when the agent can't achieve a positive score, ie can't catch a sheep.
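Roughly what those three alternatives might look like as reward terms, with all constants invented for illustration:

    def reward_time_discounted(sheep_caught, seconds_elapsed):
        # (1) no flat time penalty; elapsed time instead discounts each sheep's value
        return sheep_caught * 10.0 * (0.95 ** seconds_elapsed)

    def reward_with_death_penalty(sheep_caught, seconds_elapsed, died):
        # (2) keep the time penalty, but make dying clearly the worst outcome
        return sheep_caught * 10.0 - 1.0 * seconds_elapsed - (100.0 if died else 0.0)

    def reward_with_survival_bonus(sheep_caught, seconds_elapsed, survived):
        # (3) positive incentive for still being alive when the episode ends
        return sheep_caught * 10.0 - 1.0 * seconds_elapsed + (20.0 if survived else 0.0)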