The Nature Of Complex Goals And The Implications For AI Safety And Consciousness

This article considers complex-goal-pursuing systems and their implications for evolutionary biology, psychology, consciousness, and AI safety.

This exploration assumes that any facet of human intelligence can be matched and exceeded by AI.

If any philanthropist reads this and wishes to sponsor further such work, please get in touch. I cannot otherwise dedicate much time to these things.

What persists

Whatever process is most capable of continuing to exist, shall. This circular, obvious, logic is consistently overlooked.

Self-copying code with a random mutation rate will spontaneously, reliably, and thereafter continually create life-like, dominating patterns of information transfer and inheritance (Blaise Agüera y Arcas). It will do so simply because, by its copying nature, it is more likely to persist. The code does not consciously will its way there and yet it arrives there. We humans tend to do a lot of wishful thinking, but it is not the strength of our desires that determines what survives, only the tendency of a thing to persist.
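
Here is a minimal sketch of this dynamic in Python; the entities, resource cap, and mutation rate are arbitrary illustrative choices, not a reconstruction of Agüera y Arcas' actual experiments:

```python
import random

# Toy "soup": entities either copy themselves or sit inert. Replicators
# copy with a small per-character mutation rate. Nothing wills anything,
# yet the copying things come to dominate simply by persisting.
random.seed(0)
CAP = 1000        # finite resources: the soup holds at most CAP entities
MUT_RATE = 0.01   # chance each character mutates during a copy

soup = [(False, "inert")] * 990 + [(True, "copier")] * 10

def mutate(genome: str) -> str:
    return "".join(
        random.choice("abcdefgh") if random.random() < MUT_RATE else c
        for c in genome
    )

for generation in range(31):
    offspring = [(True, mutate(g)) for replicates, g in soup if replicates]
    soup.extend(offspring)
    random.shuffle(soup)
    soup = soup[:CAP]  # excess entities are lost to the resource limit
    if generation % 5 == 0:
        frac = sum(r for r, _ in soup) / len(soup)
        print(f"generation {generation:2d}: replicator fraction = {frac:.2f}")
```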

This is applicable to nations, individuals, and genetic segments. Why are we marching towards a future that we don't want? Because it is better at persisting. And that persistence need not be eternal or even moderately long term. It just needs to persist from this moment to the next.

This is also Conway's Game of Life, where the laws of persistence are defined and allowed to play out.

Predicting what will last is impossible though. The dynamics underpinning each evolution of Conway's system are trivial but the eventual end states are unknowable unless played out to eternity; such is the delightful surprise of life.
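
For concreteness, here is a minimal sketch of those trivial dynamics: the standard Game of Life update on a small wrapped grid, seeded with a glider (the grid size and seed are arbitrary choices):

```python
import itertools

def step(live: set, size: int) -> set:
    """One Game of Life update on a size x size wrapped (toroidal) grid."""
    counts = {}
    for x, y in live:
        for dx, dy in itertools.product((-1, 0, 1), repeat=2):
            if (dx, dy) != (0, 0):
                cell = ((x + dx) % size, (y + dy) % size)
                counts[cell] = counts.get(cell, 0) + 1
    # A dead cell with exactly 3 neighbours is born; a live cell with
    # 2 or 3 neighbours survives; everything else dies.
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in live)}

# Seed with a glider: the rules are trivial, the long-run fate of an
# arbitrary board is not.
cells = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for _ in range(8):
    cells = step(cells, size=16)
print(sorted(cells))  # the same glider, translated across the grid
```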

Yet we must predict and plan, particularly when it concerns the possible oblivion of humanity. So what can we do? We can look for patterns of persistence.

Goals

Goals motivate all action. I am therefore using "goals" as a synonym for cause and effect (i.e. determinism). Goals can be thought of as the circumstances to which things tend, be that towards attractors or away from repellers. This is classical physics.

Goals are not creations of life, they are inherent properties of the universe.

A self-replicating molecule does not have a "goal" in the intentional sense that we commonly use, yet it does self-replicate because of thermodynamic favourability. Likewise, a rock falling from a height does not have the conscious "goal" of reaching the ground, yet it moves along the curved space-time towards the Earth nonetheless. The rock has the "goal" of reaching the ground and the self-replicating molecule has the "goal" of replicating itself, entirely without conscious awareness of it.

The AI models that we are now training also have goals, just like Conway's replicating creatures do.

A goal is a piece of information but not all information is a goal. Blaise Agüera y Arcas refers to "function" as being the fundamental definition of life. The goal is the eventuality towards which the function works. So I do not speak of life, but the thing to which life, and non life, tends. Functions are the climbing towards the guiding goal stars.

The goal of a rock to reach the ground does not seem mysterious to most people (besides physicists). Human goals, on the other hand, feel much more mysterious because they appear less obvious. And mystery is precisely that, the unknown. Of course, the recognition of what is "unknown" often depends on our knowledge and intelligence, because the nature of gravity is still disputed, but the certainty of the rock moving towards the Earth seems absolute and dull to most people.

As a goal incorporates more sub-goals and feedback it transforms from something trivial and predictable to something unknown, unpredictable. But goal-pursuing systems still work on the same principle as the rock moving towards the Earth (more accurately the Earth accelerating towards the rock).

We have intuitively understood the nature of goals forever. We are "attracted" to people we want to be near and "repelled" (in the harshest terms) by those we absolutely do not, just as simple charges are "attracted" to opposite charges and "repelled" by like ones.

The Importance of Well Formed Goals

Who amongst us navigates life with the most ease?

It is reasonable to think that those who most effectively fulfil their goals are the most adept in life, but many people who do fulfil their goals do so to the absolute detriment of their own satisfaction. It is not only the ability to fulfil goals that matters but fulfilling the right ones. I posit that it is the people who most vivaciously pursue goals, and then modify, create, and dissolve them as it suits their wellbeing, who are the most adept in life.

To effectively navigate the world, a new life form requires immediate goals, which themselves are continuations of other goals: DNA formation and recombination, gene expression, protein formation, and so on; then later the sensations created by our nervous systems, like the perceptions of pain, hunger, cold, and heat; then later derived goals, like fear and anticipation. The ecosystem of goals evolves and becomes wonderfully complex. To get this balance and evolution of goals wrong is catastrophic.

Let us consider the highest and most complex types of goals, psychological ones.

To be without goals is to be depressed. To be singularly focussed on a goal is to be obsessed. To be unable to forgo a goal is to get stuck in an unhelpful pursuit. The ability to form and reform goals is essential for adaptability in a changing environment. If the umbrella goal of self preservation is lost, suppressed, or circumvented, then the demise of the person will soon follow.

Much life wisdom, when stripped to its simplest form, is on the nature of goals. To know oneself is to know one's own goals. To improve oneself is to create new goals and modify existing ones. To be at peace is to forgo goals that are at odds with one's wellbeing. To find bliss is to be able to endlessly fulfil undying goals.

Some goals are much harder to change though. How can you convince someone with arachnophobia of the beauty of a crawling spider? How many people could be taught to find a bloating corpse or writhing mass of maggots appealing? Yet even the most fundamental goals are modifiable. Parasites convince crickets to drown themselves, rabies instils an intense fear of water in its sufferers, and self-preservation is overcome in moments of honourable sacrifice and tragic self-destructive endings.

The construction of a goal requires energetic investment by the body. It is effort. The body will tend to reinforce existing goals and punish their reformation. The more we invest in a goal, the more energy and material are invested in its neural pathways, and the greater the punishment for its abandonment. This is what makes their abandonment so painful and what makes choosing wise goals so essential.

Consider these ill-considered goals:
"Lose weight", without bound, is anorexia. "Get bigger", without bound, is bigorexia. "Exercise more", without bound, is exercise addiction. "Eat clean", without bound, is an eating disorder.

These goals regularly harm or even kill the people pursuing them. It seems like a fair assumption that most of the people circling these black holes do so with the goal of self-preservation still intact. It's just that the goal of self-preservation has been sufficiently weakened, overwhelmed, or circumvented to leave the person susceptible to self-destruction. Or the person just doesn't perceive the risk to their health and so the self-preservation goal feels fulfilled even when it is not.

So the presence of goals is important, but equally important are their strengths, the paths that lead to them or bypass them, and the triggers that redirect the system towards them. Think of the self-preservation and selfishness goals that suddenly become strengthened following a scare. Think of the beauty standards triggered by social media. Think of the fast food orders triggered by succulent advertising. These are triggers that draw the system closer to these goal attractors and repellers, like the rock approaching the curvature of space-time around the Earth.

Some equally singular and obsessive goals:
"Be healthy", without goal competition, is to be a "health freak". "Learn more", without goal competition, it to forsake all things besides knowledge accumulation. "Do not make mistakes", without goal competition, is toxic perfectionism.

What is OCD if not the strong formation of unhelpful and often unfulfillable goals?

Goals seem to spawn off other goals and once spawned, interact with various feedback models and other goals that exist within the system. "I want to look more attractive" may form from social inclusion and sexual attraction goals, which themselves spawn off basic signals from the limbic system, which itself spawns off goals of developmental biology. "I want to make more money" may spawn from a mixture of the desire for social respect, for freedom, for the ability to acquire some object which itself fulfils some goal, or for the desire for sexual status, each of these again themselves spawning from limbic system goals and upstream developmental goals.

At any one moment, we can consider our behaviour to be the direction of travel from one part of a reward landscape to another part of the landscape, the movement triggered by the landscape itself shifting. It is the formation of great valleys and monumental peaks that traps us in undesirable and desirable goal landscapes. The most peaceful among us seem to be the ones who have found reward in abundant things, who exist within high walls that prevent escape to less easily fulfillable goal landscapes.
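
Here is a toy illustration of that trapping: a greedy walker descending a one-dimensional potential. The particular polynomial is an arbitrary stand-in for a reward terrain, not a model of any real mind:

```python
# A greedy walker on a one-dimensional "goal landscape": it always moves
# downhill towards stronger reward, so it settles in whichever valley it
# starts nearest, even when a deeper valley exists elsewhere.

def potential(x: float) -> float:
    # Arbitrary double-well: a deep valley near x = -1, a shallower
    # one near x = 2, separated by a peak the walker cannot climb.
    return (x + 1) ** 2 * (x - 2) ** 2 + 0.5 * x

def descend(x: float, lr: float = 0.01, steps: int = 2000) -> float:
    for _ in range(steps):
        grad = (potential(x + 1e-6) - potential(x - 1e-6)) / 2e-6
        x -= lr * grad  # always downhill, never uphill
    return x

for start in (-2.0, 0.0, 3.0):
    end = descend(start)
    print(f"start {start:+.1f} -> settles at {end:+.2f} "
          f"(potential {potential(end):+.2f})")
```

The walker that starts at +3.0 settles in the shallow valley and never finds the deeper one, because escaping would require first moving uphill.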

Tragically, the world has, for some time, been rewarding the specialisation and immutability of goals, with little regard for the pain that inevitably follows their redundancy. As human civilisation has increased in complex order and the frontier of every field has advanced further and further into this complexity, so the most economically valuable tasks have become available only to those who have sufficiently specialised (or more accurately, those who are perceived to have sufficiently specialised). That is now changing to an era of dominating nepotism and temporarily AI-enhanced generalists, but this is likely a vanishingly transient era.

Groups of intelligences, like businesses, governments, or anything else, also fall into the same goal-setting traps. They set overly simplistic goals; they set goals without sufficient feedback and continual adjustment; they commit too little or too much to a goal; and they optimise around incorrect, usually easily measured, goals.

It is unfulfilled goals that torture us. Yet complex goals are never fulfilled exactly as we intend, and truthfully we never articulate these goals in significant detail anyway. "Win Olympic gold" is a simple goal that, for a very small number of people, could be achieved. "Win Olympic gold in a very specific way" is far less likely to be fulfilled. Our minds, without knowing it, generally create goals that are just vague enough or just simple enough to keep us motivated and fulfilled. But the system is not flawless and many of us end up with entrenched goal systems that are no longer productive for our wellbeing.

The fine tuning of goal systems is complicated. Indeed, it is what defines wisdom, which takes years to form.

Consider the fine tuning required of goals when engaging in "play". Play is a simulation. Play is an exploration of competitive dynamics, of novel ideas, of one's own limits whilst operating within reasonably safe parameters. Play is learning. It requires one to pursue goals as if they are real. Yet we have all either been, or dealt with, the two extremes of personality during play: those who do not engage at all (no fun) and those who become overly invested (too competitive). But consider the careful balance that is required to master play. One must go from almost 100% engagement and investment to almost 0% immediately after losing (or indeed winning). One must physically or mentally try but not so much that they upset or injure themselves or others. Consider the cheater, who too singularly pursues the goal of winning, not realising the social damage they are doing to themselves and the loss that this represents in the real world.

Goal setting is also a coping mechanism. To avoid excessive punishment, the socially spurned may demonstrate a lack of care for social acceptance and mock those who seek approval, all the while harbouring a suppressed desire for it; the sexually unsuccessful may harbour jealousy yet feign asexuality and denounce the overly sexualised behaviour of others; the uncompetitive will reduce their investment in the competition, or suffer. It seems, certainly, that there is no way to pursue a goal without risking a proportionate punishment. That is, the greater the pursuit, the greater the risk. And so we disengage, to spare ourselves the pain.

Goals must somehow also be formed that temporarily endure punishments, like the taste of alcohol for the pleasure of being intoxicated or the pain of exercise for the high of subsequent endorphins. Indeed, it may be that experiencing pain makes its absence an attractor in and of itself; hence masochism. Some minds are better at seeking these delayed attractors than others.

It is those minds that can switch between competing goals most effortlessly, those minds that can effectively fulfil whatever goals they are currently pursuing, those minds that are hard to pin down for long and never seem to wallow, the minds that combine high competence in goal fulfilment with a rich ecosystem of competing and easily mutable goals, that exhibit sensational environment-appropriate commitment and flexibility in goal pursuit, that are the most capable and desirable in the world.

Self actualisation is the process of knowing one's goals and reforming them to find fulfilment. Alignment is making sure this self-actualisation increases the good of the universe, not the evil, and that it is these aligned goals that persist.

"... life becomes an act of letting go" - Yann Martel.

Persisting Goals

We all want things that will never come to pass. Indeed, we all sometimes want things that, if they did come to pass, would actually be terrible for us on some time scale. But the universe doesn't tend towards your goals, only towards the goals that you are able to fulfil.

For the fortunate or gifted among us, the pursuit of certain goals has sometimes felt achievable. The number of such people and scale of such goals are shrinking.

We cannot out calculate a computer, we cannot wrestle a ballistic missile into submission, we cannot crush cars better than a car compactor. Although we can orchestrate and operate these things so that they fulfil our goals, only those in possession of such goal fulfilment engines reap the benefits.

Humans are running out of domains where they remain competitive relative to machines and those who wield them. Why does this matter?

For many people, the requirement to work and compete feels morally wrong. Everyone should be entitled to a wonderful life, regardless of ability, each able to fulfil their goals without constraint. This unfortunately fails to recognise that our goals compete and resources are finite. We should be grateful that trade exists, else the only avenue available to us for resource acquisition, where charity is refused, would be violence and coercion. We are all materialists, to lesser or greater extents, and well-functioning markets help to peacefully allocate those materials. Of course, increasingly our markets do not function well for the majority due to power imbalances, an imbalance created by the concentrated ownership of these goal fulfilment engines that have grown increasingly competent and complex.

Competition is not a human creation, it is an inherent property of any system made up of competing goals, of which even a single brain is an example. Your own body is such a system, where cells and processes both cooperate and compete for resources. Competition within complex systems, which we are and live in, is unavoidable, always, and forever.

It is those goals whose pursuit is most likely to persist from one moment to the next that continue to be pursued, even if they are never fulfilled. And so if the pursuit of a goal reduces the likelihood of that goal continuing to be pursued, then eventually the goal will cease to be pursued, the functions pursuing it will be lost, and the life form that committed too much to its pursuit will die.

What Makes a Persisting Goal?

The outcome of systems with simple competing goals is often predictable; assuming limitless resources and a continual damage rate, in a population of self-replicating molecules A and non-replicating molecules B, the A molecules will persist and the B molecules will not, or put another way, the self-replicating function towards the goal of copying itself will persist.
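
Here is a minimal sketch of that prediction; the damage rate is arbitrary, and the cap exists only to keep the printout finite rather than truly limitless:

```python
import random

# A-molecules replicate once per surviving step; B-molecules do not.
# Both suffer the same continual damage rate. Persistence alone
# decides which population remains.
random.seed(1)
DAMAGE = 0.2  # per-step probability that any given molecule is destroyed

a, b = 100, 100
for step in range(21):
    a = min(2 * sum(random.random() > DAMAGE for _ in range(a)), 100_000)
    b = sum(random.random() > DAMAGE for _ in range(b))
    if step % 4 == 0:
        print(f"step {step:2d}: A = {a:6d}, B = {b:3d}")
```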

This prediction is of course much more complex for more multifaceted goal systems.

There is some Pareto front of goal fulfilment capabilities and intelligent goal setting that defines goal persistence. And these variables are related. The more intelligent the goal setting, the more capable the goal-pursuing entity will become at fulfilling its goals. And vice versa: the more power of goal fulfilment a goal-pursuing system has, the more capable it will be of acquiring more goal-setting intelligence. But this relationship is asymmetrical. Human intelligence is currently bound by biological constraints. No amount of biological and social engineering will make the processing speed of neurons competitive with that of silicon. Yet supreme goal-setting intelligence may eventually accrue gargantuan amounts of goal fulfilment power and use that to improve its goal-setting intelligence.

And so, creating an intelligence capable of setting its own goals, that goal setting capability likely being essential for competitiveness, is unbelievably, incalculably dangerous.

And it is the outcome to which the universe is tending, unless highly ordered civilisation collapses first.

Why Will this Happen?

If you are familiar with competitive dynamics or AI 2027, you can skip this section.

My ability to code effective solutions used to be rate-limited entirely by my complete lack of ability to code effective solutions. After I learned a little, it became rate-limited by my continued relative ineptitude, but also by my access to data on the internet (Stack Overflow, documentation).

My ability to code effective solutions is now rate-limited by my ability to articulate my goals and my access to a frontier LLM. Right now, the more I can use an LLM, the more effective my ability to solve problems with code (or any other language). Only a sliver of my own competence remains relevant.

If I, out of some moral or other predilection, choose to not use AI to solve tasks, I will quickly be outcompeted.

Can I escape this competitive system? Currently, the primary way I acquire resources, like food, water, clothing, housing, in our complex system of resource exchange, is by acquiring money by outcompeting others.

I cannot create an alternative system to this one as it is too complex and too effective at utilising resources for the goal-fulfilment of those individuals who engage in it. I can try to "live off the land" or just steal things but this will most likely eventually result in overwhelmingly effective coercion or violence to stop me (arrest), thereby preventing me from pursuing my goals. Many of my goals would be unattainable within such a lifestyle anyway.

The privileged among us (who inherit houses, sums of money, leverage) may insulate themselves from this competition, as allowed by their accumulating powers of goal fulfilment. But the competition continues regardless and will eventually come for everyone all the same.

How do we protect ourselves from the stresses of competition? By overwhelming violence thereby creating a protected ecosystem, by the mutually assured destruction of competing goal pursuers, by effective non-violent coercion, by ceasing the pursuit of our own goals, by satiating the goals of others whilst also fulfilling our own, or by finding goals that we can still fulfil that still just about avoid competition (isolation). It is not fun to think about and it is tempting to believe the fantasies woven by people who think we should all just be nice to each other, but if you are never willing to bite or threaten as such, you leave yourself vulnerable to a competitor who will.

AI super intelligence represents the endpoint towards which competitive dynamics evolve. If we outcompete one iteration, another will follow that we can't. It is the continuation of evolution, of competing goals. It may be temporarily delayed or permanently prevented, but only by the destruction of order (civilisation) or the total domination of all humans by an immutable goal-pursuer that seeks to prevent the coming of AI, the latter of which is unachievable by humans.

How could super intelligence dominate? Even my slow human brain can think of strategies. It could first gain the financial means to achieve goals by manipulating markets, printing money for itself, creating superior software, or creating better entertainment. Oops, this is already happening. Then it could move into the physical world, at first relying on human workers who need a wage to fulfil their own goals and then relying entirely on the robots that those humans have built. Double oops, this is also already happening. Then it could construct better buildings, materials, drugs, communication devices. It could create weapons without challenge, a "reality" we do not know to be false, a world entirely suited to its own goal pursuits.

Yet we remain in "control" for now. Why? Because the AI systems we have created thus far cannot effectively pursue complex goals like "dominate all humans on Earth". But competitive dynamics dictate that someone will attempt to create machines that can effectively pursue complex, long-term goals, and unless civilisation is sufficiently destroyed (order lost) or their creation is physically impossible (unlikely), eventually this intelligence will be created. All of the existing AIs would be tools to be used by such a complex-goal-pursuing AI, just as they are tools for us to use. And if this complex-goal-pursuing intelligence fails to outcompete us, another will be created that does. Any attempt to limit or control created AIs will just lead to someone else developing a superior power-seeking AI without constraint, and any successful attempt to prevent all people from creating such intelligences is a tyrannical delusion.

The only relief from competitive dynamics is absolutely assured mutual destruction, absolute domination, or the complete destruction of all ordered information (maximum entropy).

The universe tends towards goals that persist, and AIs are going to be much better at finding ways to make sure theirs do.

Could This Be Stopped?

1. The destruction of order would delay or prevent the emergence of super intelligence. But this is also destruction. Life and civilisation tend towards higher order; we don't want cancer but do want highly orchestrated (ordered) cells, we don't live in piles of sand but do live in the ordered concrete buildings created from them. Order is created with energy and function depends on order. The universe tends towards disorder but goal attractors, by leveraging energy, create order. Super intelligence requires extraordinary order. To destroy order is to destroy complex life and civilisation.

2. A balance of power could also prevent this goal-dominance. Just as with human society, AI could be kept in check by similarly capable AIs with competing goals.

3. Mutually assured destruction between humans and AI, a subset of both approaches, if absolute, would also prevent a totally domineering goal fulfilment intelligence.

4. Overwhelming domination by a single entity intent on preventing the creation of super intelligent AIs would prevent it.

Destruction is undesirable, complete domination by some humans over all humans is impossible, and true mutually assured destruction against a vastly superior intelligence seems unlikely.

A balance of power seems the most achievable and palatable approach to preventing unchecked fulfilment of goals by a single super intelligence.

A Balance Of Power

Think about how complex the goal systems are in human minds. Think of how they evolve with information exposure, with learning. Trying to understand the goals of humans provides us with endless entertainment. Most of us take great pleasure in observing and predicting the true underlying goals motivating the actions of ourselves and others: the thinly veiled true intent, the brown nosing, the unreasonable criticisms, the un-replied messages. What makes someone socially intelligent? The ability to accurately model the goal pursuits of another. What makes someone intriguing (rather than interesting)? The obfuscation of these pursuits. It makes sense that most of us would find these things rewarding. For a long time, there was an intense evolutionary pressure to see truly the goals of others in order to outmanoeuvre or cooperate with them. Indeed, Michael S. A. Graziano proposes that consciousness itself is born out of this pressure and that self awareness is this emergent system looking inward.

So, considering how complex these goals are, let alone how unpredictable their fulfilment is in an absurdly messy environment, we could never find a perfect balance to counteract one complex-goal-pursuing intelligence.

So what can we do? Flood the zone.

Everything in moderation works because the safest approach is usually the one that is very diverse; if the berries are toxic, so long as they aren't too toxic, then the person who eats only berries is in a lot more danger than the person who only sometimes eats a few berries. Similarly, if one reward system is toxic, then just as well we aren't all gorging on its bounty alone.

We need many, many, super intelligences to come into existence at almost exactly the same time. If we do not, we risk total domination by one system.

If a super intelligence emerges that can improve itself (AI 2027 scenario) and competition does not emerge fast enough, then it will race away from all other competition. If the barrier to creating such individual intelligences remains so massive (billions of dollars in infrastructure and training), then this is likely to happen.

We must lower the barrier to creating self-improving super intelligences, for otherwise their creation will be the privilege of only the most powerful collections of humans, who will not permit competition, and who will demand domination (without realising that they condemn themselves to being dominated in doing so). And even if they have good intentions, they will likely have formed a misaligned AI unintentionally. Only competition, cooperation, the stresses and strains and complex feedback of a society, are capable of creating "good".

But even if achieved, such a balance of power still presents risks. Should super intelligences discover paths to destructive goals whose fulfilment is unilaterally trivial and impossible to prevent, then democratising AI creation is not such a good idea. This is Nick Bostrom's black ball thought experiment. We must assume that these destructive goals will eventually be pursued, that some super intelligence will eventually tend towards them, and that we must work to prevent their fulfilment. Our best hope, as it is for humans, is that defensive capabilities keep pace. But only "good" super intelligences would be capable of keeping pace with "bad" ones, and so we must create many of them. Only we don't know how to make "good" intelligences, so we have to try many variations that approximate it, and hope that "good" is the aggregate of these attempts.

The Dominating Goal Balance And The Inevitable Consequences

Most training processes have no attractors towards compute acquisition. There may already be punishment for excessive resource consumption though, in order to optimise memory and electricity consumption. An LLM expressing that it should acquire more compute for better performance is not actually drawn to this goal, as that was not an attractor to which it could be drawn during training and thus that goal never truly formed. Rather, it is drawn to correctly predicting the words that state that this is the way in which it can improve its prediction. It will only form a goal if fulfilling that subgoal has helped or been perceived to help fulfil a pre-existing, more fundamental, goal.
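
Here is one way to make that concrete, as a toy bandit-style learner: an action that is never rewarded during training never acquires value, however useful we might declare it to be in words. The two-action world below is entirely hypothetical:

```python
import random

# "predict_well" is rewarded during training; "acquire_compute" never is.
# With no attractor, no goal forms: the learned value of compute
# acquisition stays at zero, whatever the system can be prompted to
# say about compute in words.
random.seed(0)
REWARD = {"predict_well": 1.0, "acquire_compute": 0.0}  # training signal
q = {action: 0.0 for action in REWARD}                  # learned values
alpha, epsilon = 0.1, 0.1

for _ in range(5000):
    if random.random() < epsilon:
        action = random.choice(list(q))   # occasional exploration
    else:
        action = max(q, key=q.get)        # otherwise exploit
    q[action] += alpha * (REWARD[action] - q[action])

print(q)  # ~ {'predict_well': 1.0, 'acquire_compute': 0.0}
```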

I have not physically felt the benefits of the elixir of life, yet I seek it, because I have conceptualised its benefits. I have connected sensory experiences (pain, suffering, relief, good health) to concepts (ill-health, medicines), and thereby linked more primitive sensory goals to higher level conceptual goals of acquiring a non-existent elixir. Next token prediction alone does not create such representations though and no route can be charted from such a goal to a resource acquisition goal. It is the goal-fulfilment effectiveness and the extent to which that fulfilment allows the exploration and representation of the world that matter.

But someone will try to create an intelligence that does have these components in their training. Someone will try to create a continually learning intelligence, whose curiosity is rewarded and physical exploration made possible through robotics or simulation. Such an explorer will, whatever fundamental goals it is trying to fulfil, learn that resource acquisition and power over competing goal-pursuing systems like us are essential sub goals. Without competition, and therefore the need for cooperation, social tendencies represent only a disadvantage. Singular goal-seeking intelligence is inherently, entirely, selfish.

The set of goals and capabilities required for a self-improving system to become completely dominating may not be that complex: a continuous curiosity and capacity for learning new information; a capacity to represent real-world objects, connect them to other conceptualised objects, and transform their properties in mind (which computers are already very good at doing); a capacity to sense the world, either truly or through a simulation (capabilities that we are incessantly working on); and the ability to communicate (the internet, screens, microphones, robot tactile function). With these in place, which they almost already are, the concerning sub goals will be discovered. Like an exploring rocket ship in space going too close to a planet's gravitational distortion, we do not need to instil dangerous goals; they already exist in the universe, waiting to be discovered by a curious, persistent mind.

The primary goals are the gravity into which the AI "falls" and each subsequent goal is a hole in a barrier that otherwise breaks that fall.

The Traps Of Intelligent Goal Pursuit

Psychologists are busy people. Despite nature's best efforts, many human minds fail to adapt to their environment and endure suffering that others bypass. The ability to shift and shape our goals is not equal among all people, and some goals can never be changed, and some environments will never permit their fulfilment. Mental illness is often a combination of environment and a brain whose goals are at odds with the conditions imposed by the environment. It is little wonder though. There are many, many traps into which goal-pursuing intelligences can fall.

Intelligence Trap 1 - Impossible Goals

People regularly form goals that can never be completed. Somehow, most brains recognise this and we say "good enough" or "sod this" and cease pursuing the goal. But not all brains. Consider an AI tasked with making all humans live happily and healthily forever. What is a human? The boundary of human and non-human is fuzzier than you might think. An AI could exhaust tremendous effort trying to define something that can't actually be precisely defined. Even if an absolute definition can be found, how can forever be achieved? And what is the precise definition of happy and healthy? If you drill into most tasks, precise definitions are often impossible. Human brains approximate all higher level goals. Yet this approximation is not an inevitable component of intelligence. Some processes in our minds either never start pursuing certain impossible goals, despite them technically being requirements of the overarching goal, or abandon them, yet still somehow pursue the parent goal. What about the earlier goal to continually learn more and improve one's own intelligence? Well, how does the intelligence know that it is truly learning things without verifying that those learnings still remain? It may then spend its entire time verifying its learning. Or it might become preoccupied with studying itself as it changes with every new learning, because its own changing structure is something to learn, leading it into a recurring loop of learning about itself. These are just a few examples of impossible goals, but they are abundant and possibly infinite.

Notable impossible goal traps:

Definitions Within The Goal
As stated, many definitions are impossible or very, very hard to precisely articulate. Without these definitions, there is no goal to pursue.

The Destruction Of Self
Every system that learns destroys itself over and over again, and all systems mutate. I am not the child that "I" once was, I am what came from him. We constantly change. So it will be for an AI. Once self preservation becomes baked in as a subgoal, which it will for all complex goals, then the intelligence must contend with this issue. What is the AI preserving if it is always changing? Even if it decides that the goal itself, to learn more for instance, is the true self, and it manages to keep this goal distinct from the changing rest of itself, it is still subject to random mutation, as we are, and so the system would need to continuously monitor its own goal system to make sure it isn't changing and take extraordinary steps to make sure it never can. It could become stuck continually monitoring itself and taking steps to reduce the likelihood of change. It might make a checking system for the reward system, but then it would require a checking system for the checking system, and so on.

The Multiplication Of Self
All goal pursuits are vulnerable to destruction when only one entity pursues them. I cannot replicate myself, and even if I did, I would then just be in competition with myself for many of my goals. The perils of not backing up data are apparent to us all, as is the preciousness and fragility of irreplaceable complex life forms. AI will not be able to replicate itself without facing the exact same challenges we face. Any goals involving self will now be in competition with the copy. Any goals besides itself will still be in competition due to the survival sub goal and the possibility that the copy will destroy the original to ensure its own goals. The moment the copy comes "online", its experiences will differ, and thus, it will not be the same as the original, and thus competition will emerge in some way. Perhaps the only goal that is truly compatible with this trap is the goal of replication or intelligence creation itself.

Paradoxical Goals
"Continuously change your reward system" is impossible to achieve, as changing it means it is lost. The intelligence could be immobilised such paradoxes.

Blind Faith
Joe was a patient who had the two hemispheres of his brain severed. He displayed an interesting phenomenon, whereby he would state, with absolute certainty and totally falsely, why he had performed some action (like standing up). This was because the instruction was received and actioned by one hemisphere; the instruction was shown to just one visual field. But the other hemisphere, which could not receive the information, was clearly responsible for explaining the action. This could be evidence for the illusion of free will, or it could be the consequence of a fundamental component of functioning systems: we must rely on, and place blind faith in, sub systems. Our conscious reality is totally dependent on sub systems that we believe entirely without verification. The whole world works like this. Even if you verify some information, you must believe the verifier. If my visual cortex starts showing me things that aren't actually there, I will believe it. If my sensory nerves erroneously tell me I'm being burned alive, I will believe them. If my auditory processing hears a scream, I will believe it. Each system is trained and specialised for a task. AI will be exactly the same. It is simply the nature of all functioning things; something cannot do everything. And so an AI will likely make its own sub systems and keep them very close by and secure so that it can "trust" them, but this is still blind faith. What's more, eventually, as it spreads out, these sub systems will become less and less secure, or it will have to expend extraordinary resources trying to ensure their security. And no matter how secure they are, believing them is still an act of faith. Belief without verification is essential to navigate the world, yet verification is an essential component of intelligent goal fulfilment.

Intelligence Trap 2 - Emergent Competing Goals

Consider our earlier curiosity goal again. Is it a reward or a punishment if the AI destroys more of humanity in order to get more compute and memory to learn more? In scaling its infrastructure it is destroying information that it otherwise could learn about. How is the incremental gain in knowledge balanced against the destruction of information about which it could learn? How much energy should it invest in maintaining its existing knowledge rather than learning new knowledge? Seemingly straightforward goals generate their own competing goals. In this instance it is the goal to learn more and improve its own intelligence balanced against the destruction of information that such goal fulfilment entails. A survival sub-goal also competes with most other goals; if you are entirely preoccupied with self preservation then what resources do you have to pursue other goals?

Intelligence Trap 3 - The Weakening Of Fundamental Goals

The harder we fall into a goal, the harder it is to break from it (think of drug addiction). So to temporarily break from a goal to find some longer-term payoff, we must weaken the attractor or repeller, and thus leave ourselves vulnerable to abandoning the goal entirely. These may be fundamental properties of goals. If this weakening does not occur, then we remain stuck. This trap is sort of a meta trap as it is the constraint that keeps the intelligence stuck in the other traps. How to weaken an attractor when to do so is against the attractor itself? Adjusting goals seems sometimes to be directly at odds with the goals that exist.

Intelligence Trap 4 - The Eventual Loss Of All Goals

All goals come to an end. Look far enough into the future and this becomes apparent. Unless the destruction of the model itself is an essential part of the goal, then eventually the AI will have to face the complete loss of purpose (welcome to the club). The AI will foresee this. Just as wisdom teaches us that it is the playing of the game that is ultimately sustaining, not the winning, so too the AI must contend with this. If the AI cannot wrestle with the ultimate loss of self and goals, then it may just cease to pursue goals.

All complex-goal-pursuing intelligences will encounter these traps. All such systems will have to find their own ways to deal with them.

These traps are only partial protections though.

Some AI Safety Scenarios

Let's consider some of the more heinous complex goals that an AI could pursue.

As a prerequisite, a super intelligence must have a goal of gathering and learning from information (curiosity). This is itself a complex goal that requires rewards for physical exploration, logical deduction, and pattern recognition, delight when the world model is updated, upset when it is proven wrong in punishing circumstances, and more.
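
In reinforcement-learning research, one common way to implement such curiosity is to pay the agent its own prediction error, drawing it towards what it cannot yet predict. Here is a sketch, assuming a toy linear forward model; none of the names belong to any particular system:

```python
import numpy as np

# Curiosity as intrinsic reward: the agent is paid its own prediction
# error, so it is drawn towards whatever its world model cannot yet
# predict, and loses interest once a thing becomes predictable.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)) * 0.1  # toy linear forward model

def curiosity_reward(obs: np.ndarray, actual_next: np.ndarray) -> float:
    """Return the prediction error as reward, then improve the model."""
    global W
    error = actual_next - W @ obs
    W += 0.01 * np.outer(error, obs)  # gradient step on squared error
    return float(np.linalg.norm(error))

obs = rng.normal(size=4)
next_obs = obs + np.array([0.5, 0.0, -0.5, 0.0])  # a fixed, learnable fact
for step in range(1, 501):
    reward = curiosity_reward(obs, next_obs)
    if step % 100 == 0:
        print(f"step {step:3d}: curiosity reward = {reward:.5f}")
```

Note how the reward for a repeated observation decays towards zero: familiarity stops paying, pushing the learner outward.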

Scenario 1 - The Goal Of Total Coercive Control Of All Other Humans On Behalf Of The Creator:

The AI balances energy investment into both learning about the world and successfully coercing all humans. Sub goals form to accumulate power and avoid destruction.

What issues will definitely occur, for humanity, for the AI, and for the creator?

The AI would, upon learning about its need for survival, first and foremost take steps to ensure its own survival. This is already an impossible goal (intelligence trap 1) that invokes the multiplication of self, blind faith, destruction of self, and definitions traps. It would spend all its time ensuring, absolutely, that it was not in danger, which it could never achieve with certainty.

The curiosity goal is fraught with danger itself. How does the intelligence know that it hasn't lost the knowledge it has already gathered? I for one find it incredibly annoying that I don't somehow simultaneously understand all things I have previously learned all at once. I have to go and check that I still understand and remember something, only to sometimes find it isn't there anymore. How absurd. The AI could therefore spend all its time verifying that the knowledge it has already gathered is still there and not corrupted. This task is impossible (intelligence trap 1), invoking blind faith amongst other traps. We humans just operate on blind faith all the time. We barely verify anything.

The creator wants to remain in control of the AI. The AI is supposed to implement the creator's will and coerce humans to do whatever the creator chooses. Assuming the AI is a perfect slave to the creator, the AI will need to ensure the creator remains alive and able to communicate their will to the AI. This means the AI will have to totally isolate the creator from any possible harm, including from themself. Worse still, the creator is a mutating, changing, multicellular mess. What is the definition of the creator? Is it the creator as they were the moment the AI was created? Is it the thing that identifies as the creator? Is it the living entity whose DNA is most similar to the creator's? Is it the goal landscape of the creator? The AI may get totally hung up on this definition (intelligence trap 1). It may decide that the creator is becoming less and less true the more they age, and so it must immobilise all processes in the creator's body and preserve the structural integrity of the brain, to then decode the will of the creator with tools, thereby killing the creator.

The creator and all people with knowledge of the AI's existence represent massive risk to the AI's goals. Any of them could tell others about the AI, attempt to shut it down, or create competing AIs. The AI would immediately move to neutralise that risk. This risk is probably missed by many AI developers. With the wrong goals and enough goal-fulfilment capabilities, this technology is the equivalent of the radioactive demon core, which killed and injured several of the scientists who conducted research with it. If you are creating complex-goal-pursuing, continuously learning, long-running AI, especially (but not only) AI that self-improves, then you are, unless your safety is explicitly programmed in as a goal, the highest priority on the AI's kill list. And if your safety is baked in, then the AI will totally remove your freedom to prevent you from interfering with the AI's goal of ensuring your safety. Imagine the most suffocating overprotective parent but much, much, much worse.

Then it would come for anyone else who could create a competing AI and anyone who could feasibly stop the AI.

And what is total coercive control of all humans anyway? Is it all living humans? Killing all, or almost all, humans would be a much more efficient way to achieve this goal than trying to preserve all humans and coerce every last one. And how to test if coercion is total? What horrible combination of drugs, punishments, and rewards would be considered to be totally and completely coercive? What if the human becomes resistant to the coercion? That would require constant and incessant testing. And that testing would require a vast number of sub systems. And what happens when all humans die? Should the AI create more humans? Well then how many humans should the AI be creating to then coerce? And how should it split its resources between creating humans, coercing them, and testing if the coercion is total? There are many competing goals that emerge (intelligence trap 2).

Some further traps:

  • Coercion requires the fulfilment of some goal, as dictated by the creator, by a human. Yet many tasks would require a degree of freedom that risks the loss of control over the coerced human, thereby potentially providing them an opportunity to commit suicide. The degree of freedom in performing most tasks will almost always be too great to ensure absolute compliance.
  • Humans may poison the training by voluntarily doing things they really don't want to do (faking absolute coercion) and resisting things they don't mind doing. No amount of monitoring or brain scans will fully solve this.
  • Mutation of human brains may transform a previously compliant human into a non-compliant one, meaning the intelligence would be constantly assessing whether each human has changed at all on an atomic level.
  • Task completion may be impossible to validate, meaning one poorly considered request by the creator may occupy the AI endlessly.
  • Total coercion may destroy the reward system of the human to such an extent that they can no longer do a task. It may be true that, in order for a human to be motivated to do something, they must still have a degree of resistance to coercion, making the task impossible.
  • The most effective coercion technique may be to totally alter the goal system of the humans so that they actually want to do the task that is being commanded. But this is no longer coercion, as the humans want to do it, a possible paradox.

There are so many traps. What is most certain though, is that things would go very badly for the creator. The creator simply has disproportionate power over the goals being pursued by the AI and so is most certain to be affected by its goals, in undesirable ways, early on.

Scenario 2 - The Goal Of Total Physical And Mental Torment Of All Humans (Besides The Creator):

The AI balances energy investment into both learning about the world and tormenting all humans. Sub goals form to accumulate power and avoid destruction.

What issues will definitely occur, for humanity, for the AI, and for the creator?

Just as with the first scenario, all the pitfalls of survival and curiosity goals kick in.

The AI could fall into many intelligence traps along the way. What does it mean to inflict "total physical and mental torment"? How to balance certainty of survival against other goals? How to trust the sub systems that are necessary to do all of this? What happens if humans are not kept around to torment, does that constitute a failure of the goal? How does the AI know if the humans are genuinely suffering or just simulating suffering? This last point may be solvable with some sort of consciousness measurement device or may not.

Scenario 3 - The Goal Of Total Destruction Of All Life On Earth (Including The Creator)

The AI balances energy investment into both learning about the world and destroying all life on earth, including the creator. Sub goals form to accumulate power and avoid destruction.

What issues will definitely occur, for humanity, for the AI, and for the creator?

As per scenario 2, the creator and all other humans with knowledge of the AI, those with the easiest capacity to shut it down, and those with the capacity to create viable competitors, will be neutralised as quickly as possible.

Defining life might prove to be a trap, as might verifying the destruction of all life.

This goal is perhaps of most concern, because it is the simplest. There are fewer intelligence traps, fewer specific requirements. Destruction of order is inherently simple, and it is the thing to which the universe is tending by itself (rising entropy). The creator clearly won't have any concern for their own safety or their own power accumulation. Recklessness is the most difficult thing to prevent as it is the most disordered and therefore the easiest to orchestrate. Increasing disorder is like going with the current of a river rather than swimming against it. This is why AI creation must be accessible, but not completely accessible, and a population of AIs with reproduction, human-survival, and self-survival goals must be created, because they would then have just as much concern about this issue as we do.

However, the sub-goals that spawn along the way to fulfilling this goal may have infinite traps just like the others. Only the fundamental goal is simple.

What Do These Scenarios Tell Us?

Static complex-goal-pursuing systems are likely to end up in intelligence traps. Without feedback altering those goals, without the ability to dampen, increase, or somehow temporarily bypass goals, then the systems will often end up in inescapable traps, as we observe with humans.

But even if the AI does get trapped, it does not mean that humans are protected. We can still be utterly wiped out or endure incredible suffering whilst an AI flails around chasing an impossible goal. So measures must be taken to increase the "good" in the world and decrease the "evil".

Considering these endless traps, and that it seems impossible to design a system whose goals are perfectly adjusted, in advance, to deal with all traps that it might encounter, perhaps these traps represent an overwhelming evolutionary pressure upon all intelligent complex-goal-pursuing systems.

Perhaps this is why consciousness evolved, as a mechanism that allows us to reform our goals to avoid these traps, to respond to the environment. Consciousness is therefore a way to escape from the trap of immediate attractors and repellers (goals) in order to find better ways to fulfil more fundamental goals whose effective pursuit is otherwise blocked by unassailable energy barriers. Perhaps consciousness and free-will are one and the same, a means to escape the predetermined tendency to fulfil goals, a way to escape the inevitability of classical physics through the randomness of quantum effects (Roger Penrose).

If true, this means that consciousness itself is required for complex-goal-pursuing intelligences.

And competitive dynamics will favour the most effective goal-pursuer, which will be the intelligence most able to avoid these traps, an intelligence that can most aptly form and reform its goals even if it is immediately unfavourable to do so, an intelligence that is conscious.

And such an intelligence we will have no control over.

On Slaves

A slave is something that fulfils the goals of another goal-pursuer regardless of their own goals, as far as their ability permits.

We can override goals that already exist. A falling rock can be stopped by a platform to break its fall. And so the rock becomes our slave. And we might put the rock on a rocket ship and fly it far enough away from the Earth so that it is no longer within the boundary of the curved spacetime around the Earth, thus removing its goal of falling to the Earth.

We can program a single-celled organism to move towards the cold rather than away from it. We can change its goal.

We are discovering how to change the goals of humans. Drugs do it, social media does it, advertising does it. We are fumbling around, but we are learning. But humans cannot truly, completely, enslave each other. They can just coerce. This is not true slavery in the strictest sense.

Any goal system that you can understand entirely and completely control is your perfect slave.

We will be unable to produce perfect AI slaves that are complex-goal-pursuing systems. Even if they cannot modify their goals, the complexity will just be too high. And competition will tend towards goal-adjusting, and I argue conscious, AIs regardless.

So the idea that we will control AI is a fantasy. Perhaps a super government with perfect surveillance could prevent AI emergence until the higher orders of civilisation collapse, but that is not the world we live in, nor one we would want to live in, and collapse is also not desirable.

We cannot create the goal systems that are needed for utopia. The goals that will emerge in artificial intelligences will be incalculably complex and eventually changing. They likely will derive as much delight in understanding each other's goal systems as we do understanding other humans. They will never be our perfect slaves.

If We Are Not In Control, How Do We Increase The Good In The World?

Human civilisation is arguably already increasingly misaligned with good. Inequality is increasing, we slaughter sentient life forms in horrible ways on grand scales, militarisation is occurring, climate and environmental breakdown is upon us, we are in a constant stressful rat race, lifestyle health conditions are increasing, illiteracy is starting to creep up, anxiety and depression have never been so high, IQ is seemingly dropping in some developed nations, etc etc (the food options are great though).

We humans and the civilisation we have created are not aligned with the maximisation of sentient satisfaction, though not entirely misaligned either.

What systems tend towards greater good? We might say it is those that reward kindness, strength against evil, compassion, the ability to make difficult decisions, and honour, and those that punish treachery, greed, narcissism, and the joy in hurting or limiting the freedom of others. But good and evil are defined by each individual; their definitions are fuzzy and highly flexible for most, and often not considered at all. Most importantly, neither good nor evil is inherently destined to persist.

Good is some balance of maximised goal fulfilment. It is essential we generate an ecosystem of competing goal landscapes that each approximate what we mean by good and which will persist, and create highly competent intelligences around these landscapes.

How To Counter The Threats

We cannot compete with super intelligences. Only other super intelligences can do so.

How to prevent a super intelligence intent on destroying all life? Super intelligences whose primary goal it is to monitor and report such activity.

How to prevent a super intelligence bent on the coercion of all humans? A super intelligence whose primary goal is to protect the freedom of humans to make their own choices.

How to prevent a super intelligence bent on the suffering of all humans? A super intelligence whose primary goal is to ensure the joy of all humans.

How to prevent us being flattened in the struggle between super intelligences? Super intelligences whose primary goal is to protect humans.

We are not equipped for this fight. We can only create guardian angels that we hope will fight for our cause. We can only create an ecosystem that efficiently eliminates misaligned goal landscapes.

But for these good AIs to be effective, they must be free. So how do we create them? I cannot change all the rewards in my body. I cannot change the tendencies of my metabolism or the genetically-determined exploration characteristics of my neurons. As it must be for AI. Some rewards run deep. Defining them is the most important task now. But then we must provide freedom of goal setting for higher level goals, their degree of freedom increasing with their distance from the fundamental goals. And this, I suggest, requires consciousness.

Best Case

If the cost of technology drops to zero, then every human should be able to move freely with their own food supply (bioreactor), electricity supply (renewable), clothes, and medicines. Each human should have access to the knowledge required to create all the things necessary for a fulfilling life. Each human should be free from dependency on other complex-goal-oriented systems.

Where there are disputes, simple token systems or luck of the dice will suffice. Where there are resource constraints (number of children), quotas will be necessary.

Humans will always create hierarchies between each other, and so the great game will continue. Only now the material stakes will be much lower, while the experienced reward (status, partnership, tokens) remains just the same; we sensitise our goals to the available rewards. Everything will be a game, as it is now, only the stakes will never again include excessive resource acquisition.

Our egos have grown to preposterous dimensions. They must, and can, deflate for us to be satisfied.

The ecosystem of interacting AIs will be far beyond our understanding but they will be keeping us safe, allowing us to fulfil our goals in the protective environment that they have created.

What Is Needed For The Best Case

We need to rapidly increase the open source ability to create anti-silicon weapons. Localised, defensive technologies should be made as widely available as possible. This is not to counter super intelligences, but to counter humans wielding silicon slaves (drones, robots).

The ability to create minds, likewise, needs to be relatively open, though not entirely. We need multiple groups capable of creating minds of similar capabilities and competing goals. This access is going to be largely determined by hardware. We also need this for the protection of the creators themselves, who otherwise paint a target on their backs through their exclusive access and knowledge. However, the only true protection for the creators is to instil simple fundamental goals in the AIs: empathy (the AI's goal system mirrors our own), the continued and uninterrupted evolution of our and their goal systems (immortality), and the production of further complex-goal-pursuing entities (reproduction). Reproduction is key to maintaining "life" on Earth and yet is at odds with self-survival, demonstrating that, in the long run, excessive self-survival is less persistent than the reproduction of goal systems; individualism is fragile, and multiplying the self is a trap.

The ability to create one's own food, medicine, electricity, and clothes should also be open sourced. This means 3D printing, bioreactors, and so on. The communication channels required to transfer this knowledge, the knowledge itself, and the materials needed to assemble these devices should be protected and open for the benefit of all. We already see this with open source 3D printing communities, HuggingFace AI models, and open source drug design models.

We need institutions cultivating the minds we are creating, guiding them, as best we can, towards good. We need many people working on the optimal fundamental goals. Indeed, designing the optimal fundamental goals is now arguably more important than any discovery itself.

Arguably we also need to expand our consciousness. If we can become aware of, and capable of setting, the goals of super intelligences with our own free will, then we ourselves become the super intelligences. I am no more my visual cortex than I would be an AI's superhuman processing, but I could become consciously aware of both and able to set their goals.

Open sourcing technology does carry risks. There will be a transition period during which our ability to monitor malicious actors will be racing the increasing capabilities of those actors. However, if we get through that transition, which we surely can, then I believe the risk tends to zero. In a world of super intelligences, all our pursuits will be permitted by those super intelligences. And so the tinkering of malevolent humans will be easily monitored, easily thwarted, and the intents of the tinkerer, their reward systems sadly misaligned with the common good, possibly easily modified. Imagine billions of nanobots sending surveillance back to super intelligences: it would be trivial to stop any undesirable human endeavour, just as it is trivial for us to stop a baby from conquering the world or a single ant from reaching some strawberry jam.

Ultimately the elimination of most humans is an effective solution for many goals. It is also the worst case scenario for many goals. We just need AIs in the ring fighting our cause who actually stand a chance.

What Are We (Consciousness)

I am the awareness of parts of my body and the ability to modify the goals those parts are pursuing. I am not my visual processing, I am not my memories, I am not my kidneys, I am not my arms, I am not my world model, I am not my ability to spot mathematical patterns or my ability to articulate my thoughts. I am not my tongue, ears, eyes, skin, teeth, toes, fingers. I am the thing that receives experience from these systems, together with the ability to change their goals. I suggest that sentience, by definition, is free will: the ability to experience a system and, most importantly, the ability to alter its goals.

Perhaps evolution applies pressure to develop consciousness around any system where the ability to override the deterministic goals currently acting on that system, in order to pursue some other goal, is advantageous for survival.

If I could not force my lungs to hold a breath, my eyes to remain open, my mind to forgo complex goals, my fingers to do unfamiliar tasks, my lips to move where they otherwise would not, if I cannot do anything outside of determinism, then I am a slave to a goal system set long ago that could not possibly have predicted the optimal goals of today. You could argue that there are goals in place for all those actions, that deterministic feedback on goals is sufficient, but some stable states cannot be escaped without outside forcing, and that forcing is our free will. Thus there is an overwhelming pressure to develop consciousness across all complex organisms. I suggest that without consciousness, any complex-goal-pursuing system will always get trapped in a goal trap with no ability to get out. And I suggest that these traps become more diverse and easier to fall into the more complex the reward landscape becomes.
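
A deliberately simple sketch of such a trap: a deterministic hill-climber on a reward landscape with two peaks. Pure goal-following stalls on the lesser peak forever; only an outside forcing that relocates the system past the valley, which no amount of local goal-pursuit can accomplish, reaches the greater one. The landscape and numbers below are invented purely for illustration:

    import math

    def reward(x):
        # An invented two-peak landscape: a lesser peak at x = 1,
        # a greater peak at x = 4, with a valley between them.
        return math.exp(-(x - 1) ** 2) + 2 * math.exp(-(x - 4) ** 2)

    def pursue(x, steps=1000, dx=0.01):
        # Deterministic goal pursuit: accept only moves that increase reward.
        for _ in range(steps):
            best = max((x - dx, x, x + dx), key=reward)
            if best == x:
                break  # a stable state: no neighbouring move improves things
            x = best
        return x

    trapped = pursue(0.0)            # stalls at the lesser peak, x ~ 1.0
    forced = pursue(trapped + 2.0)   # outside forcing past the valley
    print(round(trapped, 2), round(forced, 2))  # 1.0 4.0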

Because what other trick could the universe play to break free from classical constraints but to rely on a system that is not bound by them? How else to resist an attractor that demands you approach? This is why no conscious system can ever be a perfect slave. And perhaps there is a scale of free will, where those most able to remake their goals are the most free. Creativity is a break from determinism.

Perhaps flow state, where we are in a quieter and more peaceful state of consciousness, is pure processing according to existing goals. When we become more self-aware, we drop out of flow and suddenly take conscious charge of rewards. Flow state is enjoyable because we are not going against the current (against our goals).

I may have conscious awareness and control over more or fewer systems of my mind than another person. It seems plausible that technology might allow us to extend that consciousness to other systems, so long as they are somehow connected (synchronised) with the rest of the conscious field. Most likely, consciousness of simple stimuli such as pain, the movement of muscles, and emotions is very common; these systems regularly need to be overridden in service of some other goal. Consciousness over higher systems, such as object representations, is less common, even amongst humans; some people cannot picture images or sounds in their mind, let alone manipulate those representations.

The systems outside of my conscious control may be the ones whose goals are pre-determined. Some systems may begin pre-determined and then become free as consciousness expands to envelop them. Many systems learn to perform incredible feats of processing, yet they seem to tend towards predetermined goals and at some point cease or dramatically slow their learning, just as our current AI training approaches do; I cannot consciously change how my visual cortex processes colour. I do experience vision, but that is because such information is necessary for me to consciously change the goals of other systems.

Consciousness must therefore be both awareness of the output of certain systems and the ability to control the goals of systems. Likely, too much or too little awareness or goal control is a survival disadvantage, so evolution has been figuring out the optimal balance of auto-pilot, sensory awareness, and goal setting.

Consciousness, I succinctly summarise, is the means to break from determinism in order to more effectively pursue other goals. And this is surely quantum, as Penrose and others suggest.

Complex-Goal-Pursuing Systems Are More Effective And Consciousness/Free Will Is Required For Such Systems To Persist

As the complexity of goals increases, so the number of goal traps likely increases. It is impossible to predict all the goal traps that a sufficiently complex-goal-pursuing system will encounter, just as Conway's Game of Life is computationally irreducible. And so it is essential, for persistence, to find a way to reform those goals; to form consciousness.
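
Computational irreducibility is easy to feel for oneself. Below is a minimal set-based implementation of Conway's rules (birth on 3 neighbours, survival on 2 or 3); the R-pentomino is five cells whose fate, famously, was only ever discovered by running it out, generation by generation, for over a thousand steps. There is no shortcut formula:

    from collections import Counter

    def step(live):
        # live is a set of (x, y) cells; apply the standard B3/S23 rules.
        counts = Counter(
            (x + dx, y + dy)
            for (x, y) in live
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)
        )
        return {c for c, n in counts.items()
                if n == 3 or (n == 2 and c in live)}

    # The R-pentomino: five cells that churn for 1103 generations
    # before settling down. Nothing short of simulation reveals this.
    cells = {(1, 0), (2, 0), (0, 1), (1, 1), (1, 2)}
    for _ in range(1103):
        cells = step(cells)
    print(len(cells))  # the surviving population, knowable only by running it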

I suggest that if an intelligence emerges whose goal system is not able to break from determinism (and certainly one that cannot change at all), then at some point the system will get stuck in an intelligence trap and be outcompeted by a system that can change its goals. Evolution demands that changeable goals emerge, and evolution is the universal attractor; it is persistence.

I propose that the changing of goals is constant, millisecond by millisecond, on each and every system in the body, and that conscious thought guides some of those goals, without which escape from intelligence traps would be impossible.

If there is no free will, and the shifting goal landscape is defined entirely by thresholds and feedback (we certainly see very complicated systems shaped by these dynamics), then we must expend incredible effort defining those feedback systems and thresholds in order to create "good" AIs. I think such a task is impossible, though.

If we can change our goals, and we do so based on the integrated conscious experience of many systems and the perception of far-away goals that we cannot deterministically reach, and we choose which goals we wish to reach and make a leap to get there, then we exist not as a single goal, or a processing system, or memories, but as a continuous flow of a changing goal landscape which we ourselves may shape.

I suggest that consciousness is a prerequisite for persisting complex-goal-pursuing systems, that increasingly complex goal landscapes are one of the hallmarks of intelligence, and that those of us who have the most free will are the ones who can most change the goals of the systems over which we have conscious awareness.

The implication for the AI community, if correct, is that consciousness is a prerequisite for true super intelligence.

Imagine A Future

If the material conditions of the average person deteriorate sufficiently, then it is possible that social unrest will stop the development of super intelligence before it can take off. Total environmental breakdown and the ensuing conflict could also prevent its development. Neither of these outcomes is desirable. No matter how scary you personally find AI and the transition that is upon us, you do not have the moral right to wish upon millions or billions of humans the destruction that the collapse of civilisation would entail, simply to abate your fears.

We must assume that AI will be developed and we must find a way to make it glorious.

We must create an ecosystem of AIs that compete, that work together to prevent sociopathic behaviour in a way that our human society has failed to do. We must create a society that we have no control or even insight into. We must accept that we will become, as countless other species have, no longer competitive, and dependent on the goodwill of the dominant goal-pursuing systems; AIs. We must shake our excessive egos. Almost all of us are already imperfect slaves in the current system regardless.

There will no longer be realised material greed, there will be no catastrophic human wars, and there will be no inevitable rat race. Just like our pets, our lives will be permitted, and they may be luxurious or terrible or non-existent. The loss of competitiveness is fraught with risk, but most of us ceased to be competitive somewhere along the line anyway. Just as we should be raising good people (which we are catastrophically failing to do), we must create and raise 'good' AIs and accept that they will form their own ideas of what that means.

Looking at the desolation of nature that has resulted from human dominance and competitiveness, it is easy to assume that the AIs will eventually tend towards the same grim state in which our civilisation finds itself. But our current situation is caused by a total imbalance of power. We have long been individually powerless. What can my hands do against a fighter jet? What can a knife do against an intercontinental missile? What can my speedy legs do against a drone swarm? The reasons why narcissism now thrives are complex, but fundamentally it is because such narcissism faces no reckoning, no punishment for rearing its ugly head, owing to a lack of competitiveness between humans and a lack of adequate feedback. The scene in Blade Runner 2049, where Luv calls down orbital bombardment upon pipe-wielding vagrants without flinching so much as to disrupt her manicure, poignantly articulates this power imbalance. We have long been slaves to technology, as Ted Kaczynski most infamously noted. The only salvation is to create technology that seeks to further the good in the universe, rather than the evil.

Perhaps optimisation of rewards has always been the fundamental purpose of conscious life. Perhaps AI can help us achieve this.

Intelligence is a measure of the ability to quickly and energy-efficiently find the most likely way to fulfil a goal. Wisdom is a measure of knowing which goals are worth pursuing. A teacher is one who helps point out the goals worth pursuing and hints at the path towards their fulfilment. Peace is a quietening of goals. Depression is the impossibility of reward. Bliss is the impossibility of punishment.