ARTIFICIAL GENERAL INTELLIGENCE

Or AGI. It is the field of unconstrained machine intelligence. What does "unconstrained" mean? It simply means no boundaries. Yes, we have very intelligent machines, such as IBM's Deep Blue, which can beat any human at chess. But any 5-year-old human can beat Deep Blue at Tic-Tac-Toe!

AGI is the holy grail of AI: a machine that can think as intelligently as a human in all matters. Which is not to say that such a machine should be proud just because it can imitate a human, considering how stupid and biased humans are. But it is a start.

The main problem for AGI is one of what computer scientists call "local minimum". What is a "local minimum" you ask? Excellent question! We congratulate you for having AGI!

A "local minimum" is, in essence, a fake solution to a problem. From a mathematical, physical and computational perspective, it is possible to say that every solution to a problem is one of "fitting". What we do is define the characteristics that our problem has and then "fit" a model to those characteristics. The model that fits best wins, because it is the one that can answer all (or most) of our questions about the problem, including its solutions. This is so because we defined the model, and as such we know the model. Once we know that this model fits our problem, we simply punch the characteristics of our problem into the model and voila! Out comes an answer. Simple. Elegant. Tricky. Very tricky.

SUCCESSIVE APPROXIMATIONS

What's the problem? The problem is that we don't have a clue as to which model would fit best. In problems even slightly more complex than trivial, the number of possible models is literally infinite. We cannot "brute force" the fitting process simply because we don't have enough time to test all of our infinite models. Thus, clever minds devised the idea of "successive approximations".

And how does this work?

Simple. We take a model. Let's call it A. We punch in the parameters of our problem to which we already know a solution (S) and get a model-generated one. Let's call it SA (for Solution A). We then compare SA against S and obtain a difference. Let's call this difference DSA.

Now we proceed to tweak (randomly, pseudo-randomly or intelligently) the model A. What we get is A1. We repeat the above process and obtain SA1 and DSA1. We then compare DSA against DSA1. If DSA1 is smaller than DSA, we know that our tweaked model A1 is better than A and we keep it. If not, we go back to A and induce another tweak. And we keep going this way. In so doing, we get successive generations of models, each one better than the previous one. For example, we would get:

A, A2, A25, A253, A2538, A25384, and so on.

We know that A25384 is by far the best of the lot. Great, right? Well… not so.
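The loop we just described is nothing more than hill climbing by trial and error. A minimal sketch in Python, where the model, its parameters and the difference measure are all toy inventions of ours for illustration, not anyone's actual system:

```python
import random

def tweak(model, scale=0.1):
    """Randomly perturb every parameter of the model (the 'tweak')."""
    return [p + random.uniform(-scale, scale) for p in model]

def difference(model, problem, known_solution):
    """DSA: how far the model's answer is from the known solution S."""
    model_answer = sum(p * x for p, x in zip(model, problem))
    return abs(model_answer - known_solution)

def successive_approximations(problem, known_solution, steps=1000):
    model = [0.0] * len(problem)                       # model A
    best = difference(model, problem, known_solution)  # DSA
    for _ in range(steps):
        candidate = tweak(model)                       # A1, A2, ...
        d = difference(candidate, problem, known_solution)
        if d < best:                                   # keep only improvements
            model, best = candidate, d
    return model, best
```

On a toy problem the difference shrinks generation after generation, exactly as in the A, A2, A25… sequence above.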

One of the key problems with this process is knowing how much to tweak, how to tweak and when to stop. The first two problems are relatively simple. The last one is not, even though it would seem trivial.

What one would expect is that:

DSA > DSA2 > DSA25 > DSA253 and so on.

But in real life things are not so simple. We find anomalies and those anomalies are deadly.

Let's say that we got the sequence DSA > DSA2 > DSA25 and now we are trying DSA251. Not good. OK. DSA252. Not good. OK…. DSA25(3547) and still not good. Huh? How is it possible that after 3547 tweaked models based on A25 we still can't find a better solution? Does this mean that A25 is perfect? Well… no. Does this mean that A25 is as good as it can get? Well… no. Then what the heck does it mean? It means that your little trick just fell into a "local minimum". A false solution. A solution that is close but not good enough. And why is it that our tweaking of A25 got us nowhere? Because we are not tweaking hard enough. And how hard is hard enough? Well… if there were a way to predict this, we wouldn't have this problem to begin with, would we?
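What getting stuck looks like can be shown on a toy landscape (the function and the step sizes below are invented for illustration): with tiny tweaks the search parks at the fake solution forever, while a much harder tweak clears the barrier.

```python
import random

def f(x):
    """A toy fitness landscape: a fake minimum near x = +1.38
    and the true minimum near x = -1.44."""
    return x**4 - 4 * x**2 + x

def hill_climb(start, step, tries=5000, seed=0):
    """Keep a candidate only when it improves on the best so far."""
    rng = random.Random(seed)
    x, best = start, f(start)
    for _ in range(tries):
        candidate = x + rng.uniform(-step, step)
        if f(candidate) < best:
            x, best = candidate, f(candidate)
    return x

# Tiny tweaks: thousands of tries, still parked at the fake solution.
stuck = hill_climb(start=2.0, step=0.05)  # settles near +1.38
# Much harder tweaks: the very same process clears the barrier.
free = hill_climb(start=2.0, step=3.0)    # settles near -1.44
```

No number of extra small tweaks rescues the first run; only the size of the tweak changes the outcome.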

The problem with AGI is a problem of "local minimum". This problem is general in the same manner as AGI is. It does not matter whether we are using neural nets or genetic algorithms or Bayesian nets; they are all susceptible to falling into this trap and never getting out of it.

*Note: for those more into math, yes, even Bayesian and SON networks work (to some degree) through successive approximations, even though it does not seem so. Approximation happens because we, humans (and presumably AGIs later on), determine the data that will be fed to such networks and the approximations and assumptions taken. Thus, we, the people, tweak such networks through successive approximations. Sure, this process is not considered a "classic" successive-approximation process, but it is one such process nevertheless, because it happens in reality.*

But there is a solution even to this little devil of a practical pothole. The solution is AGI.

Huh? Come again?

The solution to the "local minimum" problem is to create a model that is so, so complicated that eventually one tweak or another is going to "dislodge" our little model-building process from the hole where it fell and let it progress to a better model. We know that this works because this is how humans do it.

Take for example the simple act of crossing a street. We do it all the time. Have you ever seen a person *trying* to cross a street but not actually managing to do it because of… let's say, fear of getting struck by a car? That fear would be a "local minimum": an issue that the human cannot solve. However, we do cross streets. How do we do it? Simple. We figured out other potential models and adopted one that says it is safe *enough* to cross a street when cars are sufficiently far away. We solved the problem of a "local minimum" by finding a model that lacks such a "local minimum". However, we can only do so if we have a set of models sufficiently complex and vast that some of them lack this issue.
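One hedged way to sketch "a set of models vast enough that some lack the issue" is random restarts: instead of tweaking one model forever, we launch the search from many different models, and some of them simply never meet the fake minimum. Again a toy landscape of our own invention, not a real AGI:

```python
import random

def f(x):
    """Toy landscape: a fake minimum near x = +1.38,
    the real one near x = -1.44."""
    return x**4 - 4 * x**2 + x

def hill_climb(x, rng, step=0.05, tries=2000):
    """Small-step improvement-only search from a given starting model."""
    best = f(x)
    for _ in range(tries):
        candidate = x + rng.uniform(-step, step)
        if f(candidate) < best:
            x, best = candidate, f(candidate)
    return x

def with_restarts(n_starts=20, seed=0):
    """Each restart is a different 'model'; some of them start in the
    basin that simply lacks the fake minimum."""
    rng = random.Random(seed)
    best_x = None
    for _ in range(n_starts):
        x = hill_climb(rng.uniform(-3.0, 3.0), rng)
        if best_x is None or f(x) < f(best_x):
            best_x = x
    return best_x
```

Any single run can get stuck; the family of runs, taken together, does not.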

CONTEXTS

Which now lands us in a different… well… land. A land of contexts. Sure, we can have plenty of models, but how do we know where to look? Otherwise it is back to the brute-force approach, and we know that this does not work.

Well… fear not! Context to the rescue.

To solve this issue we look at the context of the problem we are working with. Take again the street-crossing thingy. Are we concerned with buses? Nope. One sub-set of models into the garbage. Are we concerned with cyclists? Nope. Another sub-set hits the dirt. Are we afraid of people crossing streets? No. Dogs? No. Cats? No. And so we keep dismissing possible models that do not fit our problem. Are we crossing during day or night? Fast or slow cars? One lane or multiple lanes? See what we mean? Context.

It is this context that allows us to select a sub-set of models which is large enough not to contain a "local minimum" but small enough that we can test it, either empirically or estimatorially (yes, we just made up this word - isn't English great?).
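A toy sketch of this context-based pruning, where the model catalogue and its tags are entirely made up for illustration:

```python
# Hypothetical catalogue: each "model" is tagged with the situations it handles.
MODELS = [
    {"name": "bus-aware",      "handles": {"buses"}},
    {"name": "cyclist-aware",  "handles": {"cyclists"}},
    {"name": "night-one-lane", "handles": {"night", "one_lane"}},
    {"name": "day-fast-cars",  "handles": {"day", "fast_cars"}},
    {"name": "day-multi-lane", "handles": {"day", "multi_lane", "fast_cars"}},
]

def select_by_context(models, context):
    """Keep only the models whose tags fit the problem's context."""
    return [m for m in models if m["handles"] <= context]

# Our particular crossing: daytime, fast cars, several lanes.
context = {"day", "fast_cars", "multi_lane"}
candidates = select_by_context(MODELS, context)
# Bus, cyclist and night models hit the dirt; only two candidates remain.
```

Five models shrink to two; on a realistic catalogue the same filter turns an untestable set into a testable one.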

AND BACKWARDS

And what the heck has this got to do with anything? Plenty!

It so happens that in order to build an AGI we need to provide the context. Without the context an AGI cannot solve any problem. Therefore our problem must be stated with the inclusion of a context. As a matter of fact, the context is so important as to overshadow the problem itself!

To cross *that street* we narrow the context to *that street*. But do we really want an AGI that can only cross *that street*? Of course not. What we want is an AGI that can cross any street. Thus, we must state the problem as "Crossing Any Street". Which is great!

Problem is, now our number of possible models just skyrocketed!

Sure, we could provide the AGI with a general version of the problem and then let the AGI figure out each specific solution.

We could say: solve the problem of *safely crossing any street, one street at a time*. The catch here is that we are providing the first part of the problem while the AGI is figuring out each individual solution to the second part.

We know that the first part of the problem is correct, because *we* stated it. But what about the second? What guarantees do we have that the AGI will do a good job and will be able to cross any street? Actually, none.

Even if we watch the AGI relentlessly crossing street after street after street safely, we still wouldn't know if during the next crossing the AGI wouldn't be hit by a car.

SAFETY & TRUST

If we now extend our problem of crossing the street to any problem, we end up with a bigger problem… which is yet another problem. Let's say that we want an AGI that will operate *safely* under any condition. So, we pose the problem: *whatever problem you solve, you will solve it in a way that ensures a safe solution.*

Problem solved, right? Well… no.

Same as with the street crossing thingy, we have no guarantee that the AGI will find a safe solution.

And extending this to human interaction, let's say that we instruct the AGI to solve all problems such that *all solutions are safe for humans*. And what do we call this? *Ethics*.

What we are asking the AGI is to generate ethical solutions.

Again, what guarantee do we have that the solutions will be ethical for us? None whatsoever.

Why?

Because the AGI will use a method of successive approximations which we do not understand. Sure, we can postulate the general principle as we have done above, but what about each specific problem? Because, you know, in the end, if we are in a wheelchair and ask the AGI to get us across that street in those circumstances, it would be good to know that the AGI will actually get us across in an ethical fashion… that is… alive. Yet we know nothing about the process of finding that so-called ethical solution. We simply do not know whether the AGI has actually arrived at an ethical solution or not.

QUALITY BY DESIGN

This is a very old problem which was re-discovered by the authors in a different form. It is said that *quality cannot be tested in; it can only be manufactured or built in*. This is very much true. Think of it in this fashion. If we have a manufacturing process that produces 100,000 widgets per hour, we cannot test every single one of them. We take a small sample (let's say 100), test them and, based on these results, apply statistical analysis in order to determine what percentage of the widgets will be bad, statistically speaking. Yet there is no way to know whether the widget that you took from those 100,000 and are about to use is defective or not.
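The sampling argument can be made concrete. A rough sketch, assuming a batch with a hidden 2% defect rate and using a plain normal-approximation confidence margin (the numbers are invented for illustration):

```python
import math
import random

def estimate_defect_rate(batch, sample_size=100, seed=0):
    """Sample a few widgets and estimate the batch's defect rate,
    with a rough 95% confidence margin (normal approximation)."""
    rng = random.Random(seed)
    sample = rng.sample(batch, sample_size)
    p = sum(sample) / sample_size
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return p, margin

# A batch of 100,000 widgets; unknown to the tester, 2% are defective (1 = bad).
batch = [1] * 2000 + [0] * 98000
p, margin = estimate_defect_rate(batch)
# We now know the approximate *rate* of bad widgets -
# but still nothing about the one widget in our hand.
```

The statistics tell us about the batch, never about the individual widget: quality, like ethics, cannot be tested in after the fact.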

What the authors are essentially asking is this: *how do we create an AGI that will be ethical by design knowing that we cannot test ethics into the AGI?*

ETHICS SHMETHICS

Thus, they propose the following insights when building an ethical AGI:

- The specific behavior of an AGI cannot be predicted even when all the processes that AGI uses to find an answer are working properly.
- Verifying the ethics of every solution is not enough; we must understand the process that leads to the creation of such ethical solutions, since testing each ethical solution in isolation solves nothing.
- Ethical ways of working must be engineered into the AGI.

NOT REALLY

This is all nice and hunky-dory, but it is clear that the authors have missed the point. The point is not whether an AGI can develop a solution that is ethical, nor how we would know that it is. The point is that ethics is in and of itself fuzzy, and the authors attempt to apply it throughout a solution-discovery process.

This is all wrong.

No wonder they are having so many problems.

The solution is obvious:

- Define ethical behavior in a manner that is limited by the laws of physics, in order to make it objective.
- Apply such definition in the context of a problem, not in the process of finding a solution.

By making the ethical definition objective, its rules can easily be programmed into an AGI. By pushing the implementation of such ethical rules into the context, we ensure that no matter which solution the AGI arrives at, it will always be ethical, because it is the context that defines the solution in the final analysis.
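A hedged sketch of this proposal: state the "ethical" rule as an objective, physics-based limit and apply it to the context (the space of admissible solutions), not to the search process. Every name and number below is invented for illustration:

```python
# Hypothetical "ethical" rule stated as an objective physical limit:
# never subject the person being helped to more than ~0.5 g of deceleration.
MAX_DECELERATION = 4.9  # m/s^2

def allowed(solution):
    """A solution is admissible only if it respects the physical limit."""
    return solution["deceleration"] <= MAX_DECELERATION

def solve(candidates, cost):
    """Search only the constrained context. Whatever comes out is 'ethical'
    by construction, regardless of how the search itself works."""
    admissible = [s for s in candidates if allowed(s)]
    return min(admissible, key=cost)

candidates = [
    {"deceleration": 9.0, "time": 1.0},  # fastest, but violates the limit
    {"deceleration": 4.0, "time": 2.5},
    {"deceleration": 2.0, "time": 5.0},
]
best = solve(candidates, cost=lambda s: s["time"])
# best respects the limit no matter which optimizer we plug into solve().
```

The unsafe candidate never even enters the search, so no amount of clever (or opaque) optimizing can pick it.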

More of this as we go along.

Three more parts to go.