Bruno Beaufils, Jean-Paul Delahaye and Philippe Mathieu
Université des Sciences et Technologies de Lille
Laboratoire d'Informatique Fondamentale de Lille  U.R.A. 369 C.N.R.S.
U.F.R. d'I.E.E.A. Bât. M3, F-59655 Villeneuve d'Ascq Cedex
email: {beaufils, mathieu, delahaye}@lifl.fr
In section 2, we present in detail the gradual strategy, which has the same qualities as tit-for-tat plus progressive retaliation and the possibility of forgiveness. Two experiments are reported: the first, with twelve general strategies, shows its precise behavior; the second gives the results of a tournament we organized in the French edition of the ``Scientific American''.
In section 3 we try to improve the strength of this strategy by using a genetic algorithm on a genotype we have created, which includes many well-known strategies (in fact our genotype describes many more strategies than these). We present our ideas on a tree representation of the strategy space and finally propose a new view of the evolution of cooperation in which complexity plays a major role.
In the last sections we describe our results and discuss the natural behavior of this strategy and its good robustness in ecological competitions.
This game comes from the game theory of John von Neumann and Oskar Morgenstern and was introduced by RAND game theorists Merrill Flood and Melvin Dresher in 1952. The idea was to introduce some irrationality into game theory, which is used as a way of modelling interactions between individuals.
This game has been found to be a very good way of studying cooperation and the evolution of cooperation, and thus a sort of theory of cooperation based upon reciprocity has been developed in a wide literature, such as [2,4,5].
Experimental studies of the IPD and its strategies require a lot of computation time, and thus with the progress of computers, many computer scientists and mathematicians have studied it, as they have been able to apply specific methods to it, like genetic algorithms; see [1,3,6,9,19,20,25,26,30,31].
As cooperation is a topic of continuing interest for the social, zoological and biological sciences, much work on the IPD has also been done in those fields: [7,8,16,17,18,21,22,24,28,23].
Although those who have studied the IPD come from different research fields, it could be said that they all work on the same topic, which belongs to the field of Artificial Life, the bottom-up study of life.
Two artificial agents each have the choice between cooperation and defection. They play against each other synchronously, so that neither knows what the other will play. Each gets a score according to the outcome of the move:
To have a dilemma, the temptation T must be better than the reward for cooperation R, which must be better than the punishment P, which must be better than the sucker's payoff S. This can be formalised as: T > R > P > S.
Since this one-shot version of the Prisoner's Dilemma is not very interesting (the most rational choice is to defect), the game is iterated, the final score being the sum of the scores of all moves. Neither player knows how many moves there will be, so each agent's strategy can be studied, for instance to see how each player tries to establish cooperation in the game.
To prevent the one-shot Prisoner's Dilemma solution from influencing strategies, by giving too much importance to temptation relative to cooperation, it is useful to add the following restriction: 2R > T + S.
With this restriction, strategies gain no advantage from alternately cooperating and defecting. The classical payoff matrix is shown in table 1.
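As an illustration, the two constraints above can be checked directly in code. This is a minimal sketch: the concrete values T=5, R=3, P=1, S=0 are the classical ones from the literature (table 1 is not reproduced here, so these values are an assumption).

```python
# Classical IPD payoffs, assuming the standard values T=5, R=3, P=1, S=0.
T, R, P, S = 5, 3, 1, 0  # temptation, reward, punishment, sucker

# The two dilemma conditions stated in the text.
assert T > R > P > S      # one-shot dilemma: defection tempts
assert 2 * R > T + S      # alternating C/D must not beat mutual cooperation

def payoff(my_move: str, other_move: str) -> int:
    """Score of one move for the first player; 'C' = cooperate, 'D' = defect."""
    table = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}
    return table[(my_move, other_move)]

print(payoff('D', 'C'))  # temptation payoff: 5
```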
To study the behavior of strategies, two kinds of computation can be carried out.
The first is a simple round-robin tournament, in which each strategy meets all the others. Its final score is the sum of the scores obtained in each confrontation, and the strategy's strength is then measured by its rank in the tournament. This is how the tit-for-tat strategy was singled out by Axelrod in [2].
The second is a simulated ecological evolution, which starts from a fixed population containing the same quantity of each strategy. A round-robin tournament is played, then the populations of bad strategies are decreased while good strategies gain new members. The process is repeated until the population stabilises, i.e. no longer changes. It is in this way that nasty strategies, those which take the initiative of the first defection, were discovered to be unstable, because they are invaded by kind ones.
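The two evaluations just described can be sketched as follows. This is a minimal sketch assuming strategies are callables mapping (my_history, opponent_history) to 'C' or 'D'; the proportional-growth update in ecological_step is one common choice, not necessarily the exact rule used by the authors.

```python
# Round-robin tournament and one ecological-evolution step, sketched.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}

def match(s1, s2, moves=1000):
    """Play `moves` synchronous rounds; return both total scores."""
    h1, h2, sc1, sc2 = [], [], 0, 0
    for _ in range(moves):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        sc1 += PAYOFF[(m1, m2)]
        sc2 += PAYOFF[(m2, m1)]
        h1.append(m1); h2.append(m2)
    return sc1, sc2

def round_robin(strategies, moves=1000):
    """Each strategy meets every other (and itself); return total scores."""
    totals = [0] * len(strategies)
    for i in range(len(strategies)):
        for j in range(i, len(strategies)):
            a, b = match(strategies[i], strategies[j], moves)
            totals[i] += a
            if i != j:
                totals[j] += b
    return totals

def ecological_step(populations, totals):
    """One generation: each strategy grows in proportion to its score share."""
    fitness = [p * t for p, t in zip(populations, totals)]
    whole = sum(fitness)
    size = sum(populations)
    return [round(size * f / whole) for f in fitness]

all_c = lambda mine, theirs: 'C'
all_d = lambda mine, theirs: 'D'
tit_for_tat = lambda mine, theirs: theirs[-1] if theirs else 'C'

print(round_robin([all_c, all_d, tit_for_tat], moves=10))  # [60, 74, 69]
```

Note that in this toy run the always-defect strategy wins the short tournament, but repeated ecological steps shrink its population once its exploitable victims disappear, which is the instability of nasty strategies described above.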
As we said in section 1.1, much work has been done on the IPD, from different points of view. The points common to almost all of this work are:
To avoid confusion with the work cited in the second point, let us call the Iterated Prisoner's Dilemma we have described the Classical Iterated Prisoner's Dilemma (CIPD). We think that new work and new discoveries are possible on the CIPD, so we look for good strategies for it and try to understand how and why they work. The CIPD, as a model for the study of cooperation, has not been fully understood. It is still a good model for this kind of problem, because it is the simplest one, and not everything is known about it.
One more reason, if one were needed, to pursue this research is that strategies that are good in the CIPD remain good in many variants of the game.
During our research we were led to test and verify classical results [2,4]. To do so, we created many strategies in order to observe their behavior and check whether they had the qualities of tit-for-tat. At that time we were trying to increase the efficiency of strategies by modifying their parameters little by little, and observing the effect of the changes in round-robin tournaments and ecological simulations, until we were satisfied with a strategy's results. During these experiments we created many strategies, including gradual, and discovered its strength.
This strategy acts like tit-for-tat, except in how it forgives and remembers the past. It cooperates on the first move and continues to do so as long as the other player cooperates. After the first defection of the other player, it defects once and then cooperates twice; after the second defection of the opponent, it defects twice and then cooperates twice; ... after the n^{th} defection it reacts with n consecutive defections and then calms its opponent down with two cooperations.
As we can see, this strategy has all the qualities described by Axelrod in [2] for tit-for-tat except one: simplicity. Gradual keeps a memory of the whole game since its beginning.
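The behavior described above can be sketched in a few lines. This is a minimal interpretation, following the remark made later in the text that gradual does not count opponent defections played during its own reaction-lull periods; the exact bookkeeping of the authors' implementation may differ.

```python
# Minimal sketch of the gradual strategy: answer the n-th opponent
# defection with n defections followed by two calming cooperations.
class Gradual:
    def __init__(self):
        self.defections_seen = 0   # opponent defections counted so far
        self.queue = []            # moves left in the current reaction/lull period

    def play(self, opponent_last):
        """Next move given the opponent's previous move (None on move 1)."""
        if self.queue:                      # inside a punishment/lull period:
            return self.queue.pop(0)        # play it out, ignoring the opponent
        if opponent_last == 'D':            # n-th counted defection
            self.defections_seen += 1
            n = self.defections_seen
            self.queue = ['D'] * n + ['C', 'C']
            return self.queue.pop(0)
        return 'C'                          # kind: cooperate by default

# Toy run against a fixed opponent move sequence.
g = Gradual()
opponent = ['D', 'C', 'C', 'C', 'D', 'C', 'C', 'C', 'C']
moves, last = [], None
for m in opponent:
    moves.append(g.play(last))
    last = m
print(''.join(moves))  # CDCCCDDCC
```

The toy run shows the escalation: one defection then two cooperations after the first opponent defection, two defections then two cooperations after the second.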
We conducted experiments with gradual and other strategies. The results show the good performance of gradual. Further results are reported in [12].
We made two kinds of experiments with strategies: round-robin tournaments and ecological simulations. In the following example we used 12 classical strategies. Each game consists of 1000 moves, with the classical payoffs shown in table 1. The general results do not depend on the precise payoff values or on the number of moves, see [12].
The 12 strategies are described below.
Results of roundrobin tournament and of a simple ecological simulation are presented in tables 2, 3 and figure 1.
They show that gradual clearly outperforms all the other opponents, and especially tit-for-tat, in round-robin tournaments as well as in ecological evolution. In the latter type of computation we noticed that gradual and tit-for-tat follow the same type of evolution, with a difference in quantity in favor of gradual, which is far ahead of all the other survivors when the population stabilises.
We note that gradual is relatively strong, compared with other strategies, when opposed to clever or probabilistic strategies like prober or random, but the most significant point is that gradual never, or rarely, obtains bad scores: in almost all cases it manages to establish cooperation with its opponents.


Numerous other experiments (for example with random subsets of strategies) confirm the conclusion that gradual is more robust and obtains better scores than tit-for-tat in almost all contexts (specially constructed exceptions are obviously possible, but they are rare and not easy to obtain).
A systematic study of the dynamic evolution of populations with 3 strategies confirms the conclusion of [9] that very complex phenomena may sometimes occur in the IPD [15].
We were surprised by these results, and then, with the help of ``Pour La Science'', the French edition of the ``Scientific American'', we organized a tournament. Each reader was invited to submit his or her own strategy. A description of this tournament and its results can be found in [11,13]. The tournament did not use the CIPD, but a variant in which each player had the ability to break off definitively with its opponent.
However, the winner was a variant of gradual, adapted to the game with renunciation and with a different progression function for retaliation (N(N+1)/2 instead of N, where N is the number of the opponent's previous defections). This makes us think that gradual has something tit-for-tat has not, and that the classical interpretation of tit-for-tat's results concerning simplicity must be revisited.
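The two retaliation schedules just mentioned can be tabulated side by side; this small sketch only computes the two progression functions named above.

```python
# Length of the retaliation answering the N-th opponent defection.
def gradual_punishment(n):
    """gradual: N defections after the N-th opponent defection."""
    return n

def winner_punishment(n):
    """The tournament winner's variant: N(N+1)/2 defections."""
    return n * (n + 1) // 2

print([gradual_punishment(n) for n in range(1, 6)])  # [1, 2, 3, 4, 5]
print([winner_punishment(n) for n in range(1, 6)])   # [1, 3, 6, 10, 15]
```

The winner's schedule escalates quadratically rather than linearly, i.e. the total punishment grows much faster with repeated defections.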
Taking the same approach to all these results as Axelrod did for tit-for-tat, we can say that the three most important qualities of gradual are: kindness (it does not defect first), reactivity (it defects when the opponent has defected) and forgiveness (it returns to cooperation after punishment). These qualities are well known to be good, but in gradual, unlike in tit-for-tat, they are not joined by simplicity. In fact the main difference between tit-for-tat and gradual is their view of the past: the former has a short one and the latter the longest possible, since it has to know the full history of the current game to decide its next move. We can say that gradual is much more complex than tit-for-tat. Let us recall an analysis of Axelrod's in [2, page 110]:
The advice takes the form of four simple suggestions for how to do well in a durable Iterated Prisoner's Dilemma:
 1. Don't be envious,
 2. Don't be the first to defect,
 3. Reciprocate both cooperation and defection,
 4. Don't be too clever.
When Axelrod writes this, he is stating that a strategy has to be understandable, hence simple, mainly so that its opponents understand that all it wants is to establish a period of cooperation. He thinks that if a strategy clearly announces how it will act, the other player will be able to cooperate more quickly.
We have previously discussed this point in [14], for the Iterated Prisoner's Dilemma with Renunciation. We agree with Axelrod, but we think that having a clear behavior is not always a good idea, for instance against complex-clever-and-nasty or random strategies. The results of gradual confirm this opinion.
Finally, we think that gradual exhibits a natural behavior that we can find in daily life, used for instance by some creditors with their debtors. Think of a government with its taxpayers, or an electricity company with its bad customers.
This behavior can however be interpreted in two ways:
These two interpretations are two ways of looking at the game, just as we have two ways of looking at our relationships with other people in real life. In the first case the strategy tries to explain to the opponent what the better choice is for both of them, whereas in the second case it tries to protect itself. This is a kind of choice between opening and closing its relationship with the opponent, without knowing which choice the opponent has really made. It is clear that in real life this is not a simple choice to make. We think that gradual, unlike tit-for-tat, offers this type of complexity to the player, and thus that Axelrod's idea about simplicity is not generally true.
Gradual's behavior, its performance, its having inspired the winner of the ``Pour la Science'' tournament, and its natural roots all suggest that it is a strong strategy which truly outperforms tit-for-tat. The fact that we found a strategy stronger than tit-for-tat, and that its superiority comes from more complex behavior, is enough to suggest that there may be yet another, more complex strategy better than gradual. Gradual may be improvable, and one way to find out is to apply optimization tools to it.
To try to improve gradual's performance, we chose to optimize it with a genetic algorithm, so we constructed a genotype and a fitness function evaluating the quality of strategies.
With this genetic algorithm we expected to be able to explore the space of described strategies, looking for good ones, especially ones better than gradual.
Here is the description of the genotype we set up, using extensions of gradual's ideas, in order to describe a large family of strategies.
The extensions we have added to gradual are:
The 19 genes of the genotype and their meanings are given below:
Strategies described in this way play as follows: after the first move, which is given by the begin gene, they defect with a probability given by the alea gene. If they do not defect, they decide whether a reaction-punishment-lull period has to be launched, according to the calcul_type, threshold and vision parameters. If such a period has to be launched, it is launched with a probability determined by the forgive gene. The parameters of such a period, i.e. the reaction, punishment and lull lengths, are computed from the number of punishments already carried out (call it N) in one of two ways, according, respectively, to reaction_evolution, punishment_evolution and lull_evolution. If the computation is polynomial, the value is a polynomial in N whose coefficients are given by the corresponding genes.
The newest parameter here is vision, which makes a big difference compared with the descriptions of strategies used in genetic algorithms up to now [3,19,20], where strategies are limited to a small view of the past (3 or 4 moves in general).
This space is so open that it includes almost every well-known strategy:
With this approach, tit-for-tat can be seen as a degenerate strategy of this family, and thus of gradual: it is a degenerate gradual, because some of its parameters are erased in comparison with gradual.
Hence there is a chance that gradual is itself a degenerate strategy derived from some strategy X, with X better and more complex than gradual, and hence than tit-for-tat.
Our idea is that there is no limit to this affiliation between good strategies and that the space of strategies is full of complex good ones which have not yet been found.
The complexity could be such that the full strategy space might be seen as a tree, in which the large space of strategies described here by our genotype is only one branch. Maybe on other branches there are other strategies, built on other ideas, better than tit-for-tat or gradual.
The genotype we have created defines a space of different strategies. We have used a simple genetic algorithm, with uniform crossover (resp. mutation) to cross two elements of the population (resp. mutate one of them). The algorithm randomly chooses a vector of 19 bits, one bit for each gene, and then swaps the genes of the two individuals (resp. mutates the gene of the individual) wherever there is a 1 at the gene's position in the vector.
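The crossover and mutation just described can be sketched as follows. Here `random_gene` is a hypothetical helper standing in for however a fresh value of a single gene is drawn, since the admissible values of each gene are not fully specified in this summary.

```python
import random

N_GENES = 19  # one gene per parameter of the genotype

def uniform_crossover(parent_a, parent_b):
    """Swap genes between the two parents wherever a random 19-bit mask has a 1."""
    mask = [random.randint(0, 1) for _ in range(N_GENES)]
    child_a, child_b = list(parent_a), list(parent_b)
    for i, bit in enumerate(mask):
        if bit:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b

def uniform_mutation(individual, random_gene):
    """Redraw genes wherever a random 19-bit mask has a 1.

    random_gene(i) is a hypothetical helper returning a fresh
    admissible value for gene i."""
    mask = [random.randint(0, 1) for _ in range(N_GENES)]
    return [random_gene(i) if bit else g
            for i, (g, bit) in enumerate(zip(individual, mask))]
```

Whatever the mask, each child position holds a value taken from one of the two parents at that same position, which is the defining property of uniform crossover.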
To compute the fitness of a strategy we used a simple idea: to be good, a strategy needs to beat, in a simple round-robin tournament like those used by Axelrod, the best strategies known at this time, as well as some special ones such as the simplest (Cooperate, Defect, Random, ...). It is clear, however, that this is not sufficient and that there may be more sophisticated ways to assess the quality of strategies, using, for instance, their behavior in ecological simulation.
So we chose a selection of 34 strategies from previous works, trying to obtain a heterogeneous selection of behaviors, and we computed the fitness of a strategy as its rank in a round-robin tournament including those 34 strategies and itself.
The initial population consisted of 150 randomly generated strategies.
We have obtained results which are easy to interpret, since every gene makes a clearly defined contribution to the behavior of the strategy.
Some genes converged to remarkable values, which allows us to refine some of our ideas and to confirm, or refute, some classical ideas about the qualities a strategy needs in order to be good.
The least surprising result is that in almost all our experiments the begin gene converged very quickly to the Cooperate value, which means that the good strategies we were looking for would not be too aggressive and had every chance of being kind ones.
This idea was reinforced by the convergence of the alea gene.
Like the begin gene, the alea gene converged very quickly to the value 0, which in our experiments is interpreted as: never defect randomly.
Since the begin gene converged to the Cooperate value at almost the same time, it is clear that the strategies we were looking for have to be kind. This is not surprising, since the idea that strategies have to be kind in order to be good is widely accepted.
Nevertheless, this is a new way of confirming one of Axelrod's ideas, using a different method from the one he used to establish it.
The forgive gene also converged in almost all cases to the value 0, which means that strategies must not forgive any of their opponent's defections.
This is closely related to the fitness function we used, i.e. one without any ecological simulation, which thus pushes back some well-known strategies, such as the generous tit-for-tat family, even though it is included in the strategy space described by our genotype.
The fitness we use is not designed to produce strategies that are strong in ecological simulation. That could explain the convergence of this gene, which eliminates some strategies that do well in ecological simulation.
The last remarkable convergence is that of the blind gene, which converged to the No value, meaning that the strategies we were converging towards do not forget to count opponent defections during reaction-punishment-lull periods, which is, for instance, not the case for gradual.
This convergence clearly shows that good strategies would be a little more clever, and thus more complex, than gradual, and hence than tit-for-tat. This confirms our idea and goes against the generally accepted view of the relation between efficiency and simplicity.
The convergence of this gene is another confirmation that simplicity does not mean strength in the CIPD.
As an illustration, here is the description of one of the strategies found by the genetic algorithm run on our genotype:
Cooperate, 0, Defection, 1, 0, No, 5, Polynomial, 1,1,3, Polynomial, 0,0,0, Polynomial, 15,8,4
This strategy beats gradual, and thus tit-for-tat, in round-robin tournaments as well as in ecological simulation. In both cases it finished first, just ahead of gradual, with tit-for-tat two or three places behind and a big gap in score, or in the size of the stabilised population.
All these results were obtained with the 34 strategies chosen for the fitness computation and were confirmed by an experiment involving 300 strategies.
As we said, the IPD is a good model for the study of cooperation, but it has not often been studied in its classical version. While much work tries to discuss, or improve, the model itself, we chose to go deeper into the CIPD by looking for good strategies and by trying to understand what makes those strategies strong, knowing that results on the CIPD can often be carried over to other variants of the IPD.
We have found a strategy which leads us to reformulate some of the classical results about simplicity, and we then had two goals:
For the former, we used a genetic algorithm to search a large space of strategies, described by a genotype we created. For the latter, we simply had to study the results of the experiments performed by the genetic algorithm.
The first result we found leads us to think that simplicity is not a necessary quality for a strategy, and that, as in real life, complexity may be advantageous.
We also think that this complexity may be so important that there is perhaps no classical linear ordering of the strength of strategies; rather, the space of strategies can be seen as a tree with many branches. Each branch is then a large space of strategies, with good and bad ones, and with some connections between branches, so that there is no finite set of qualities a strategy must have to be good; rather, particular combinations of qualities make the strength of a strategy.
We are now pursuing these two goals: by improving the fitness function of the genetic algorithm, hoping to find stronger strategies; and by trying to create other genotypes to explore other branches of the total strategy space, hoping to find other combinations of qualities.
The former implies that methods for measuring the strength of a strategy have to be established, whereas the latter could reinforce our ideas about the natural emergence of complexity in the CIPD.
Simulation software with many strategies is already available for Unix, Dos or Windows on the web at http://www.lifl.fr/~mathieu/ipd or by anonymous ftp on ftp.lifl.fr in pub/users/mathieu/soft.