Complete Classes of Strategies
for the
Classical Iterated Prisoner's Dilemma

Bruno BEAUFILS - Jean-Paul DELAHAYE - Philippe MATHIEU

Laboratoire d'Informatique Fondamentale de Lille
U.R.A. 369 C.N.R.S. - Université des Sciences et Technologies de Lille
U.F.R. d'I.E.E.A. Bât. M3
59655 Villeneuve d'Ascq Cedex - FRANCE

{beaufils,delahaye,mathieu}@lifl.fr

Abstract:

The Classical Iterated Prisoner's Dilemma (CIPD) is used to study the evolution of cooperation. We show, with a genetic approach, how basic ideas can be used to generate automatically a great number of strategies. We then present some results of ecological evolution on those strategies, with a description of the experiments we have made. Our main purpose is to find an objective method to evaluate strategies for the CIPD. Finally, we use these results to add a new argument confirming that there is, in order to be good, an infinite gradient in the level of complexity in the structure of strategies.

The Classical Iterated Prisoner's Dilemma

Introduced in 1952 by Merrill M. FLOOD and Melvin DRESHER at the RAND Corporation, see [3], who were trying to introduce some irrationality into the game theory of John VON NEUMANN and Oskar MORGENSTERN [8], the Classical Iterated Prisoner's Dilemma (CIPD) is based on this simple story due to Albert TUCKER, quoted for instance in [5, pages 117-118]:

Two men, charged with a joint violation of law, are held separately by the police. Each is told that
(1) if one confesses and the other does not, the former will be given a reward...and the latter will be fined
(2) if both confess, each will be fined...
At the same time, each has a good reason to believe that
(3) if neither confesses, both will go clear.

It seems clearly obvious that the most reasonable choice is to betray one's partner. More formally, the CIPD is represented, using game theory, as a two-person, non-zero-sum, non-cooperative, simultaneous game where each player has to choose between two moves:

- Cooperate (c),
- Defect (d).

The payoff of each player depends on the moves played by the two players. Table 1 provides the score in each case.


 
Table 1: CIPD payoff matrix. The row player's score is given first.

                 Cooperate                         Defect
  Cooperate      R=3, R=3                          S=0, T=5
                 (reward for mutual cooperation)   (sucker's payoff / temptation to defect)
  Defect         T=5, S=0                          P=1, P=1
                 (temptation to defect / sucker's payoff)   (punishment for mutual defection)
To have a dilemma, the following inequality has to be satisfied:

S < P < R < T    (1)

The dilemma rests on the fact that individual interests differ from collective ones.

As the one-shot game is solved by the NASH equilibrium, which is to always betray one's partner, the model is extended: in the iterated version players meet each other more than once, without knowing whether the current meeting is the last one. The payoff of a player is then simply the sum of its payoffs over all meetings. To favour cooperation, while keeping this difference between individual and collective interest, the following inequality has to be satisfied as well:

S + T < 2R    (2)

The classical choice of values for the four parameters is given in Table 1.
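Both constraints are easy to check for the classical values; a quick sanity check in Python (our own illustration):

```python
# Classical CIPD payoff values (Table 1).
T, R, P, S = 5, 3, 1, 0

# Inequality (1): the one-shot game is a dilemma.
assert S < P < R < T

# Inequality (2): mutual cooperation pays more than
# alternating between exploiting and being exploited.
assert S + T < 2 * R
```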

Of course, with such an iterated game, what the opponent did on past moves may influence the way a player chooses the next one. It is then possible to define many more strategies than in the one-shot version.

Let us define some simple ones:

all_c
corresponds to the cooperative strategy of the one-shot game applied without modification in the CIPD: it always plays c.
all_d
corresponds to the rational strategy of the one-shot game applied without modification in the CIPD: it always plays d.
tit_for_tat
cooperates on the first move and then plays its opponent's previous move.
per_cd
plays c then d periodically; let us note it cd.
soft_majo
cooperates, then plays the opponent's most used move; in case of equality it cooperates.
prober
plays d, c, c, then defects on all other moves if the opponent cooperated on moves 2 and 3, and plays as tit_for_tat otherwise.
spiteful
cooperates until the opponent's first defection, then always defects.
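These strategies can be written compactly if we represent a history as a string of moves ('c'/'d') and a strategy as a function of both histories. This representation is our own illustration, not the paper's formalism:

```python
def all_c(me, opp):
    return 'c'

def all_d(me, opp):
    return 'd'

def tit_for_tat(me, opp):
    # Cooperate first, then copy the opponent's previous move.
    return 'c' if not opp else opp[-1]

def per_cd(me, opp):
    # Play c, d, c, d, ... periodically.
    return 'cd'[len(me) % 2]

def soft_majo(me, opp):
    # Play the opponent's most used move; cooperate on ties.
    return 'c' if opp.count('c') >= opp.count('d') else 'd'

def prober(me, opp):
    # Play d, c, c; then always defect if the opponent cooperated
    # on moves 2 and 3, else play as tit_for_tat.
    if len(me) < 3:
        return 'dcc'[len(me)]
    if opp[1] == 'c' and opp[2] == 'c':
        return 'd'
    return tit_for_tat(me, opp)

def spiteful(me, opp):
    # Cooperate until the opponent's first defection, then always defect.
    return 'd' if 'd' in opp else 'c'

def play(s1, s2, rounds):
    # Run an iterated game; return both move histories.
    h1, h2 = '', ''
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        h1, h2 = h1 + m1, h2 + m2
    return h1, h2
```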

Round Robin Tournament and Ecological Evolution

Now the main problem is to evaluate strategies for the CIPD, in order to compare them. Two kinds of experiment can be used for this purpose. The basic one is a pairwise round-robin tournament between different strategies. The payoff of each one is the total sum of its scores over all iterated games. A ranking can then be computed according to the score of each strategy: the higher a strategy is ranked, the better it is. Strategies that do well in a round-robin tournament are well adapted to their environment, but often are not very robust to environment modifications.
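Such a tournament is straightforward to sketch. This is our own minimal illustration (three strategies, self-play omitted for brevity):

```python
# Payoff matrix from Table 1: (row score, column score).
PAYOFF = {('c', 'c'): (3, 3), ('c', 'd'): (0, 5),
          ('d', 'c'): (5, 0), ('d', 'd'): (1, 1)}

def all_c(me, opp): return 'c'
def all_d(me, opp): return 'd'
def tit_for_tat(me, opp): return 'c' if not opp else opp[-1]

def iterated_game(s1, s2, rounds):
    # Play one iterated game; return both total payoffs.
    h1 = h2 = ''
    t1 = t2 = 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFF[m1, m2]
        t1, t2 = t1 + p1, t2 + p2
        h1, h2 = h1 + m1, h2 + m2
    return t1, t2

def round_robin(strategies, rounds=10):
    # Total payoff of each strategy over all pairwise meetings.
    totals = dict.fromkeys(strategies, 0)
    names = list(strategies)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sa, sb = iterated_game(strategies[a], strategies[b], rounds)
            totals[a] += sa
            totals[b] += sb
    return totals
```

Sorting the totals in decreasing order gives the tournament ranking.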

The second kind of experiment imitates the natural selection process and is closely related to population dynamics. Let us consider a population of N players, each one adopting a particular strategy. At the beginning each strategy is equally represented in the population. Then a tournament is made, and good strategies are favoured, whereas bad ones are disadvantaged, by a proportional population redistribution. This redistribution process, also called a generation, is repeated until an eventual population stabilisation, i.e. no changes between two generations. A good strategy is then a strategy which stays alive in the population for the longest possible time, and in the biggest possible proportion.
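One generation of this redistribution can be sketched as follows. This is our own simplification: pairwise scores are precomputed, the fitness of a strategy is its payoff against the current population mix, and the global population size stays fixed:

```python
def next_generation(pop, score, total):
    # pop: {name: count}; score[a][b]: payoff of a against b
    # in one iterated game.
    fitness = {}
    for a in pop:
        # a meets every other player in the population
        # (its own clones included, itself excluded).
        fitness[a] = sum(score[a][b] * (pop[b] - (a == b)) for b in pop)
    grand = sum(fitness[a] * pop[a] for a in pop)
    # Proportional redistribution at fixed global population size.
    return {a: round(total * fitness[a] * pop[a] / grand) for a in pop}

# Pairwise scores of all_c and all_d over a 10-round game.
score = {'all_c': {'all_c': 30, 'all_d': 0},
         'all_d': {'all_c': 50, 'all_d': 10}}
pop = {'all_c': 10, 'all_d': 10}
for _ in range(3):
    pop = next_generation(pop, score, total=20)
```

In this two-strategy example the defectors quickly take over, as expected in a population without reciprocating strategies.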

An example of evolution between all strategies described in the previous section is shown in Figure 1. The x-axis represents the generation number, whereas the y-axis represents the size of the population for each strategy. For simplicity we make our computation with a fixed global population size.


  
Figure 1: Example of ecological evolution.

Classical results in this field, which were presented by AXELROD in [1], show that to be good a strategy must:

- be nice, i.e. never be the first to defect,
- be reactive, punishing defections,
- be forgiving, resuming cooperation after punishment,
- be simple, so that the opponent can understand its behaviour.

The well-known tit_for_tat strategy, which satisfies all those criteria, has, since [1], been considered by many people using the dilemma (though not by game theorists) to be one of the best strategies, not only for cooperation but also for the evolution of cooperation. We think that the simplicity criterion is not so good, and have thus introduced a strategy called gradual which illustrates our point of view.

Gradual cooperates on the first move; then, after the first opponent's defection, it defects one time and cooperates two times; after the second opponent's defection it defects two times and cooperates two times; ...; after the nth opponent's defection it defects n times (let us call this the punishment period) and cooperates two times (let us name this the lull time). Gradual obtained better results than tit_for_tat in almost all our experiments, see [2].
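Gradual is stateful, so it is most naturally written as a small class. A minimal sketch under our own conventions; the punishment function is left as a parameter, the identity being gradual itself:

```python
class Gradual:
    """Sketch of gradual with a parameterisable punishment function."""

    def __init__(self, punishment=lambda n: n):
        self.punishment = punishment  # length of the nth punishment
        self.defections = 0           # opponent defections seen so far
        self.queue = []               # moves already scheduled

    def move(self, opp_last):
        # opp_last: the opponent's previous move, or None on the first move.
        if opp_last == 'd':
            self.defections += 1
            if not self.queue:
                # Punish with f(n) defections, then a two-move lull.
                self.queue = (['d'] * self.punishment(self.defections)
                              + ['c', 'c'])
        return self.queue.pop(0) if self.queue else 'c'
```

Against an opponent defecting once and then cooperating, this plays a single retaliatory defection followed by the lull; after a second defection the punishment lengthens to two defections.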

It is easy to imagine strategies derived from gradual, for instance by modifying the punishment function, which is the identity in the gradual case (n defections $\Rightarrow$ punishment of length n).

Our main research direction to establish our ideas about a strategy's complexity is to generate automatically large classes of strategies, and to evaluate them objectively through ecological evolution.

Complete Classes of Strategies

To make our research process easier, we have to find a descriptive method to define strategies, which is less risky than exhaustive enumeration by hand, which is never objective nor complete. The way we have chosen is to use a genetic approach. We describe a structure (a genotype), which can be decoded into a particular behaviour (a phenotype). A way to obtain a strategy is then simply to fill this structure. A way to obtain a lot of strategies is to consider all the possible ways of filling this genotype, i.e. to consider all possible individuals based on a given genotype. Let us call the set of all strategies described by a particular genotype the complete class of strategies issued from this genotype.

We have described three genotypes based on the same simple idea, in order to remain objective. This idea is to consider the observable length of the game's history. Such strategies have already been studied in [4,7]. The three genotypes are:

memory:
Each strategy can only see $\ipdM$ moves of its own past, and $\ipdO$ moves of its opponent's past. The strategy is then started by $\max(\ipdM,\ipdO)$ moves predefined in the genotype. All other moves are coded in the genotype, according to the visible past configuration. The genotype length is then $\max(\ipdM,\ipdO)+2^{(\ipdM+\ipdO)}$.
binary_memory:
The same as the previous one, but the reply to an opponent's move depends not only on the visible past but also on whether the opponent's past defections are more numerous than its past cooperations. The genotype length is then $\max(\ipdM,\ipdO)+2^{(\ipdM+\ipdO+1)}$.
memory_automata:
Represents classical two-state automata, starting in state 0. Each strategy can only see $\ipdM$ moves of its own past, and $\ipdO$ moves of its opponent's past. The strategy is then started by $\max(\ipdM,\ipdO)$ moves predefined in the genotype. All other moves are coded in the genotype, according to the visible past configuration and the automaton's current state. State transitions are also coded in the genotype. The genotype length is then $\max(\ipdM,\ipdO)+2^{(\ipdM+\ipdO+2)}$. Strategies of this kind, with $\ipdM=\ipdO=3$, have already been studied in [6].
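The three length formulas translate directly into class sizes: every gene is binary (a move, or a one-bit state transition), so a genotype of length L yields 2^L strategies. A quick computation, with helper names of our own:

```python
def genotype_length(M, O, kind='memory'):
    # Length formulas for the three genotypes defined above.
    extra = {'memory': 0, 'binary_memory': 1, 'memory_automata': 2}[kind]
    return max(M, O) + 2 ** (M + O + extra)

def class_size(M, O, kind='memory'):
    # All genes are binary, so 2^length genotypes exist.
    return 2 ** genotype_length(M, O, kind)
```

For instance, memory with $\ipdM=1$ and $\ipdO=1$ has a genotype of length 1 + 2^2 = 5 and thus a complete class of 32 strategies.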

Despite the apparent simplicity of the strategies described by those genotypes, many classical ones are included in these classes, tit_for_tat for instance.

Let us describe the behavior of one of those strategies, in order to understand how the genotype works. We consider a strategy of the complete class of memory with $\ipdM=1$ and $\ipdO=1$.

One of the individuals of this class plays c on the first move, then:
- if on the previous move I played c and the opponent played c, then I play c
- if on the previous move I played c and the opponent played d, then I play d
- if on the previous move I played d and the opponent played c, then I play d
- if on the previous move I played d and the opponent played d, then I play d

The genotype of the strategy is then: (c, c, d, d, d)

This strategy is one way of coding spiteful.
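Decoding such a genotype can be sketched as follows, under our own representation (the genotype as a tuple: first move, then the replies for the cc, cd, dc, dd configurations). Running the decoded individual against any opponent reproduces spiteful's behaviour:

```python
def decode_memory_1_1(genotype):
    # Build a strategy from a memory(1,1) genotype.
    first, r_cc, r_cd, r_dc, r_dd = genotype
    table = {('c', 'c'): r_cc, ('c', 'd'): r_cd,
             ('d', 'c'): r_dc, ('d', 'd'): r_dd}

    def strategy(me, opp):
        # me/opp: move histories as strings of 'c'/'d'.
        return first if not me else table[me[-1], opp[-1]]
    return strategy

# The genotype coding spiteful: start with c, then defect forever
# once either player has defected.
spiteful_like = decode_memory_1_1(('c', 'c', 'd', 'd', 'd'))
```

Enumerating all 32 genotypes of this class and decoding each one yields the whole complete class.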

Some Experiments

We have conducted some experiments using those complete classes. The main purpose was to evaluate known strategies in large ecological evolutions, but also to try to find some new good strategies. In all our experiments we computed one ecological evolution between all strategies of a class, and then another with gradual added to the population. We have thus been able to partially confirm our ideas on the strength of this strategy, as shown in Table 2. In all the results of Table 2, gradual is evaluated better than tit_for_tat. We, however, do not discuss the evaluation of tit_for_tat here, since it is included in some of the explored classes and its evaluation is thus only partial.

Table 2: Some results of the evaluation of gradual in complete classes. Class size is the number of described strategies, whereas evaluation is the rank of the strategy at the end of an evolution of the complete class.

  class     $\ipdM$  $\ipdO$  class size   gradual  tit_for_tat  spiteful
  memory       0        1            8        1          1            1
  memory       0        2           64        5          2           21
  memory       1        1           32        2          3            1
  memory       1        2         1024
Conclusion

The results obtained on complete classes give a new argument for our thesis: the simplicity criterion is not a good one, and there is, in order to be good, an infinite gradient in the level of complexity in the structure of strategies. In the classes we have explored, gradual is always evaluated better than tit_for_tat, although it has a more complex structure.

The software used to make all the computations described here runs under Unix, DOS or Windows, and is available on the Web or by anonymous ftp.
References

[1] R. Axelrod. The Evolution of Cooperation. Basic Books, New York, NY, USA, 1984.
[2] B. Beaufils, J.-P. Delahaye, and P. Mathieu. Our meeting with gradual, a good strategy for the iterated prisoner's dilemma. In Artificial Life V: Proceedings of the Fifth International Workshop on the Synthesis and Simulation of Living Systems, pages 202-209, Cambridge, MA, USA, 1996. MIT Press/Bradford Books.
[3] M. M. Flood. Some experimental games. Research Memorandum RM-789-1, The RAND Corporation, June 1952.
[4] K. Lindgren. Evolutionary phenomena in simple dynamics. In Artificial Life II, pages 295-312, Redwood City, CA, USA, 1992. Addison-Wesley.
[5] W. Poundstone. Prisoner's Dilemma: John von Neumann, Game Theory, and the Puzzle of the Bomb. Oxford University Press, Oxford, UK, 1993.
[6] Technical Report DSSE-TR-96-2, Declarative Systems and Software Engineering Group, University of Southampton, UK, 1996.
[7] T. W. Sandholm and R. H. Crites. Multiagent reinforcement learning in the iterated prisoner's dilemma. Biosystems, 37:147-166, 1996.
[8] J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ, USA, 1944.