games 笔记1

Week 1

Dijkstra

考虑移动成本的搜索。文明中单位经过不同地形移动的举例不同。 用一个 priority queue。

优先队列里存放着各个地点。因为有可能多个路径经过同一个地点,所以如果当前loop中该地点在queue中已经有值,则只存放相对较好的值。

注: 相对于 Dijkstra , BFS 就无法判断成本,只能寻找最短路径。

1
priority = new_cost

Heuristic search (Greedy Best First Search)

经典的贪婪。 Dijkstra 可以寻找多个路径,但我们往往只需要一个。优先队列,但不加入cost so far 改为distance to goal (两点间距离公式)。

也不判断成本,且结果不一定最好,不适合复杂路径,但速度快很多。

1
priority = heuristic(goal, next)

在这里用property queue

注:

  1. heuristic(启发式)这里采用了距离公式,每种类型的图启发式是不一样的,需要自己寻找。
  2. 有可能找不到解,甚至是一个。
  3. 没有backtrack,有可能stack。

A*

A* will tell you to move from one location to another but it won’t tell you how.

把前两个合起来。比Dijstra快且结果是正确的。

1
priority = new_cost + heuristic(goal, next)

good heuristic - monotoic


week 2

Chapater 2

Best response

玩家找到最好的返回值

Let si be the strategy of player $i, i = 1, . . . , k.$

$s_{i^*}$ is a best response to the collection of
opponent strategies if and only if
I no other strategy which player i can play gives a higher
payoff.

  • 有可能有不止一个最好解(best response strategy)
  • 有可能没有最好解,比如两个玩家比较谁说的数字最大
  • 有限集合(Finite strategy space)一定会有最好解。
  • 无限集合不一定。

Nash Equilibrium

A strategy profile $(s_{1^*},s_{2^*},…s_{k^*})$ with k players, 所有玩家的战略都是对他们来说最好的就是Nash均衡。

Lecture 3

Definition 2

  1. game of perfect information: 一个抉择树每个节点都有一个条件。也就是说任意玩家在任意时间都知道自己所在的位置。反之则是imperfect information。
  2. zero-sum: 每一个叶子上所有用户的总和payoff是0。也就是说用户不会从中得到什么。
  3. without chance: 节点没有被自然控制。 反之是game with chance。

ep: 棋类是 2-person zero-sum games of perfect information without chance.
ep: Poker is a zero-sum1 game of imperfect information with chance.

Definition 3

fully specified strategy : 对一个特定的玩家,有多少种战略。把所有可能性走法加起来。

Generating all strategies for

  • 如果玩家i第一步先走的话: Ni(t) = Ni(t1) + Ni(t2) + · · · + Ni(tn).

  • 如果玩家i第一步后走的话: Ni(t) = Ni(t1) × Ni(t2) × · · · × Ni(tn).

week3

Nash Equilibrium

Def

Definition: A strategy profile (s1,s2,s3…sk) for a game with k
players, is a Nash equilibrium if each strategy is a best
response to all of the others.

  • It is not a strategy; it is a choice of strategy for all players in
    the game.
  • If the players are playing the Nash, no player has any
    incentive to change its strategy unilaterally

Mini-max approach



解释上图:

  1. 寻找Amelia的min , 1,2,3,1
  2. 寻找Scott的max, 7,3,5,6
  3. 找Amelia的min中的max 和 Scott的max中的min 的那个点。

Dominance


Dominance can be used to eliminate some strategies


mixed strategies

A mixed strategy is a strategy for a player in which:

  • I plays probabilistic combination of pure strategies;
  • I receives a probabilistic combination of payoffs

Extensive form

一个路径上有权重比例的树

At each node where the player has a
decision, assign a probability function to each
of the possible choices.

con

  1. Only for a special class of games do strategies which beat
    all opponents always exist (two-player, zero-sum, perfect
    information, no chance).
  2. A Nash equilibrium is a collection of strategies for all
    players such that each player is playing best response to
    all the others.
  3. For a zero-sum, two-player game in normal form, a
    strategy pair which is maximal in its columns and minimal
    in its rows is a Nash equilibrium.
  4. Dominance can be used to reduce the number of
    strategies.
  5. Nash equilibria also exist in general-sum games.

Week4

The minmax rule

• MAX nodes are trying to maximise the payoff to player 1.
• MIN nodes are trying to minimise the payoff to player 1.

The value V(J) of a node J is

  1. If J is a terminal node, V(J) is equal its payoff U(J).
  2. If J is a MAX node, V(J) is the maximum value of its
    children.
  3. If J is a MIN node, V(J) is the minimum value of its
    children.


suppose multi players

Can not approach an equilibria :

Week 6

Trajectory-based approaches

Q-learning

就是根据行为的后果来加强或减弱行为。如果后果是好的,则加强行为,如果后果是坏的则减弱行为。通过这种方式来获得趋利避害的能力。公式后一部分是对后果的估值。
q-training就是个建立“操作性条件反射”的过程

Q(s, a)


games 笔记1
http://example.com/2022/09/24/game-笔记/
Author
Chenxi Qu
Licensed under