games 笔记1
Week 1
Dijkstra
考虑移动成本的搜索。文明中单位经过不同地形移动的举例不同。 用一个 priority queue。
优先队列里存放着各个地点。因为有可能多个路径经过同一个地点,所以如果当前loop中该地点在queue中已经有值,则只存放相对较好的值。
注: 相对于 Dijkstra , BFS 就无法判断成本,只能寻找最短路径。
1 |
|
Heuristic search (Greedy Best First Search)
经典的贪婪。 Dijkstra 可以寻找多个路径,但我们往往只需要一个。优先队列,但不加入cost so far 改为distance to goal (两点间距离公式)。
也不判断成本,且结果不一定最好,不适合复杂路径,但速度快很多。
1 |
|
在这里用property queue
注:
- heuristic(启发式)这里采用了距离公式,每种类型的图启发式是不一样的,需要自己寻找。
- 有可能找不到解,甚至是一个。
- 没有backtrack,有可能stack。
A*
A* will tell you to move from one location to another but it won’t tell you how.
把前两个合起来。比Dijstra快且结果是正确的。
1 |
|
good heuristic - monotoic
week 2
Chapater 2
Best response
玩家找到最好的返回值
Let si be the strategy of player $i, i = 1, . . . , k.$
$s_{i^*}$ is a best response to the collection of
opponent strategies if and only if
I no other strategy which player i can play gives a higher
payoff.
- 有可能有不止一个最好解(best response strategy)
- 有可能没有最好解,比如两个玩家比较谁说的数字最大
- 有限集合(Finite strategy space)一定会有最好解。
- 无限集合不一定。
Nash Equilibrium
A strategy profile $(s_{1^*},s_{2^*},…s_{k^*})$ with k players, 所有玩家的战略都是对他们来说最好的就是Nash均衡。
Lecture 3
Definition 2
- game of perfect information: 一个抉择树每个节点都有一个条件。也就是说任意玩家在任意时间都知道自己所在的位置。反之则是imperfect information。
- zero-sum: 每一个叶子上所有用户的总和payoff是0。也就是说用户不会从中得到什么。
- without chance: 节点没有被自然控制。 反之是game with chance。
ep: 棋类是 2-person zero-sum games of perfect information without chance.
ep: Poker is a zero-sum1 game of imperfect information with chance.
Definition 3
fully specified strategy : 对一个特定的玩家,有多少种战略。把所有可能性走法加起来。
Generating all strategies for
如果玩家i第一步先走的话: Ni(t) = Ni(t1) + Ni(t2) + · · · + Ni(tn).
如果玩家i第一步后走的话: Ni(t) = Ni(t1) × Ni(t2) × · · · × Ni(tn).
week3
Nash Equilibrium
Def
Definition: A strategy profile (s1,s2,s3…sk) for a game with k
players, is a Nash equilibrium if each strategy is a best
response to all of the others.
- It is not a strategy; it is a choice of strategy for all players in
the game. - If the players are playing the Nash, no player has any
incentive to change its strategy unilaterally
Mini-max approach
解释上图:
- 寻找Amelia的min , 1,2,3,1
- 寻找Scott的max, 7,3,5,6
- 找Amelia的min中的max 和 Scott的max中的min 的那个点。
Dominance
Dominance can be used to eliminate some strategies
mixed strategies
A mixed strategy is a strategy for a player in which:
- I plays probabilistic combination of pure strategies;
- I receives a probabilistic combination of payoffs
Extensive form
一个路径上有权重比例的树
At each node where the player has a
decision, assign a probability function to each
of the possible choices.
con
- Only for a special class of games do strategies which beat
all opponents always exist (two-player, zero-sum, perfect
information, no chance). - A Nash equilibrium is a collection of strategies for all
players such that each player is playing best response to
all the others. - For a zero-sum, two-player game in normal form, a
strategy pair which is maximal in its columns and minimal
in its rows is a Nash equilibrium. - Dominance can be used to reduce the number of
strategies. - Nash equilibria also exist in general-sum games.
Week4
The minmax rule
• MAX nodes are trying to maximise the payoff to player 1.
• MIN nodes are trying to minimise the payoff to player 1.
The value V(J) of a node J is
- If J is a terminal node, V(J) is equal its payoff U(J).
- If J is a MAX node, V(J) is the maximum value of its
children. - If J is a MIN node, V(J) is the minimum value of its
children.
suppose multi players
Can not approach an equilibria :
Week 6
Trajectory-based approaches
Q-learning
就是根据行为的后果来加强或减弱行为。如果后果是好的,则加强行为,如果后果是坏的则减弱行为。通过这种方式来获得趋利避害的能力。公式后一部分是对后果的估值。
q-training就是个建立“操作性条件反射”的过程
Q(s, a)