Week 1

Dijkstra

考虑移动成本的搜索。文明中单位经过不同地形移动的举例不同。用一个 priority queue。

优先队列里存放着各个地点。因为有可能多个路径经过同一个地点，所以如果当前loop中该地点在queue中已经有值，则只存放相对较好的值。

注：相对于 Dijkstra ， BFS 就无法判断成本，只能寻找最短路径。

1	`priority = new_cost`

Heuristic search （Greedy Best First Search）

经典的贪婪。 Dijkstra 可以寻找多个路径，但我们往往只需要一个。优先队列，但不加入cost so far 改为distance to goal （两点间距离公式）。

也不判断成本，且结果不一定最好，不适合复杂路径，但速度快很多。

1	`priority = heuristic(goal, next)`

在这里用property queue

注：

heuristic(启发式)这里采用了距离公式，每种类型的图启发式是不一样的，需要自己寻找。
有可能找不到解，甚至是一个。
没有backtrack，有可能stack。

A*

A* will tell you to move from one location to another but it won’t tell you how.

把前两个合起来。比Dijstra快且结果是正确的。

1	`priority = new_cost + heuristic(goal, next)`

good heuristic - monotoic

week 2

Chapater 2

Best response

玩家找到最好的返回值

Let si be the strategy of player $i, i = 1, . . . , k.$

$s_{i^*}$ is a best response to the collection of
opponent strategies if and only if
I no other strategy which player i can play gives a higher
payoff.

有可能有不止一个最好解（best response strategy）
有可能没有最好解，比如两个玩家比较谁说的数字最大
有限集合(Finite strategy space)一定会有最好解。
无限集合不一定。

Nash Equilibrium

A strategy profile $(s_{1^*},s_{2^*},…s_{k^*})$ with k players, 所有玩家的战略都是对他们来说最好的就是Nash均衡。

Lecture 3

Definition 2

game of perfect information: 一个抉择树每个节点都有一个条件。也就是说任意玩家在任意时间都知道自己所在的位置。反之则是imperfect information。
zero-sum: 每一个叶子上所有用户的总和payoff是0。也就是说用户不会从中得到什么。
without chance：节点没有被自然控制。反之是game with chance。

ep: 棋类是 2-person zero-sum games of perfect information without chance.
ep: Poker is a zero-sum1 game of imperfect information with chance.

Definition 3

fully specified strategy : 对一个特定的玩家，有多少种战略。把所有可能性走法加起来。

Generating all strategies for

如果玩家i第一步先走的话： Ni(t) = Ni(t1) + Ni(t2) + · · · + Ni(tn).
如果玩家i第一步后走的话： Ni(t) = Ni(t1) × Ni(t2) × · · · × Ni(tn).

week3

Nash Equilibrium

Def

Definition: A strategy profile (s1,s2,s3…sk) for a game with k
players, is a Nash equilibrium if each strategy is a best
response to all of the others.

It is not a strategy; it is a choice of strategy for all players in
the game.
If the players are playing the Nash, no player has any
incentive to change its strategy unilaterally

Mini-max approach

解释上图：

寻找Amelia的min ， 1，2，3，1
寻找Scott的max， 7，3，5，6
找Amelia的min中的max 和 Scott的max中的min 的那个点。

Dominance

Dominance can be used to eliminate some strategies

mixed strategies

A mixed strategy is a strategy for a player in which:

I plays probabilistic combination of pure strategies;
I receives a probabilistic combination of payoffs

Extensive form

一个路径上有权重比例的树

At each node where the player has a
decision, assign a probability function to each
of the possible choices.

con

Only for a special class of games do strategies which beat
all opponents always exist (two-player, zero-sum, perfect
information, no chance).
A Nash equilibrium is a collection of strategies for all
players such that each player is playing best response to
all the others.
For a zero-sum, two-player game in normal form, a
strategy pair which is maximal in its columns and minimal
in its rows is a Nash equilibrium.
Dominance can be used to reduce the number of
strategies.
Nash equilibria also exist in general-sum games.

Week4

The minmax rule

• MAX nodes are trying to maximise the payoff to player 1.
• MIN nodes are trying to minimise the payoff to player 1.

The value V(J) of a node J is

If J is a terminal node, V(J) is equal its payoff U(J).
If J is a MAX node, V(J) is the maximum value of its
children.
If J is a MIN node, V(J) is the minimum value of its
children.

suppose multi players

Can not approach an equilibria :

Week 6

Trajectory-based approaches

Q-learning

就是根据行为的后果来加强或减弱行为。如果后果是好的，则加强行为，如果后果是坏的则减弱行为。通过这种方式来获得趋利避害的能力。公式后一部分是对后果的估值。
q-training就是个建立“操作性条件反射”的过程

Q(s, a)

#34111

games 笔记1

http://example.com/2022/09/24/game-笔记/

Author

Chenxi Qu

Licensed under

modeling 笔记 Previous

games101-笔记-光栅化 Next