SC3000 Artificial Intelligence Lab

NTU 2026 Sem 2.

Lab 1
- Task 1: Using different search algorithms to find a path from start to end in a grid world.
- Task 2: Implementing Monte Carlo and Q-Learning Reinforcement Learning to solve a maze.
Lab 2:
- Task 1: Encoding a scenario in First Order Logic and proving unethical behaviour using Prolog.
- Task 2: Modelling the British Royal Family succession rules as a Prolog KB.

Information

Contributor:
- Hung: Lab 1 Task 1, Lab 2 Task 2
- Allen: Lab 1 Task 2, Lab 2 Task 1
Important notes:
- As per assignment manual, Lab 1 Task 2 uses vertical x axis and horizontal y axis.

Lab architectures and designs information:

Lab 1 Detailed Description
Lab 2 Detailed Description

Notable Developments

Monte Carlo Reinforcement Learning

Author: Allen
Task: Lab 1 Task 2.2

Standard Method

The standard definition of the Q-value is:

$$Q^\pi(s,a) = \mathbb{E}_\pi\left[G_t \mid S_t = s,\ A_t = a\right]$$

My initial proposed approach was: in each episode, calculate 10 paths from start to end (ε-soft) and compute the average for each node. However, after researching the standard solution, the more common approach is to calculate Q-values using accumulated past paths (Sutton & Barto, 2018, pp. 100–101):

$$Q^{\pi_k}(s,a) = \frac{1}{k}\sum_{i=1}^{k}G_i$$

where $G_i$ is the return generated using policy $\pi_i$.

This does not strictly follow the definition above, since the mathematical definition specifies that Q should only be averaged over paths generated from the current policy, rather than accumulating past paths. However, it remains justifiable — each episode introduces only small policy improvements, so the cumulation still converges as the incremental changes are small enough.

This solution is stored in ./Lab1/part2/task2-2, reaching 90.9% similarity with the optimal solution.

Issues with the Standard Method

The standard cumulative method introduces two problems:

Memory: as episodes increase, storage grows linearly — $O(n)$.
Staleness: although policy changes are small per episode, accumulating all past returns can introduce errors over many episodes.

Improved Method — Sliding-Window Returns (Task 2.2-v2)

To address both issues, I further improved the solution to average only over the most recent 1000 returns:

$$ Q^{\pi_k} = \begin{cases} \dfrac{1}{k}\sum_{i=1}^{k}G_i & \text{if } k \leq 1000 \\ \dfrac{1}{1000}\sum_{i=k-1000}^{k}G_i & \text{if } k > 1000 \end{cases} $$

A similar algorithmic idea can be found in other agent-related papers (Mnih et al., 2013).

This solution is stored in ./Lab1/part2/task2-2-v2. The algorithm also reaches 90.9% similarity with the optimal solution, but computed in a much shorter time — the time to average and space to store the data becomes $O(1)$.

References

Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). MIT Press. http://incompleteideas.net/book/the-book-2nd.html

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv. https://arxiv.org/abs/1312.5602

Name		Name	Last commit message	Last commit date
Latest commit History 136 Commits
Lab1		Lab1
Lab2		Lab2
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SC3000 Artificial Intelligence Lab

Information

Lab architectures and designs information:

Notable Developments

Monte Carlo Reinforcement Learning

Standard Method

Issues with the Standard Method

Improved Method — Sliding-Window Returns (Task 2.2-v2)

References

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SC3000 Artificial Intelligence Lab

Information

Lab architectures and designs information:

Notable Developments

Monte Carlo Reinforcement Learning

Standard Method

Issues with the Standard Method

Improved Method — Sliding-Window Returns (Task 2.2-v2)

References

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages