Introduce a section explaining AI compilers, which optimize neural-network execution for specialized hardware such as GPUs, TPUs, and other accelerators.
Suggested Topics:
- Overview of AI compilers: TVM, XLA, TensorRT, MLIR, Triton
- Graph-level optimizations (e.g., operator fusion, constant folding) vs kernel-level optimizations (e.g., tiling, vectorization, scheduling)
- Use cases: operator fusion, layout transformation, quantization
- Example: write and benchmark a custom kernel in Triton or TVM
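As a starting point for the fusion topic above, here is a minimal sketch of the idea behind operator fusion. This is not a Triton or TVM kernel; it is a plain NumPy analogy in which the "fused" version reuses one buffer in place of allocating a fresh intermediate array per operation, mimicking how a fused kernel reduces memory traffic by making a single pass over the data. The function names (`unfused`, `fused`) and the specific op chain (`x * 2 + 1` followed by ReLU) are illustrative choices, not from any library.

```python
import timeit
import numpy as np

def unfused(x):
    # Each op materializes a full intermediate array:
    # three passes over memory, two temporary allocations.
    a = x * 2.0
    b = a + 1.0
    return np.maximum(b, 0.0)

def fused(x):
    # A compiler can fuse the chain so the data is touched once.
    # Here we approximate that by reusing a single output buffer,
    # which removes the temporary allocations.
    out = np.empty_like(x)
    np.multiply(x, 2.0, out=out)
    np.add(out, 1.0, out=out)
    np.maximum(out, 0.0, out=out)
    return out

if __name__ == "__main__":
    x = np.random.rand(1_000_000).astype(np.float32)
    assert np.allclose(unfused(x), fused(x))
    t_unfused = timeit.timeit(lambda: unfused(x), number=50)
    t_fused = timeit.timeit(lambda: fused(x), number=50)
    print(f"unfused: {t_unfused:.3f}s  fused-style: {t_fused:.3f}s")
```

A real Triton or TVM example for the final bullet would replace the body of `fused` with a generated kernel that performs the multiply, add, and max in one launch; the benchmarking harness above (correctness check plus `timeit`) carries over unchanged.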