I am thinking about creating a DeBERTa version of this project. Initially I planned to use it as a backbone because it's easier to modify than llama.cpp, but performance is critical for my use case. The README mentions that the llama.cpp implementation is substantially faster, and as a beginner with ggml and llama.cpp, I don't understand why. Can someone explain?