Standardization of Transformer-Based Attention Network Execution for Pattern-Level Optimizations in Graphs
This project addresses the fragmentation in how inference services for transformer-based models are implemented by proposing a standard for graph representation. The standard is intended to make efficient use of hardware accelerators (GPUs and FPGAs), and the project introduces a GGML-based application tailored to models such as LLAMA 2, LLAMA 3, ViT, and Whisper.
Graph Representation Analysis
Analyze how GGML-based applications represent computation graphs in order to identify recurring patterns that can be exploited to optimize the execution of transformer-based networks on high-performance hardware. This includes studying how graphs are constructed and managed for models such as LLAMA 2, LLAMA 3, ViT, and Whisper.
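As an illustration of what pattern identification could look like, the sketch below counts recurring chains of operations in a toy graph. It is not GGML code: the `Node` class, the `count_op_chains` helper, and the toy attention fragment are hypothetical, and the op names only loosely follow GGML's naming style.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One operation in a hypothetical execution graph (not a GGML type)."""
    name: str
    op: str
    inputs: list = field(default_factory=list)  # names of producer nodes


def count_op_chains(nodes, length=2):
    """Count producer-to-consumer chains of `length` ops, keyed by op names."""
    by_name = {n.name: n for n in nodes}
    chains = Counter()

    def walk(node, suffix):
        # Walk backwards from a consumer, prepending each producer's op.
        ops = (node.op,) + suffix
        if len(ops) == length:
            chains[ops] += 1
            return
        for src in node.inputs:
            if src in by_name:
                walk(by_name[src], ops)

    for n in nodes:
        walk(n, ())
    return chains


# Toy single-head attention fragment (assumed layout, for illustration only).
toy_graph = [
    Node("x", "INPUT"),
    Node("wq", "WEIGHT"), Node("wk", "WEIGHT"), Node("wv", "WEIGHT"),
    Node("q", "MUL_MAT", ["wq", "x"]),
    Node("k", "MUL_MAT", ["wk", "x"]),
    Node("v", "MUL_MAT", ["wv", "x"]),
    Node("kq", "MUL_MAT", ["k", "q"]),
    Node("kq_scaled", "SCALE", ["kq"]),
    Node("probs", "SOFT_MAX", ["kq_scaled"]),
    Node("out", "MUL_MAT", ["v", "probs"]),
]

chains = count_op_chains(toy_graph, length=3)
print(chains.most_common(3))
```

Chains that show up repeatedly across models, such as a matrix product followed by scaling and softmax, are natural candidates for a shared representation or a fused kernel.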
Standard Design for Graph Execution
Design a standard for graph representation and execution that defines the operations, hierarchies, and structures compatible with parallel, high-performance computing (HPC) architectures. The standard is intended to integrate with embedded systems equipped with GPUs and FPGAs.
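One possible shape for such a standard is sketched below, assuming a fixed op registry and a definition-before-use node order. The `OP_ARITY` table, `GraphNode`, `validate`, and `parallel_levels` are hypothetical names chosen for this example and are not part of GGML or of the standard itself.

```python
from dataclasses import dataclass

# Hypothetical op registry: op name -> number of inputs the op requires.
OP_ARITY = {"INPUT": 0, "RMS_NORM": 1, "SOFT_MAX": 1, "MUL_MAT": 2, "ADD": 2}


@dataclass(frozen=True)
class GraphNode:
    name: str
    op: str
    inputs: tuple = ()


def validate(nodes):
    """Check op names, arity, unique names, and definition-before-use order."""
    seen = set()
    for n in nodes:
        if n.op not in OP_ARITY:
            raise ValueError(f"{n.name}: unknown op {n.op!r}")
        if len(n.inputs) != OP_ARITY[n.op]:
            raise ValueError(f"{n.name}: {n.op} expects {OP_ARITY[n.op]} inputs")
        if n.name in seen:
            raise ValueError(f"{n.name}: duplicate node name")
        for src in n.inputs:
            if src not in seen:
                raise ValueError(f"{n.name}: input {src!r} is not defined earlier")
        seen.add(n.name)


def parallel_levels(nodes):
    """Group node names into levels; nodes in one level do not depend on each other."""
    depth = {}
    for n in nodes:  # relies on the order checked by validate()
        depth[n.name] = 1 + max((depth[s] for s in n.inputs), default=-1)
    levels = [[] for _ in range(max(depth.values(), default=-1) + 1)]
    for n in nodes:
        levels[depth[n.name]].append(n.name)
    return levels


example = [
    GraphNode("x", "INPUT"),
    GraphNode("w1", "INPUT"),
    GraphNode("w2", "INPUT"),
    GraphNode("norm", "RMS_NORM", ("x",)),
    GraphNode("a", "MUL_MAT", ("w1", "norm")),
    GraphNode("b", "MUL_MAT", ("w2", "norm")),
    GraphNode("sum", "ADD", ("a", "b")),
]
validate(example)
print(parallel_levels(example))
```

Grouping nodes by dependency depth is one simple way to expose which operations a GPU or FPGA backend could schedule concurrently; a real standard would likely also carry tensor shapes and data types.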
GGML-Based Application Implementation
Implement a GGML-based application that processes the execution graphs of the LLAMA 2 and ViT models according to the designed standard, targeting efficient execution on specialized hardware.
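The core loop of such an application can be pictured as a small interpreter that walks the graph and dispatches each op to a backend kernel. The sketch below is a pure-Python stand-in and does not call GGML; the `CPU_KERNELS` table, the `run_graph` function, and the tuple layout of a node are assumptions made for illustration.

```python
import math


def _add(a, b):
    return [x + y for x, y in zip(a, b)]


def _scale(a, factor):
    return [x * factor for x in a]


def _soft_max(a):
    m = max(a)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in a]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical reference backend: op name -> kernel.
# A GPU or FPGA backend would supply its own table with the same keys.
CPU_KERNELS = {"ADD": _add, "SCALE": _scale, "SOFT_MAX": _soft_max}


def run_graph(nodes, feeds, kernels=CPU_KERNELS):
    """Evaluate nodes given as (name, op, input_names, params) tuples.

    Nodes must be listed in definition-before-use order.
    """
    values = dict(feeds)
    for name, op, inputs, params in nodes:
        args = [values[i] for i in inputs]
        values[name] = kernels[op](*args, **params)
    return values


graph = [
    ("s", "ADD", ("a", "b"), {}),
    ("h", "SCALE", ("s",), {"factor": 0.5}),
    ("p", "SOFT_MAX", ("h",), {}),
]
result = run_graph(graph, {"a": [1.0, 2.0], "b": [3.0, 2.0]})
print(result["p"])
```

Keeping the kernel table separate from the graph walk means the same graph description can be executed by different hardware backends without changing the graph itself.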
Execution Graph Optimization
Optimize the LLAMA 2 execution graph using layer fusion, which merges adjacent operations into a single node. The goal is to maximize parallelism and make full use of the computational capabilities of the accelerated hardware.
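A minimal sketch of one fusion rule follows, assuming a graph stored as a list of dictionaries in definition-before-use order. It folds a matrix product into the addition that consumes it, but only when nothing else reads the product. The `fuse_matmul_add` function and the `MUL_MAT_ADD` op name are hypothetical and are not GGML identifiers.

```python
def fuse_matmul_add(nodes):
    """Fuse MUL_MAT -> ADD pairs into a single hypothetical MUL_MAT_ADD node.

    nodes: list of {"name", "op", "inputs"} dicts, definition-before-use order.
    ADD nodes are assumed to have exactly two inputs.
    """
    by_name = {n["name"]: n for n in nodes}
    n_consumers = {}
    for n in nodes:
        for src in n["inputs"]:
            n_consumers[src] = n_consumers.get(src, 0) + 1

    replaced = {}     # ADD node name -> fused replacement node
    absorbed = set()  # MUL_MAT nodes folded into a fused node
    for n in nodes:
        if n["op"] != "ADD":
            continue
        for i, src in enumerate(n["inputs"]):
            producer = by_name.get(src)
            if (producer is not None and producer["op"] == "MUL_MAT"
                    and n_consumers[src] == 1 and src not in absorbed):
                other = n["inputs"][1 - i]
                replaced[n["name"]] = {
                    "name": n["name"],  # keep the name so consumers still resolve
                    "op": "MUL_MAT_ADD",
                    "inputs": producer["inputs"] + [other],
                }
                absorbed.add(src)
                break
    return [replaced.get(n["name"], n) for n in nodes if n["name"] not in absorbed]


layer = [
    {"name": "x", "op": "INPUT", "inputs": []},
    {"name": "w", "op": "WEIGHT", "inputs": []},
    {"name": "b", "op": "WEIGHT", "inputs": []},
    {"name": "mm", "op": "MUL_MAT", "inputs": ["w", "x"]},
    {"name": "y", "op": "ADD", "inputs": ["mm", "b"]},
    {"name": "act", "op": "GELU", "inputs": ["y"]},
]
fused = fuse_matmul_add(layer)
print([(n["name"], n["op"]) for n in fused])
```

The single-consumer check matters: if another node also read the intermediate product, removing it would break that node, so the pair is left unfused in that case.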
Performance Evaluation
Evaluate the impact of the developed standard and application through performance testing. Analyze key metrics, such as scalability and reduction in execution time, to validate the efficiency gains.
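The metrics named above can be computed with a small timing harness such as the sketch below. The helper names (`median_runtime`, `speedup`, `time_reduction_pct`, `parallel_efficiency`) are hypothetical, and the workload is a placeholder rather than a real inference call.

```python
import statistics
import time


def median_runtime(fn, repeats=5, warmup=1):
    """Median wall-clock time of fn() in seconds, after warm-up runs."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_s, optimized_s):
    """How many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


def time_reduction_pct(baseline_s, optimized_s):
    """Execution time reduction relative to the baseline, in percent."""
    return 100.0 * (baseline_s - optimized_s) / baseline_s


def parallel_efficiency(t1_s, tn_s, n_workers):
    """Speedup on n_workers divided by n_workers (1.0 is ideal scaling)."""
    return (t1_s / tn_s) / n_workers


# Placeholder workload; a real test would run the baseline and fused graphs.
elapsed = median_runtime(lambda: sum(range(10_000)))
print(f"median runtime: {elapsed:.6f} s")
```

Using the median over several repeats, with a warm-up run discarded, reduces the influence of one-off effects such as cache misses or lazy initialization on the reported numbers.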
Project Structure
LLM_Graph_Standardization/
│
├── src/ # Source code for graph processing and optimization
│ ├── ggml_integration/ # GGML-based application modules
│ ├── graph_standard/ # Standard definitions for graph representation
│ └── hardware_support/ # FPGA and GPU integration code
│
├── tests/ # Performance and scalability tests
│
├── docs/ # Documentation and design specifications
│
└── README.md # Project overview and objectives (this file)