Skip to content

Latest commit

 

History

History
44 lines (31 loc) · 2.45 KB

File metadata and controls

44 lines (31 loc) · 2.45 KB

LLM_Graph_Standardization

📚 Project Overview

Standardization of Transformer-Based Attention Network Execution for Pattern-Level Optimizations in Graphs.

This project aims to address the fragmentation in the implementation of inference services for transformer-based models by proposing a standard for graph representation. The standard ensures optimal utilization of hardware accelerators (GPUs and FPGAs) and introduces a GGML-based application tailored for models such as LLAMA 2, LLAMA 3, ViT, and Whisper.


🎯 Objectives

  1. Graph Representation Analysis
    Analyze the graph representation in GGML-based applications to identify common patterns that optimize the execution of transformer-based networks on high-performance hardware. This includes studying graph construction and management in models such as LLAMA 2, LLAMA 3, ViT, and Whisper.

  2. Standard Design for Graph Execution
    Design a standard for graph representation and execution, establishing operations, hierarchies, and structures compatible with parallel computing architectures (HPC). The standard ensures seamless integration with embedded systems equipped with GPUs and FPGAs.

  3. GGML-Based Application Implementation
    Implement a GGML-based application to process the execution graphs of LLAMA 2 and ViT models following the designed standard, ensuring efficient execution on specialized hardware.

  4. Execution Graph Optimization
    Optimize the LLAMA 2 execution graph using the layer fusion technique. The goal is to maximize parallelism and leverage the full computational capabilities of accelerated hardware.

  5. Performance Evaluation
    Evaluate the impact of the developed standard and application through rigorous performance testing. Analyze key metrics such as scalability and execution time reduction to validate efficiency.


🏗️ Project Structure

LLM_Graph_Standardization/
│
├── src/                    # Source code for graph processing and optimization
│   ├── ggml_integration/   # GGML-based application modules
│   ├── graph_standard/     # Standard definitions for graph representation
│   └── hardware_support/   # FPGA and GPU integration code
│
├── tests/                  # Performance and scalability tests
│
├── docs/                   # Documentation and design specifications
│
└── README.md               # Project overview and objectives (this file)