AI-Testground

This is a summary of the tests we ran with different AIs. Its main purpose is to present our results and our opinions on them.

ChatGPT-3.5

  • Task Arrival: 2

  • Development:

    • Requirement collection: 1
    • Planning and Design: 3 (it could not generate SVG files and tried to describe them instead)
    • Implementation: 3 (it sometimes interpreted the task oddly, and the code it produced rarely worked without at least small changes)
    • Testing: 1-2

In total:

ChatGPT-3.5 showed good results. However, compared to version 4, it clearly lacks the ability to reason. It is easily distracted by the prompt into doing things that are not helpful. It was very fast, though.

ChatGPT-4

  • Task Arrival: 1

  • Development:

    • Requirement collection: 2
    • Planning and Design: 1
    • Implementation: 1
    • Testing: 2

In total:

ChatGPT-4 showed excellent results. It was a bit slower than 3.5, but its reasoning was impressive.

HuggingChat-OpenAssistant_oasst-sft-6-llama-30b

  • Task Arrival: 1

  • Development:

    • Requirement collection: 1
    • Planning and Design: 4
    • Implementation: 5
    • Testing: 5

In total:

HuggingChat failed completely in the technical area. It was also slow and lost the context many times while generating output.

WizardLM-7B

  • Task Arrival: 3

  • Development:

    • Requirement collection: 2
    • Planning and Design: 3
    • Implementation: 5
    • Testing: 3

In total:

WizardLM also failed completely in the technical area.
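The scores for the four graded models above can be compared side by side with a short script. This is only a minimal sketch: it assumes that lower scores are better (which matches the comments above, although the scale is not stated explicitly), it weights all five categories equally, and it takes the "1-2" given for ChatGPT-3.5 Testing as 1.5. The names `SCORES` and `rank_models` are hypothetical and were chosen for illustration.

```python
from statistics import mean

# Scores copied from the lists above, in this order:
# Task Arrival, Requirement collection, Planning and Design,
# Implementation, Testing.
SCORES = {
    # Assumption: the "1-2" for Testing is taken as 1.5.
    "ChatGPT-3.5": [2, 1, 3, 3, 1.5],
    "ChatGPT-4": [1, 2, 1, 1, 2],
    "HuggingChat-OpenAssistant_oasst-sft-6-llama-30b": [1, 1, 4, 5, 5],
    "WizardLM-7B": [3, 2, 3, 5, 3],
}


def rank_models(scores):
    """Return (model, mean score) pairs, sorted by ascending mean.

    Assumption: a lower score is better, so the first entry is the
    best-rated model. All categories are weighted equally.
    """
    averaged = [(name, round(mean(values), 2)) for name, values in scores.items()]
    return sorted(averaged, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, average in rank_models(SCORES):
        print(f"{name}: {average}")
```

An unweighted mean hides the split visible in the individual scores: HuggingChat scored well on Task Arrival and Requirement collection but poorly on the technical categories, so the per-category values remain the more informative view.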

ggml-vic7b-uncensored-q5_1

AI models that can be run with llama (CPU) and CUDA (GPU) require massive resources to run properly. We did not have enough resources to test this model.

gpt4-x-alpaca-13b-native-4bit-128g

AI models that can be run with llama (CPU) and CUDA (GPU) require massive resources to run properly. We did not have enough resources to test this model.

AutoGPT

We ran it directly on the machine as well as through Docker. In both cases it behaved erratically and did not achieve anything.