This is a summary of the tests we ran with different AIs. Its main purpose is to present our results and our opinions on them. Each phase was graded from 1 (best) to 5 (worst).
-
Task Arrival: 2
-
Development:
- Requirement collection: 1
- Planning and Design: 3 (it was not able to generate SVG files and tried to describe them instead)
- Implementation: 3 (sometimes interpreted the task in odd ways; you rarely get code that works without at least small changes)
- Testing: 1-2
In total:
ChatGPT-3.5 showed good results. Compared to version 4, however, it clearly lacks the ability to reason and is easily distracted by the prompt into doing unhelpful things. It was very fast, though.
-
Task Arrival: 1
-
Development:
- Requirement collection: 2
- Planning and Design: 1
- Implementation: 1
- Testing: 2
In total:
ChatGPT-4 showed absolutely amazing results. It was a bit slower than 3.5, but its reasoning was impressive.
-
Task Arrival: 1
-
Development:
- Requirement collection: 1
- Planning and Design: 4
- Implementation: 5
- Testing: 5
In total:
HuggingChat failed completely in the technical area. It was also slow and lost context many times while generating output.
-
Task Arrival: 3
-
Development:
- Requirement collection: 2
- Planning and Design: 3
- Implementation: 5
- Testing: 3
In total:
WizardLM also failed completely in the technical area.
AI models that can be run with llama (CPU) and CUDA (GPU) require massive resources to run properly; we did not have enough.
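To put "massive resources" in perspective, here is a minimal back-of-the-envelope sketch of the memory needed for a model's weights alone (the fp16 format and the 13B parameter count are illustrative assumptions, not the exact models we tried):

```python
# Rough lower bound on the memory needed just to hold a model's weights.
# Real usage is higher: activations, KV cache, and framework overhead add to it.
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory in GB for the raw weights; fp16 stores 2 bytes per parameter."""
    return num_params * bytes_per_param / 1e9

# Example: a hypothetical 13B-parameter model in fp16
print(f"{weight_memory_gb(13e9):.0f} GB")  # ~26 GB of RAM/VRAM before overhead
```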
We launched it both directly on the machine and through Docker; in both cases it only produced unusable output and did not achieve anything.