AI Models Show Promise but Fall Short in Freelance Coding Tasks
A recent study of four large language models (LLMs) on freelance coding jobs found that while AI models can handle many real-world coding tasks, they still can’t match human effectiveness. Researchers from PeopleTec, an Alabama-based engineering consultancy, evaluated Claude 3.5 Haiku, GPT-4o-mini, Qwen 2.5, and Mistral 7B on a dataset of 1,115 programming and data analysis challenges derived from Freelancer.com jobs.
Key Findings
- The top-performing model, Claude 3.5 Haiku, successfully completed 78.7% of tasks, earning a theoretical $1.52 million out of a possible $1.6 million.
- GPT-4o-mini followed closely, solving 77.3% of tasks.
- Open-source models Qwen 2.5 and Mistral 7B performed less impressively, completing 68.5% and 42.5% of tasks, respectively.
- Human software engineers were estimated to solve over 95% of the challenges.
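To put the percentages above in concrete terms, the following rough sketch converts them into approximate task counts and a share of total dollar value. It assumes each completion rate is a plain fraction of the 1,115 tasks, which the article does not confirm, and it uses the rounded dollar figures quoted above; the function names are illustrative only.

```python
# Figures as reported in the article; anything derived from them is approximate.
TOTAL_TASKS = 1115
TOTAL_VALUE_USD = 1_600_000  # "a possible $1.6 million" (rounded in the article)

completion_rate = {
    "Claude 3.5 Haiku": 0.787,
    "GPT-4o-mini": 0.773,
    "Qwen 2.5": 0.685,
    "Mistral 7B": 0.425,
}


def approx_tasks_solved(rate: float, total: int = TOTAL_TASKS) -> int:
    """Approximate task count, ASSUMING the rate is a simple fraction of tasks."""
    return round(rate * total)


def share_of_value(earned_usd: float, total_usd: float = TOTAL_VALUE_USD) -> float:
    """Fraction of the total available payout that was (theoretically) earned."""
    return earned_usd / total_usd


if __name__ == "__main__":
    for model, rate in completion_rate.items():
        print(f"{model}: ~{approx_tasks_solved(rate)} of {TOTAL_TASKS} tasks")
    # $1.52 million is the rounded figure quoted for Claude 3.5 Haiku.
    print(f"Claude 3.5 Haiku share of value: {share_of_value(1_520_000):.0%}")
```

Under these assumptions, Claude 3.5 Haiku's 78.7% corresponds to roughly 878 tasks, while its quoted earnings amount to about 95% of the available value. The article does not explain the difference between the two figures, so they may be measured on different bases.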
Implications for Freelance Coders
While AI models are not yet ready to replace human freelance coders, they are already being used to assist with coding tasks. David Noever, chief scientist at PeopleTec, noted that people are using AI to generate job requirements, which are then answered and scored by other AI models. This trend suggests that the role of AI in freelance coding will continue to grow.
Challenges and Limitations
The study also highlighted the limitations of current AI models. On OpenAI’s SWE-Lancer benchmark, by comparison, results were significantly worse: Claude 3.5 Sonnet resolved only 26.2% of IC SWE (individual contributor software engineering) issues, meaning most of its solutions were incorrect. The researchers also observed that open-source models tend to break down at around 30 billion parameters, which is at the limit of what consumer-grade GPUs can handle.
Future Outlook
The gap between AI models and human freelance coders is narrowing. Noever predicts that fully automated pipelines could emerge within months. As AI technology continues to advance, it’s likely that we’ll see significant changes in how freelance coding tasks are performed and managed.