AI Fails at 96% of Jobs (New Study)
Topic
AI Performance in Job Market
Key insights
- AI performs worse than humans 96.25% of the time according to a new study
- The study compared AI performance on real jobs completed by humans
- The method used for comparison was called the Remote Labor Index (RLI)
- AI was tested on 240 jobs, each paying $630 on average
- The best AI, Claude Opus 4.5, had a 3.75% success rate
- Gemini had a 1.25% success rate, the lowest among the tested AIs
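The percentages above can be turned back into job counts with simple arithmetic. The sketch below is illustrative only: it uses the figures reported in this summary and assumes each success rate corresponds to a whole number of the 240 jobs, which the summary does not state explicitly. The names `JOBS_TESTED`, `SUCCESS_RATES`, and `jobs_passed` are made up for this example.

```python
# Figures as reported in the summary; nothing here comes from the study's raw data.
JOBS_TESTED = 240
SUCCESS_RATES = {"Claude Opus 4.5": 0.0375, "Gemini": 0.0125}


def jobs_passed(rate: float, total: int = JOBS_TESTED) -> int:
    """Job count implied by a success rate, rounded to the nearest whole job (assumption)."""
    return round(rate * total)


for model, rate in SUCCESS_RATES.items():
    passed = jobs_passed(rate)
    # The complement of the success rate is the share of jobs not passed.
    print(f"{model}: {passed} of {JOBS_TESTED} jobs passed, {1 - rate:.2%} not passed")
```

Under these assumptions, the headline 96.25% is the complement of the best model's 3.75% success rate (about 9 of 240 jobs), and the 1.25% rate corresponds to about 3 of 240 jobs.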
Perspectives
Analysis of AI's performance in job tasks reveals significant shortcomings.
AI Underperforms Compared to Humans
- Claims AI performs worse than humans 96.25% of the time
- Highlights AI's failure to complete tasks at or above human level
- Warns that AI produces corrupt or unusable files
- Notes AI frequently submits incomplete work
- Points out quality issues in AI-generated outputs
- Identifies inconsistencies in AI-generated work
AI's Limited Successes
- Argues AI excels in specific tasks like report writing and simple code generation
- Proposes AI shows proficiency in creative tasks such as audio and image work
- Highlights potential for competent video generation in the near future
Neutral / Shared
- Questions the current benchmarks for AI performance
- Notes the need for human oversight in AI applications
Metrics
success_rate
3.75%
success rate of the best AI model tested
This low success rate indicates that AI is far from being a reliable substitute for human labor.
the best AI was Claude Opus 4.5 with a 3.75% success rate
failure_rate
96.25%
share of jobs on which even the best AI model fell short of human work (100% minus its 3.75% success rate)
A high failure rate suggests that AI is not yet ready for widespread application in professional settings.
a whopping 96.25% of the time
jobs_tested
240 jobs
number of jobs AI was tested on
Testing on a substantial number of jobs provides a more reliable assessment of AI capabilities.
AI models were tested on 240 jobs
average_payment_per_job
$630 USD
average payment for each job tested
The financial stakes involved highlight the importance of quality in AI outputs.
each paying $630 on average
lowest_success_rate
1.25%
success rate of the lowest performing AI model
This indicates that some AI models are significantly less effective than others.
Gemini was the loser with a 1.25% success rate
financial_returns
majority
CEOs' perception of financial returns from AI
Indicates widespread skepticism about AI's economic benefits.
A PwC report found that the majority of CEOs see no financial returns from AI.
ai_malfunctions
100
Reports of AI malfunctions received by the FDA
Raises concerns about the safety and reliability of AI in critical applications.
the FDA has received 100 reports of AI malfunctions, botched surgeries and misidentified body parts.
marketing_spend
$400,000 to $500,000 USD
amount paid to individual content creators for promoting AI models
High marketing costs may indicate a lack of confidence in the product's inherent value.
$400,000 to half a million dollars each to promote their AI models.
Timeline highlights
00:00–05:00
A study reveals that AI performs worse than humans 96.25% of the time across various job tasks. The research utilized the Remote Labor Index to compare AI outputs against human work in real freelance jobs.
- AI performs worse than humans 96.25% of the time according to a new study
- The study compared AI performance on real jobs completed by humans
- The method used for comparison was called the Remote Labor Index (RLI)
- AI was tested on 240 jobs, each paying $630 on average
- The best AI, Claude Opus 4.5, had a 3.75% success rate
- Gemini had a 1.25% success rate, the lowest among the tested AIs
05:00–10:00
AI demonstrates proficiency in specific tasks such as report writing and simple code generation, but struggles significantly in general work performance. Human oversight remains essential for AI applications in jobs requiring language, audio, and data retrieval skills.
- AI is good at report writing and generating simple code for interactive data visualization
- AI systems perform poorly on RLI benchmarks despite saturating existing benchmarks
- Human oversight is still needed for AI to impact jobs with language requirements, audio, simple advertising, or data retrieval
- A PwC report found that most CEOs see no financial returns from AI
- Gartner predicts that half of the companies that fired workers for AI will rehire them
- Microsoft reported that 30% of its code was written by AI, leading to significant software issues
10:00–15:00
Companies like Anthropic, Google, and Microsoft are investing heavily in promoting their AI models, paying individual content creators between $400,000 and $500,000. There is a significant disconnect between the perceived intelligence of AI and its actual capabilities, with concerns about the limitations of current AI architectures.
- Companies like Anthropic, Google and Microsoft have paid individual content creators $400,000 to half a million dollars each to promote their AI models
- If the current generation of AI were as revolutionary as advertised, companies wouldn't need to spend so much money to convince us
- There's a disconnect between the perception of AI intelligence and its actual capabilities
- Manipulation of language by machines leads to the illusion of intelligence
- Generations of AI scientists since the 1950s have claimed that new technology would lead to human-level intelligence
- Yann LeCun believes the current AI architecture is reaching its peak