New Technology / Ai Development

Track AI development, model progress, product releases, infrastructure shifts and strategic technology signals across the artificial intelligence sector.

AI Fails at 96% of Jobs (New Study)

2026-02-13T14:56:19Z

Open source

Topic

AI Performance in Job Market

Key insights

AI performs worse than humans 96.25% of the time according to a new study
The study compared AI performance on real jobs completed by humans
The method used for comparison was called the Remote Labor Index (RLI)
AI was tested on 240 jobs, each paying $630 on average
The best AI, Claude Opus 4.5, had a 3.75% success rate
Gemini had a 1.25% success rate, the lowest among the tested AIs
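
The percentages above can be turned back into job counts with a little arithmetic. The following is a minimal sketch, assuming each success rate was computed over all 240 jobs (the summary does not state the denominator explicitly). The names `JOBS_TESTED`, `success_rates`, `jobs_passed` and `summarize` are illustrative and do not come from the study.

```python
# Figures as quoted in the summary. Using 240 as the denominator for every
# model is an assumption, not something the summary confirms.
JOBS_TESTED = 240

success_rates = {
    "Claude Opus 4.5": 0.0375,  # best model quoted
    "Gemini": 0.0125,           # lowest model quoted
}


def jobs_passed(rate: float, total: int = JOBS_TESTED) -> int:
    """Convert a success rate into a whole number of jobs."""
    return round(rate * total)


def summarize(rates: dict[str, float]) -> dict[str, dict[str, int]]:
    """Return passed/failed job counts per model."""
    summary = {}
    for model, rate in rates.items():
        passed = jobs_passed(rate)
        summary[model] = {"passed": passed, "failed": JOBS_TESTED - passed}
    return summary


# The headline failure figure is the complement of the best model's rate.
best_rate = max(success_rates.values())
headline_failure_pct = round((1 - best_rate) * 100, 2)

if __name__ == "__main__":
    for model, counts in summarize(success_rates).items():
        print(f"{model}: {counts['passed']} passed, {counts['failed']} failed")
    print(f"Headline failure rate: {headline_failure_pct}%")
```

Under that assumption, 3.75% corresponds to 9 of 240 jobs and 1.25% to 3 of 240, and the 96.25% headline is simply 100% minus the best model's success rate rather than an average across all models.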

Perspectives

Analysis of AI's performance in job tasks reveals significant shortcomings.

AI Underperforms Compared to Humans

Claims AI performs worse than humans 96.25% of the time
Highlights AI's failure to complete tasks at or above human level
Warns that AI produces corrupt or unusable files
Notes AI frequently submits incomplete work
Points out quality issues in AI-generated outputs
Identifies inconsistencies in AI-generated work

AI's Limited Successes

Argues AI excels in specific tasks like report writing and simple code generation
Proposes AI shows proficiency in creative tasks such as audio and image work
Highlights potential for competent video generation in the near future

Neutral / Shared

Questions the current benchmarks for AI performance
Notes the need for human oversight in AI applications

Metrics

success_rate

3.75%

success rate of the best AI model tested

This low success rate indicates that AI is far from being a reliable substitute for human labor.

the best AI was Claude Opus 4.5 with a 3.75% success rate

failure_rate

96.25%

failure rate of the best AI model tested (100% minus its 3.75% success rate)

A high failure rate suggests that AI is not yet ready for widespread application in professional settings.

a whopping 96.25% of the time

jobs_tested

240 jobs

number of jobs AI was tested on

Testing on a substantial number of jobs provides a more reliable assessment of AI capabilities.

AI models were tested on 240 jobs

average_payment_per_job

$630 USD

average payment for each job tested

The financial stakes involved highlight the importance of quality in AI outputs.

each paying $630 on average

lowest_success_rate

1.25%

success rate of the lowest performing AI model

This indicates that some AI models are significantly less effective than others.

Gemini was the loser with a 1.25% success rate

financial_returns

majority

CEOs' perception of financial returns from AI

Indicates widespread skepticism about AI's economic benefits.

A PWC report found that the majority of CEOs see no financial returns from AI.

ai_malfunctions

100

Reports of AI malfunctions received by the FDA

Raises concerns about the safety and reliability of AI in critical applications.

the FDA has received 100 reports of AI malfunctions, botched surgeries and misidentified body parts.

marketing_spend

$400,000 to $500,000 USD

amount paid to individual content creators for promoting AI models

High marketing costs may indicate a lack of confidence in the product's inherent value.

$400,000 to half a million dollars each to promote their AI models.

Key entities

Companies

Anthropic • Google • Microsoft • PWC

Themes

#ai_development • #ai_marketing_spend • #ai_performance • #human_oversight • #job_market • #perceived_vs_actual • #remote_labor_index

Timeline highlights

00:00–05:00

A study reveals that AI performs worse than humans 96.25% of the time across various job tasks. The research utilized the Remote Labor Index to compare AI outputs against human work in real freelance jobs.

AI performs worse than humans 96.25% of the time according to a new study
The study compared AI performance on real jobs completed by humans
The method used for comparison was called the Remote Labor Index (RLI)
AI was tested on 240 jobs, each paying $630 on average
The best AI, Claude Opus 4.5, had a 3.75% success rate
Gemini had a 1.25% success rate, the lowest among the tested AIs

05:00–10:00

AI demonstrates proficiency in specific tasks such as report writing and simple code generation, but struggles significantly in general work performance. Human oversight remains essential for AI applications in jobs requiring language, audio, and data retrieval skills.

AI is good at report writing and generating simple code for interactive data visualization
AI systems perform poorly on RLI benchmarks despite saturating existing benchmarks
Human oversight is still needed for AI to impact jobs with language requirements, audio, simple advertising, or data retrieval
A PWC report found that most CEOs see no financial returns from AI
Gartner predicts that half of the companies that fired workers for AI will rehire them
Microsoft reported that 30% of their code was written by AI, leading to significant software issues

10:00–15:00

Companies like Anthropic, Google, and Microsoft are investing heavily in promoting their AI models, paying individual content creators between $400,000 and $500,000. There is a significant disconnect between the perceived intelligence of AI and its actual capabilities, with concerns about the limitations of current AI architectures.

Companies like Anthropic, Google and Microsoft have paid individual content creators $400,000 to half a million dollars each to promote their AI models
If the current generation of AI were as revolutionary as advertised, they wouldn't need to spend so much money to convince us
There's a disconnect between the perception of AI intelligence and its actual capabilities
Manipulation of language by machines leads to the illusion of intelligence
There have been generations of AI scientists since the 1950s claiming that new technology would lead to human-level intelligence
Yann LeCun believes the current AI architecture is reaching its peak
