StartUp / Ai Startups

Track AI startups, new venture creation, founder strategy, product direction and investment signals across the fast-moving artificial intelligence sector.


This Startup Built the Infrastructure Powering Voice AI

2026-03-05T15:00:32Z


Summary

AssemblyAI, founded by Dylan Fox in 2017, is a voice AI infrastructure platform. It serves about 10,000 customers and processes nearly 2 million voice hours per day, giving developers the building blocks for voice AI applications such as AI note takers and real-time voice agents. Customers include Granola and Zoom.

Fox taught himself to program and developed a strong interest in machine learning. Using an Amazon Echo showed him how good voice recognition could be, at a time when developers had no accessible, high-quality speech recognition tools. That gap led him to build a platform that lets developers create their own voice applications.

The early years were difficult: investors were skeptical and there was little market interest in AI. The COVID-19 pandemic then sharply increased the volume of voice data from remote work and podcasting, which helped drive better transcription models; real-time voice AI crossed the threshold of usable accuracy and latency only in the last 18 months. Starting with its first series round in 2022, five years after founding, the company raised about $160 million over roughly three years, which let it scale quickly.

Fox credits deep subject matter expertise and close customer relationships for the company's progress. AssemblyAI emphasizes continuous feedback and hands-on product testing so it can adapt quickly to market needs, an approach it credits with letting it outpace larger competitors.

Perspectives

AssemblyAI's journey highlights the evolution of voice AI technology and the importance of customer-centric development.

AssemblyAI's Growth and Innovation

Highlights significant traction: about 10,000 customers and nearly 2 million voice hours processed daily
Emphasizes the importance of deep subject matter expertise in product development
Describes the impact of COVID-19 on the acceleration of voice AI applications
Introduces Universal 3 Pro, a model that pairs reliable transcription with the ability to follow audio-processing instructions
Focuses on a lean operational structure to foster rapid innovation

Challenges and Market Dynamics

Acknowledges initial skepticism from investors regarding AI's viability
Notes the competitive landscape with major players like Google
Raises concerns about the sustainability of rapid growth in voice AI
Questions the effectiveness of relying solely on AI note takers for customer feedback
Considers the potential for market saturation and the need for continuous adaptation

Neutral / Shared

Recognizes the evolving landscape of voice AI technology
Mentions the importance of customer feedback in shaping product development

Metrics

customers

about 10,000 customers

total number of customers using the platform

A large customer base indicates strong market demand.

we have about 10,000 customers

voice_hours

nearly 2 million voice hours per day

daily voice hours processed

High processing volume reflects the platform's capability and usage.

we're now doing almost 2 million hours per day

voice_hours

about 250 million voice hours

annual voice hours processed last year

Set against the current rate of nearly 2 million hours per day, this points to rapid growth in usage.

around 250 million voice hours
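The two volume figures above can be compared with a quick run-rate calculation. This is a rough sketch under a stated assumption: the current rate of "almost 2 million hours per day" is treated as exactly 2 million and held flat for 365 days, so the result slightly overstates the annualized volume. The variable names are illustrative only.

```python
# Rough run-rate comparison of the two reported volume figures.
# Assumption: "almost 2 million hours per day" is rounded up to exactly
# 2 million and held constant for a 365-day year.
DAILY_HOURS_NOW = 2_000_000           # "almost 2 million hours per day"
ANNUAL_HOURS_LAST_YEAR = 250_000_000  # "around 250 million voice hours" last year
DAYS_PER_YEAR = 365

# Annualize the current daily rate.
annualized_now = DAILY_HOURS_NOW * DAYS_PER_YEAR
# Average daily volume implied by last year's annual total.
avg_daily_last_year = ANNUAL_HOURS_LAST_YEAR / DAYS_PER_YEAR
# How many times larger the current run rate is than last year's total.
ratio = annualized_now / ANNUAL_HOURS_LAST_YEAR

print(f"annualized current rate: {annualized_now:,} hours")
print(f"last year's average daily volume: {avg_daily_last_year:,.0f} hours")
print(f"current run rate vs last year: {ratio:.2f}x")
```

Under that assumption the current run rate is about 730 million hours a year, roughly 2.9 times last year's reported 250 million, which itself averages out to about 685,000 hours per day.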

first_funding_round

2022

the year AssemblyAI raised its first series funding round

The round came five years after the company's founding in 2017.

You raised your series in 2022 five years after starting the company.

funding

about $160 million (USD)

total funding raised over roughly three years

This funding indicates strong investor confidence in the company's growth potential.

we raised about a hundred and sixty million dollars in the course of like I don't know three years

growth

200% year over year

growth in non-real-time API usage

Such growth reflects increasing demand for voice AI solutions.

usage to our non real time APIs is still growing you know 200% year over year

real_time_voice_agents

qualitative (no figure given)

effectiveness of real-time voice agents

This indicates a significant improvement in technology, enhancing customer experience.

real-time voice agents are are like they work really well now

customer_experience

qualitative (no figure given)

impact on customer service

Improved customer experience can lead to higher satisfaction and retention.

they can deliver a good customer experience over customer support

Key entities

Companies

Amazon • AssemblyAI • Cisco • Google • Granola • Zero • Zoom • Universal 3 Pro (AssemblyAI model)


Themes

#ai_startups • #startup_ecosystem • #ambient_scribes • #assemblyai • #customer_feedback • #customer_support • #deep_subject_matter_expertise

Timeline highlights

00:00–05:00

AssemblyAI is a voice AI infrastructure platform that serves around 10,000 customers and processes nearly 2 million voice hours per day. The company, founded by Dylan Fox, counts Granola and Zoom among its customers and was one of the first AI companies funded by Y Combinator, in 2017.

AssemblyAI supports use cases including AI note takers and real-time voice agents; customers such as Granola and Zoom build on its technology
Dylan Fox's journey into AI began with a non-AI startup in college, where he taught himself programming from books on PHP, Python, and Django. This led to several SaaS products, including a fundraising tool for college organizations and a feedback capture system for small businesses
AssemblyAI was one of the first AI companies funded by Y Combinator, in 2017, when AI was still an emerging field. This early support helped position the company in a rapidly evolving industry

05:00–10:00

Dylan Fox's journey into programming began in college, leading him to explore machine learning and voice recognition technology. His experience with the Amazon Echo in 2015 highlighted the potential for improved voice recognition tools in a market that lacked quality options.

Dylan Fox began programming after starting a non-AI company in college, teaching himself languages like PHP and Python. Building software products led him to explore machine learning and security, and in 2015 he joined a machine learning team at Cisco
Dylan's experience with the Amazon Echo in 2015 changed his perception of voice recognition technology. The Echo's effectiveness in recognizing voice commands, even in noisy environments, inspired him to explore the field further
At the time, the market lacked accessible voice recognition tools, with existing options being poor quality or prohibitively expensive. Dylan faced challenges in obtaining developer SDKs from companies like Nuance, which required significant financial investment

10:00–15:00

Dylan Fox stresses that founders need to be passionate about the problem they are solving in order to drive innovation in product development. He reflects on the challenges he faced as the solo founder of AssemblyAI at a time when interest in AI was minimal.

Dylan Fox emphasizes the importance of being obsessed with the problem being solved, as this drives the motivation to create innovative products. He believes that a lack of passion for the subject matter can lead founders to lose interest in their ventures
Reflecting on the early years of AssemblyAI, Fox notes he started the company in 2017 when interest in AI was low. As a solo founder, he faced significant challenges compared to other startups with more developed products
Fox found that building voice AI applications required a comprehensive ecosystem, including high-quality voice AI models and mobile 5G. These components were essential for developers to create effective voice AI applications
In 2021, the company's transcription models reached decent performance, but the real value lay in higher-level applications such as sentiment analysis. Previously, developers had to train their own models for these tasks, which was a significant barrier

15:00–20:00

The voice AI market was initially small and faced technological limitations, but significant advancements occurred during the COVID-19 pandemic. This led to improved models and a surge in real-time voice AI applications over the last 18 months.

In 2017, the voice AI market was small and the available technology was inadequate, which raised doubts about whether the field was worth pursuing. Despite this, Fox found the problem engaging and had early supporters, such as Daniel Gross, who recognized the potential of voice recognition technology
The COVID-19 pandemic significantly increased the amount of voice data generated, contributing to advancements in voice AI technology. As remote work and podcasting became more prevalent, models improved, leading to better transcription and the emergence of new NLP models for summarization and sentiment analysis
The initial real-time API faced challenges due to model limitations. It has only been in the last 18 months that real-time models have crossed a threshold in accuracy and latency, enabling the growth of real-time voice AI applications

20:00–25:00

Recent advancements in voice AI technology have led to significant product-market fit for both real-time and non-real-time applications. The integration of voice AI models into robotics and consumer hardware is becoming increasingly prevalent.

New models can understand general audio, identify speaker gender, and capture background sounds, opening up new use cases for both real-time and non-real-time applications
As the company grew, the founder faced challenges managing a larger team and capital influx. This shift from a low-attention phase to rapid growth brought about growing pains
Raising substantial capital quickly put pressure on the company and made it rely heavily on the founder's instincts. Understanding the market and refining those instincts is crucial as the company scales
Real-time voice agents have become effective and are being rapidly deployed in customer support roles. Their success rate is now high enough to deliver a good customer experience, making them common in service interactions
There is increasing demand for integrating voice AI models into robotics and consumer hardware. Popular robotics companies are beginning to implement these models, indicating a trend towards more voice-interactive devices

25:00–30:00

Voice AI technology is increasingly being integrated into consumer hardware, enhancing user experience and accessibility. In healthcare, ambient scribes are improving efficiency by accurately capturing doctor-patient conversations in challenging audio environments.

Voice AI technology is being integrated into consumer hardware, allowing users to interact with devices like coffee machines through voice commands instead of touchscreens. This shift enhances user experience and accessibility in everyday appliances
In the healthcare sector, companies are developing ambient scribes that capture doctor-patient conversations in challenging audio environments. These models achieve high accuracy rates, streamlining administrative tasks for healthcare professionals
Sales teams use ambient scribe technology to improve performance. For instance, Zero's app provides real-time advice during in-person sales interactions, significantly increasing sales representatives' earnings
The focus is on building smarter voice AI models that understand context and differentiate between multiple speakers in noisy environments. This capability addresses challenges in voice recognition, such as identifying the primary speaker versus background noise
AssemblyAI's new model, Universal 3 Pro, improves the ability to follow instructions related to audio processing. It combines reliable transcription with the flexibility to respond to user commands, positioning it between traditional speech-to-text models and multimodal language models
