This Startup Built the Infrastructure Powering Voice AI
2026-03-05T15:00:32Z
Summary
Assembly AI, founded by Dylan Fox, is a voice AI infrastructure platform that has gained significant traction since its inception in 2017. The company serves around 10,000 customers and processes nearly 2 million voice hours per day, providing essential tools for developers to create voice AI applications. Notable clients include Granola and Zoom, showcasing the platform's versatility in various sectors.
Dylan's journey into AI began with self-taught programming skills and a keen interest in machine learning. His experience with the Amazon Echo highlighted the potential for improved voice recognition technology, which was lacking in the market at the time. This realization drove him to create a platform that would empower developers to build innovative voice applications.
The early years of Assembly AI were challenging, marked by skepticism from investors and a lack of market interest in AI. However, the COVID-19 pandemic catalyzed a surge in voice data usage, leading to advancements in real-time voice AI applications. The company raised significant funding in 2022, allowing it to scale rapidly and enhance its offerings.
Assembly AI's focus on deep subject matter expertise and close customer relationships has been pivotal in its success. The company fosters a culture of continuous feedback and hands-on product testing, enabling it to adapt quickly to market needs. This approach has allowed Assembly AI to outpace larger competitors by delivering tailored solutions.
Perspectives
Assembly AI's journey highlights the evolution of voice AI technology and the importance of customer-centric development.
Assembly AI's Growth and Innovation
  • Highlights significant traction with 10,000 customers and 2 million voice hours processed daily
  • Emphasizes the importance of deep subject matter expertise in product development
  • Describes the impact of COVID-19 on the acceleration of voice AI applications
  • Introduces Universal 3 Pro, enhancing real-time voice capabilities
  • Focuses on a lean operational structure to foster rapid innovation
Challenges and Market Dynamics
  • Acknowledges initial skepticism from investors regarding AI's viability
  • Notes the competitive landscape with major players like Google
  • Raises concerns about the sustainability of rapid growth in voice AI
  • Questions the effectiveness of relying solely on AI note takers for customer feedback
  • Considers the potential for market saturation and the need for continuous adaptation
Neutral / Shared
  • Recognizes the evolving landscape of voice AI technology
  • Mentions the importance of customer feedback in shaping product development
Metrics
customers
about 10,000 customers
total number of customers using the platform
A large customer base indicates strong market demand.
we have about 10,000 customers
voice_hours
almost 2 million hours per day
daily voice hours processed
High processing volume reflects the platform's capability and usage.
we're now doing almost 2 million hours per day
voice_hours
around 250 million hours
annual voice hours processed last year
This figure shows significant growth potential in the voice AI market.
around 250 million voice hours
first_funding_round
2022
the year Assembly AI raised its first series funding
Indicates the time taken to secure funding after founding.
You raised your series in 2022 five years after starting the company.
funding
about 160 million USD
total funding raised
This funding indicates strong investor confidence in the company's growth potential.
we raised about a hundred and sixty million dollars in the course of like I don't know three years
growth
200% year over year
growth in non real-time API usage
Such growth reflects increasing demand for voice AI solutions.
usage to our non real time APIs is still growing you know 200% year over year
growth
qualitative (no figure given): real-time voice agents now work well
effectiveness of real-time voice agents
This indicates a significant improvement in technology, enhancing customer experience.
real-time voice agents are are like they work really well now
customer_experience
qualitative (no figure given): voice agents can deliver a good customer experience in customer support
impact on customer service
Improved customer experience can lead to higher satisfaction and retention.
they can deliver a good customer experience over customer support
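As a rough sanity check on how the quoted figures relate, the arithmetic can be sketched as follows. This is a minimal sketch under stated assumptions, not a calculation from the interview: it assumes the daily rate stays flat for a 365-day year, and it reads "200% year over year" as growth on top of last year's level (a tripling). The function names are illustrative only.

```python
# Figures quoted in the Metrics section (approximate, as spoken).
DAILY_VOICE_HOURS = 2_000_000        # "almost 2 million hours per day"
LAST_YEAR_VOICE_HOURS = 250_000_000  # "around 250 million voice hours"
YOY_GROWTH_PERCENT = 200             # growth in non-real-time API usage


def annualized(daily_hours: int, days: int = 365) -> int:
    """Project a daily rate over a year, assuming the rate stays flat."""
    return daily_hours * days


def growth_multiplier(percent: float) -> float:
    """Convert 'grew N% year over year' into a multiplier on last year's level."""
    return 1 + percent / 100


run_rate = annualized(DAILY_VOICE_HOURS)
ratio = run_rate / LAST_YEAR_VOICE_HOURS

print(run_rate)                               # 730000000
print(round(ratio, 2))                        # 2.92
print(growth_multiplier(YOY_GROWTH_PERCENT))  # 3.0
```

Under these assumptions, the current daily rate annualizes to roughly three times last year's total. The 200% figure is quoted for non-real-time API usage only, so the two numbers measure different things and should not be read as confirming one another.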
Key entities
Companies
Amazon • Assembly AI (AssemblyAI) • Cisco • Google • Granola • Universal 3 Pro (an AssemblyAI model, not a company) • Zero • Zoom
Themes
#ai_startups • #startup_ecosystem • #ambient_scribes • #assembly_ai • #assemblyai • #customer_feedback • #customer_support • #deep_subject_matter_expertise
Timeline highlights
00:00–05:00
Assembly AI is a voice AI infrastructure platform that serves around 10,000 customers and processes nearly 2 million voice hours per day. The company, founded by Dylan Fox, has notable clients including Granola and Zoom, and was one of the first AI companies funded by Y Combinator in 2017.
  • Assembly AI, founded by Dylan Fox, is a voice AI infrastructure platform that serves around 10,000 customers and processes nearly 2 million voice hours per day. The platform supports various use cases, including AI note takers and real-time voice agents, with notable customers like Granola and Zoom utilizing its technology
  • Dylan Fox's journey into AI began with a non-AI startup in college, where he taught himself programming through books on PHP, Python, and Django. This led to the development of several SaaS products, including a fundraising tool for college organizations and a feedback capture system for small businesses
  • Assembly AI was one of the first AI companies funded by Y Combinator in 2017, during a time when AI was still emerging. This early support helped position the company in a rapidly evolving industry
05:00–10:00
Dylan Fox's journey into programming began in college, leading him to explore machine learning and voice recognition technology. His experience with the Amazon Echo in 2015 highlighted the potential for improved voice recognition tools in a market that lacked quality options.
  • Dylan Fox began programming after starting a non-AI company in college, teaching himself languages like PHP and Python. His creative process in building software products led him to explore machine learning and security, ultimately joining a machine learning team at Cisco in 2015
  • Dylan's experience with the Amazon Echo in 2015 changed his perception of voice recognition technology. The Echo's effectiveness in recognizing voice commands, even in noisy environments, inspired him to explore the technology further
  • At the time, the market lacked accessible voice recognition tools, with existing options being poor quality or prohibitively expensive. Dylan faced challenges in obtaining developer SDKs from companies like Nuance, which required significant financial investment
10:00–15:00
Dylan Fox highlights the necessity of being passionate about the problem being solved to drive innovation in product development. He reflects on the challenges faced as a solo founder of Assembly AI, particularly during a time when interest in AI was minimal.
  • Dylan Fox emphasizes the importance of being obsessed with the problem being solved, as this drives the motivation to create innovative products. He believes that a lack of passion for the subject matter can lead founders to lose interest in their ventures
  • Reflecting on the early years of AssemblyAI, Fox notes he started the company in 2017 when interest in AI was low. As a solo founder, he faced significant challenges compared to other startups with more developed products
  • Fox found that building voice AI applications required a comprehensive ecosystem, including high-quality voice AI models and mobile 5G. These components were essential for developers to create effective voice AI applications
  • In 2021, they achieved decent performance with transcription models, but the real value was in advanced applications like sentiment analysis. Previously, developers faced barriers in training their own models for these tasks
15:00–20:00
The voice AI market was initially small and faced technological limitations, but significant advancements occurred during the COVID-19 pandemic. This led to improved models and a surge in real-time voice AI applications over the last 18 months.
  • In 2017, the voice AI market was small and the available technology was inadequate, which raised doubts about the viability of pursuing this field. Despite this, Fox found the problem engaging and had early supporters like Daniel Gross who recognized the potential of voice recognition technology
  • The COVID-19 pandemic significantly increased the amount of voice data generated, contributing to advancements in voice AI technology. As remote work and podcasting became more prevalent, models improved, leading to better transcription and the emergence of new NLP models for summarization and sentiment analysis
  • The initial real-time API faced challenges due to model limitations. It has only been in the last 18 months that real-time models have crossed a threshold in accuracy and latency, enabling the growth of real-time voice AI applications
20:00–25:00
Recent advancements in voice AI technology have led to significant product-market fit for both real-time and non-real-time applications. The integration of voice AI models into robotics and consumer hardware is becoming increasingly prevalent.
  • Recent advancements in voice AI technology have led to significant product-market fit for both real-time and non-real-time applications. New models can understand general audio, identify speaker genders, and capture background sounds, which opens up innovative use cases
  • As the company grew, the founder faced challenges managing a larger team and capital influx. This shift from a low-attention phase to rapid growth brought about growing pains
  • Raising substantial capital quickly has created pressure on the company, necessitating a strong reliance on the founder's instincts. Understanding the market and refining these instincts is crucial as the company scales
  • Real-time voice agents have become effective and are rapidly deployed in customer support roles. Their success rate now ensures a good customer experience, making them common in service interactions
  • There is increasing demand for integrating voice AI models into robotics and consumer hardware. Popular robotics companies are beginning to implement these models, indicating a trend towards more voice-interactive devices
25:00–30:00
Voice AI technology is increasingly being integrated into consumer hardware, enhancing user experience and accessibility. In healthcare, ambient scribes are improving efficiency by accurately capturing doctor-patient conversations in challenging audio environments.
  • Voice AI technology is being integrated into consumer hardware, allowing users to interact with devices like coffee machines through voice commands instead of touchscreens. This shift enhances user experience and accessibility in everyday appliances
  • In the healthcare sector, companies are developing ambient scribes that capture doctor-patient conversations in challenging audio environments. These models achieve high accuracy rates, streamlining administrative tasks for healthcare professionals
  • Sales teams leverage ambient scribe technology to improve performance. For instance, Zero's app provides real-time advice during in-person sales interactions, significantly increasing sales representatives' earnings
  • The focus is on building smarter voice AI models that understand context and differentiate between multiple speakers in noisy environments. This capability addresses challenges in voice recognition, such as identifying the primary speaker versus background noise
  • AssemblyAI's new model, Universal 3 Pro, enhances the ability to follow instructions related to audio processing. It combines reliable transcription with the flexibility to respond to user commands, positioning it between traditional speech-to-text models and multimodal language models