Aircall AI Voice Agents can learn directly from your public-facing content to deliver accurate, consistent, and on-brand answers during customer calls. This is made possible through Knowledge Sources, which allow your AI Agent to reference trusted information in real time. This article explains what a knowledge base is, how it benefits your AI Agent, and the current limitations to be aware of.

What is a knowledge base?

A knowledge base is a centralized library of information about your company, such as products, services, and frequently asked questions. It helps ensure information is easy to find, easy to reference, and easy for your AI Agents to understand.

Aircall uses the content you provide, for example public webpages, to build a knowledge base that your AI Agents can rely on during customer conversations.

How knowledge sources help your AI Agents

Once your content is added as a Knowledge Source, your AI Agent can:

  • Answer common questions using accurate, brand-approved information
  • Maintain consistent messaging across calls
  • Reduce repetitive manual responses
  • Reference your content instantly during customer interactions

This helps callers receive precise, helpful answers based directly on your own published information.

Supported content types

You can add new Knowledge Sources in the following ways:

  • Block of content: Paste any plain text you want the agent to learn from
  • Webpage: Add a single public URL
  • Website: Add a main public domain, with optional subpages
  • Existing sources: Reuse or update content you have already added
Note: All content added as a Knowledge Source must be publicly available.

Current limitations

To ensure the best results, be aware of the following limitations.

Gated or authentication-required pages

Knowledge Sources cannot ingest content from:

  • Login-required pages
  • Password-protected areas
  • Internal portals or dashboards
  • Pages behind paywalls

Only public URLs are supported.

Image-only content

If important information appears only in images, such as diagrams, screenshots, or text embedded in graphics, the AI Agent may not be able to read or use it.

Document uploading not yet supported

You currently cannot upload files such as:

  • PDFs
  • Word documents
  • Spreadsheets
Important: Support for document uploads is planned for a future version.

Managing FAQs and website crawling in your Knowledge Base

Your existing FAQ and newly added website content can work together seamlessly in your Knowledge Base. This article explains how FAQs are handled, how website crawling works, how content is processed, and what limits apply to your AI Voice Agent.

What happens to my existing FAQ?

You do not need to remove or modify your existing FAQ. Your current FAQ is automatically saved as [Agent Name]’s FAQ, and your AI Voice Agent continues to use it as a knowledge source. You can combine multiple types of knowledge sources, including:

  • FAQ or open-text entries
  • Individual URLs
  • Crawled websites

All knowledge sources are treated equally. There is currently no prioritization or weighting between different sources.

How website crawling works

When you add a website URL to your Knowledge Base, Aircall automatically processes:

  • The page you provide
  • Pages it links to
  • Pages those linked pages reference

This applies only to linked pages whose URLs share the same prefix as the URL you provided.

Crawl depth

We crawl:

  • The provided page
  • Up to two levels deeper
  • Only if URLs share the same prefix

Example

If you add: https://website.com/depth1/

We may also crawl:

  • https://website.com/depth1/depth2
  • https://website.com/depth1/depth2/depth3

We will not crawl unrelated sections such as:

  • https://website.com/blog
  • https://website.com/contact

This ensures that only relevant sections of your website are included.
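The crawl rules above can be sketched as a breadth-first walk over links. This is a minimal illustration under stated assumptions, not Aircall's implementation: it treats "levels" as link hops from the page you provide, treats "same prefix" as a plain string-prefix match on the full URL, and uses a hypothetical in-memory LINKS map in place of fetching real pages.

```python
from collections import deque

# Hypothetical link graph standing in for real fetched pages:
# each URL maps to the URLs that page links to.
LINKS = {
    "https://website.com/depth1/": [
        "https://website.com/depth1/depth2",
        "https://website.com/blog",
    ],
    "https://website.com/depth1/depth2": [
        "https://website.com/depth1/depth2/depth3",
        "https://website.com/contact",
    ],
    "https://website.com/depth1/depth2/depth3": [
        "https://website.com/depth1/depth2/depth3/depth4",
    ],
}


def crawl_scope(start_url: str, links: dict[str, list[str]], max_depth: int = 2) -> list[str]:
    """Return the start page plus pages up to max_depth link hops away
    whose URLs begin with start_url, in breadth-first order."""
    seen = {start_url}
    order = [start_url]
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for linked in links.get(url, []):
            # Same-prefix rule: ignore pages outside the starting section.
            if linked.startswith(start_url) and linked not in seen:
                seen.add(linked)
                order.append(linked)
                queue.append((linked, depth + 1))
    return order


for page in crawl_scope("https://website.com/depth1/", LINKS):
    print(page)
```

With the example URL, the sketch keeps the provided page plus the depth2 and depth3 pages, skips /blog and /contact because they fall outside the prefix, and stops before the hypothetical depth4 page because it is three link hops away.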

How content is extracted and cleaned

All website content goes through multi-stage processing to ensure high-quality knowledge.

What is removed

  • Navigation menus
  • Headers and footers
  • Cookie banners
  • “Back to top” buttons
  • Advertisements
  • Images and videos
  • Base64-encoded images
  • Scripts and malicious code
  • Formatting noise and redundant HTML

Purpose / Impact: Removes non-essential and potentially unsafe elements so that only relevant, clean content is processed.

What is kept

  • Headings
  • Paragraphs
  • Lists
  • Structured article content

Purpose / Impact: Preserves structured, meaningful content that contributes to accurate knowledge retrieval.

Why this matters

  • Improves response accuracy
  • Prevents irrelevant content from affecting answers
  • Reduces unnecessary processing
  • Enhances security
  • Helps the AI retrieve relevant information more effectively

Purpose / Impact: Ensures higher-quality responses, better performance, and improved reliability of the AI Voice Agent.
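The removal and retention rules above can be approximated with Python's standard-library html.parser module. This is a hedged sketch, not Aircall's processing pipeline: the SKIP_TAGS and KEEP_TAGS sets are assumptions that map the listed elements to common HTML tags, images (including Base64-encoded ones) are dropped simply because only text is collected, and cookie banners and advertisements, which usually require class- or script-based detection, are not handled.

```python
from html.parser import HTMLParser

# Assumed tag mappings for illustration only.
SKIP_TAGS = {"nav", "header", "footer", "script", "style", "video", "aside", "button"}
KEEP_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li"}


class CleanTextExtractor(HTMLParser):
    """Collect text from headings, paragraphs, and list items,
    ignoring anything nested inside a skipped element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a removed element
        self.current = None   # kept tag whose text is being collected
        self.buffer = []
        self.blocks = []      # (tag, text) pairs that survive cleaning

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in KEEP_TAGS and self.skip_depth == 0:
            self.current = tag
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == self.current:
            # Collapse runs of whitespace left over from the HTML layout.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.blocks.append((tag, text))
            self.current = None

    def handle_data(self, data):
        if self.current and self.skip_depth == 0:
            self.buffer.append(data)


page = """
<html><body>
  <header><nav><a href="/">Home</a></nav></header>
  <main>
    <h1>Returns policy</h1>
    <p>You can return items within <b>30 days</b>.</p>
    <img src="data:image/png;base64,iVBORw0KGgo=">
    <ul><li>Keep the receipt</li><li>Use the original box</li></ul>
    <script>trackVisitor();</script>
    <button>Back to top</button>
  </main>
  <footer><p>Copyright Example Ltd</p></footer>
</body></html>
"""

parser = CleanTextExtractor()
parser.feed(page)
for tag, text in parser.blocks:
    print(f"{tag}: {text}")
```

In this sample page, the navigation, footer text, script, image, and button are discarded, while the heading, paragraph, and two list items are kept.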

Processing time expectations

Processing time depends on the size of the crawl.

  • 1 to 10 pages typically process in under one minute
  • Medium-sized sections may take 5 to 10 minutes
  • Large root-level crawls may take up to 30 minutes

If you add a top-level URL such as https://website.com/, many linked pages may be processed. You can monitor progress using the document status indicator.

Character limits explained

Voice Agent context window

Your AI Voice Agent has a total working context window of 120,000 characters. This includes:

  • Crawled website content
  • FAQ and open-text entries
  • All knowledge sources combined

If the total content exceeds 120,000 characters, automatic summarization is applied before the content is used by the Voice Agent.

Important: The 120,000-character limit is a technical limitation required to ensure system performance and reliability.
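Because the 120,000-character limit applies to all knowledge sources combined, you can reason about it by adding up the length of each source. The sketch below assumes characters are counted as Python string length (Unicode code points), which may differ from how Aircall measures content; the source names and sizes are hypothetical.

```python
# Combined limit described in this article, in characters.
CONTEXT_LIMIT = 120_000


def total_characters(sources: dict[str, str]) -> int:
    """Sum the length of every knowledge source's text."""
    return sum(len(text) for text in sources.values())


def needs_summarization(sources: dict[str, str], limit: int = CONTEXT_LIMIT) -> bool:
    """True when the combined content is larger than the limit."""
    return total_characters(sources) > limit


# Hypothetical sources; the repeated letters stand in for real content.
sources = {
    "Agent FAQ": "a" * 20_000,
    "https://website.com/help/": "b" * 95_000,
    "Returns policy (block of content)": "c" * 10_000,
}

total = total_characters(sources)
print(f"Total: {total:,} characters")                              # Total: 125,000 characters
print(f"Summarization applied: {needs_summarization(sources)}")    # Summarization applied: True
print(f"Over the limit by: {total - CONTEXT_LIMIT:,} characters")  # Over the limit by: 5,000 characters
```

In this example, no single source is over the limit, but together they exceed it by 5,000 characters, so summarization would apply.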

Best practices for website ingestion

  • Start with specific URLs: Add precise, deep-linked pages instead of root domains. For example, instead of https://website.com/, use a targeted page such as https://website.com/help/article-name. The deeper the URL path, the more targeted the crawl.
  • Expand gradually: If you need broader coverage, move up one directory level at a time. For example, move from https://website.com/help/article-name to https://website.com/help/. Avoid adding the root URL unless you truly need content from across the entire site.
  • Avoid over-crawling: Do not start with root-level URLs unless necessary. Root-level URLs can capture hundreds of pages, increase processing time, trigger summarization, and introduce irrelevant content.
  • Use structured knowledge pages: Prioritize well-organized, content-focused pages. The best-performing sources include help centers, documentation hubs, FAQ sections, and structured articles with clear headings.
  • Avoid unsuitable content types: Exclude pages that are dynamic, restricted, or unstructured, such as login-required pages, search result pages, dynamic or form-based content, news feeds, and media-heavy pages.
  • Review after crawling: Once processing is complete, check the document preview to confirm that the correct pages were captured, no duplicate URLs were added, and the content is structured properly. You can refresh website content later if the source page is updated.
Note: When not to crawl. Consider using manual FAQ or open-text entries instead if:

  • the content changes frequently, such as news or real-time data;
  • the pages require authentication;
  • the website is primarily video- or image-based; or
  • the content is unstructured.
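The "Expand gradually" recommendation amounts to removing the last segment of the URL path each time you need broader coverage. The one_level_up helper below is a hypothetical illustration, not an Aircall feature; it uses Python's urllib.parse and also drops any query string or fragment.

```python
from urllib.parse import urlsplit, urlunsplit


def one_level_up(url: str) -> str:
    """Return the directory one level above url.

    https://website.com/help/article-name -> https://website.com/help/
    https://website.com/help/             -> https://website.com/
    https://website.com/                  -> https://website.com/ (already root)
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parent_path = "/" + "/".join(segments[:-1])
    if not parent_path.endswith("/"):
        parent_path += "/"
    return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))


# Walk from a specific page up to the root, one level at a time.
url = "https://website.com/help/article-name"
while True:
    print(url)
    parent = one_level_up(url)
    if parent == url:
        break
    url = parent
```

Each printed URL is a candidate for the next, broader crawl; stop at the first level that covers the content you need rather than jumping straight to the root.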