Aircall AI Voice Agents can learn directly from your public-facing content to deliver accurate, consistent, and on-brand answers during customer calls. This is made possible through Knowledge Sources, which allow your AI Agent to reference trusted information in real time.
This article explains what a knowledge base is, how it benefits your AI Agent, and the current limitations to be aware of.
What is a knowledge base?
A knowledge base is a centralized library of information about your company, such as products, services, and frequently asked questions. It helps ensure information is easy to find, easy to reference, and easy for your AI Agents to understand.
Aircall uses the content you provide, for example public webpages, to build a knowledge base that your AI Agents can rely on during customer conversations.
How knowledge sources help your AI Agents
Once your content is added as a Knowledge Source, your AI Agent can:
- Answer common questions using accurate, brand-approved information
- Maintain consistent messaging across calls
- Reduce repetitive manual responses
- Reference your content instantly during customer interactions
This ensures callers receive precise and helpful answers based directly on your own published information.
Supported content types
You can add new Knowledge Sources in the following ways:
- Block of content: Paste any plain text you want the agent to learn from
- Webpage: Add a single public URL
- Website: Add a main public domain, with optional subpages
- Existing sources: Reuse or update content you have already added
Note: All content added as a Knowledge Source must be publicly available.
Current limitations
To ensure the best results, be aware of the following limitations.
Gated or authentication-required pages
Knowledge Sources cannot ingest content from:
- Login-required pages
- Password-protected areas
- Internal portals or dashboards
- Pages behind paywalls
Only public URLs are supported.
Image-only content
If important information appears only as images, such as text embedded in images, diagrams, or screenshots, it may not be readable or usable by the AI Agent.
Document uploading not yet supported
You currently cannot upload files such as:
- PDFs
- Word documents
- Spreadsheets
Important: Support for document uploads is planned for a future version.
Managing FAQs and website crawling in your Knowledge Base
Your existing FAQ and newly added website content can work together seamlessly in your Knowledge Base. This article explains how FAQs are handled, how website crawling works, how content is processed, and what limits apply to your AI Voice Agent.
What happens to my existing FAQ?
You do not need to remove or modify your existing FAQ. Your current FAQ is automatically saved as [Agent Name]’s FAQ, and your AI Voice Agent continues to use it as a knowledge source. You can combine multiple types of knowledge sources, including:
- FAQ or open-text entries
- Individual URLs
- Crawled websites
All knowledge sources are treated equally. There is currently no prioritization or weighting between different sources.
How website crawling works
When you add a website URL to your Knowledge Base, Aircall automatically processes:
- The page you provide
- Pages it links to
- Pages those linked pages reference
Linked pages are crawled only if their URLs share the same prefix as the URL you provide.
Crawl depth
We crawl:
- The provided page
- Linked pages up to two levels deeper
- Only linked pages whose URLs share the same prefix as the provided page
Example
If you add: https://website.com/depth1/
We may also crawl:
- https://website.com/depth1/depth2
- https://website.com/depth1/depth2/depth3
We will not crawl unrelated sections such as:
- https://website.com/blog
- https://website.com/contact
This ensures that only relevant sections of your website are included.
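The crawl rules above (same URL prefix, at most two link levels below the provided page) can be sketched in a few lines of Python. This is only an illustration of the described behavior, not Aircall's actual crawler: the function name `crawl_scope` and the in-memory `site` link map are hypothetical, and "level" is assumed here to mean one link hop from the page before it.

```python
from collections import deque


def crawl_scope(seed, links, max_depth=2):
    """Return the URLs a prefix-limited, depth-limited crawl would visit.

    `links` is a hypothetical map of page URL -> URLs that page links to.
    """
    visited = [seed]
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond two levels
        for target in links.get(url, []):
            # Skip pages already seen or outside the seed's URL prefix.
            if target in seen or not target.startswith(seed):
                continue
            seen.add(target)
            visited.append(target)
            queue.append((target, depth + 1))
    return visited


# Hypothetical link structure mirroring the example URLs above.
site = {
    "https://website.com/depth1/": [
        "https://website.com/depth1/depth2",
        "https://website.com/blog",
    ],
    "https://website.com/depth1/depth2": [
        "https://website.com/depth1/depth2/depth3",
        "https://website.com/contact",
    ],
    "https://website.com/depth1/depth2/depth3": [
        "https://website.com/depth1/depth2/depth3/depth4",
    ],
}

pages = crawl_scope("https://website.com/depth1/", site)
for page in pages:
    print(page)
```

In this sketch, the blog and contact pages are skipped because they do not share the prefix, and the hypothetical depth4 page is skipped because it sits three levels below the provided page.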
How content is extracted and cleaned
All website content goes through multi-stage processing to ensure high-quality knowledge.
| Category | Details | Purpose / Impact |
|---|---|---|
| What is removed | | Removes non-essential and potentially unsafe elements to ensure only relevant, clean content is processed. |
| What is kept | | Preserves structured and meaningful content that contributes to accurate knowledge retrieval. |
| Why this matters | | Ensures higher-quality responses, better performance, and improved reliability of the AI Voice Agent. |
Processing time expectations
Processing time depends on the size of the crawl.
- 1 to 10 pages typically process in under one minute
- Medium-sized sections may take 5 to 10 minutes
- Large root-level crawls may take up to 30 minutes
If you add a top-level URL such as https://website.com/, many linked pages may be processed. You can monitor progress using the document status indicator.
Character limits explained
Voice Agent context window
Your AI Voice Agent has a total working context window of 120,000 characters. This includes:
- Crawled website content
- FAQ and open-text entries
- All knowledge sources combined
If the total content exceeds 120,000 characters, automatic summarization is applied before the content is used by the Voice Agent.
Important: The 120,000-character limit is a technical limitation required to ensure system performance and reliability.
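If you keep local copies of your text, a rough pre-check against the 120,000-character window might look like the sketch below. The function names and sample sources are hypothetical, and Python's `len()` counts Unicode code points; how Aircall counts characters after cleaning is not specified here, so treat the result as an estimate only.

```python
CONTEXT_LIMIT = 120_000  # total characters across all knowledge sources


def total_characters(sources):
    """Sum the character counts of every knowledge source."""
    return sum(len(text) for text in sources.values())


def exceeds_limit(sources, limit=CONTEXT_LIMIT):
    """Return True if the combined content would trigger summarization."""
    return total_characters(sources) > limit


# Hypothetical sources; the placeholder strings stand in for real content.
sources = {
    "faq": "x" * 20_000,
    "help_center": "y" * 90_000,
    "pricing_page": "z" * 15_000,
}

total = total_characters(sources)
over = exceeds_limit(sources)
print(f"{total} characters, over limit: {over}")
```

Here the three sources add up to 125,000 characters, which is above the limit, so summarization would be expected to apply.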
Best practices for website ingestion
| Topic | Recommendation | Details / Examples |
|---|---|---|
| Start with specific URLs | Add precise, deep-linked pages instead of root domains | Instead of https://website.com/, use a targeted page such as https://website.com/help/article-name. The deeper the URL path, the more targeted the crawl. |
| Expand gradually | Move up one directory level at a time if broader coverage is needed | For example, move from https://website.com/help/article-name to https://website.com/help/. Avoid adding the root URL unless you truly need content from across the entire site. |
| Avoid over-crawling | Do not start with root-level URLs unless necessary | Root-level URLs can capture hundreds of pages, increase processing time, trigger summarization, and introduce irrelevant content. |
| Use structured knowledge pages | Prioritize well-organized, content-focused pages | Best-performing sources include help centers, documentation hubs, FAQ sections, and structured articles with clear headings. |
| Avoid unsuitable content types | Exclude pages that are dynamic, restricted, or unstructured | Avoid login-required pages, search result pages, dynamic or form-based content, news feeds, and media-heavy pages. |
| Review after crawling | Validate the results once processing is complete | Check the document preview to ensure the correct pages were captured, no duplicate URLs were added, and the content is structured properly. You can refresh website content later if the source page updates. |
Note: When not to crawl. Consider using manual FAQ or text input instead if:
- The content changes frequently, such as news or real-time data
- Pages require authentication
- The website is primarily video or image-based
- The content is unstructured
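The "Expand gradually" recommendation in the table above amounts to moving up one directory level at a time. A minimal sketch of that step, using Python's standard `urllib.parse` module, is shown below; the helper name `broaden` is hypothetical and is not part of any Aircall tool.

```python
from urllib.parse import urlsplit, urlunsplit


def broaden(url):
    """Return the URL one directory level up, or None if already at the root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return None  # already the root URL; nothing broader to try
    parent_path = "/" + "/".join(segments[:-1])
    if not parent_path.endswith("/"):
        parent_path += "/"
    # Query strings and fragments are dropped from the broader URL.
    return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))


step1 = broaden("https://website.com/help/article-name")
step2 = broaden(step1)
step3 = broaden(step2)
print(step1, step2, step3)
```

Starting from the targeted article URL, the first call gives the help section, the second gives the root URL, and the third returns `None`. Each broader URL is a candidate to add only if the narrower one did not give enough coverage.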