Aircall Help Center | Using knowledge sources for your AI Voice Agent

Aircall AI Voice Agents can learn directly from your public-facing content to deliver accurate, consistent, and on-brand answers during customer calls. This is made possible through Knowledge Sources, which allow your AI Agent to reference trusted information in real time.

This article explains what a knowledge base is, how it benefits your AI Agent, and the current limitations to be aware of.

What is a knowledge base?

A knowledge base is a centralized library of information about your company, such as products, services, and frequently asked questions. It helps ensure information is easy to find, easy to reference, and easy for your AI Agents to understand.

Aircall uses the content you provide, for example public webpages, to build a knowledge base that your AI Agents can rely on during customer conversations.

How knowledge sources help your AI Agents

Once your content is added as a Knowledge Source, your AI Agent can:

Answer common questions using accurate, brand-approved information
Maintain consistent messaging across calls
Reduce repetitive manual responses
Reference your content instantly during customer interactions

This helps callers receive precise, helpful answers based directly on your own published information.

Supported content types

You can add new Knowledge Sources in the following ways:

Block of content: Paste any plain text you want the agent to learn from
Webpage: Add a single public URL
Website: Add a main public domain, with optional subpages
Existing sources: Reuse or update content you have already added

Note: All content added as a Knowledge Source must be publicly available.

Current limitations

To ensure the best results, be aware of the following limitations.

Gated or authentication-required pages

Knowledge Sources cannot ingest content from:

Login-required pages
Password-protected areas
Internal portals or dashboards
Pages behind paywalls

Only public URLs are supported.

Image-only content

If important information appears only as images, such as text embedded in images, diagrams, or screenshots, it may not be readable or usable by the AI Agent.

Document uploading not yet supported

You currently cannot upload files such as:

PDFs
Word documents
Spreadsheets

Important: Support for document uploads is planned for a future version.

Managing FAQs and website crawling in your Knowledge Base

Your existing FAQ and newly added website content can work together seamlessly in your Knowledge Base. This article explains how FAQs are handled, how website crawling works, how content is processed, and what limits apply to your AI Voice Agent.

What happens to my existing FAQ?

You do not need to remove or modify your existing FAQ. Your current FAQ is automatically saved as [Agent Name]’s FAQ, and your AI Voice Agent continues to use it as a knowledge source. You can combine multiple types of knowledge sources, including:

FAQ or open-text entries
Individual URLs
Crawled websites

All knowledge sources are treated equally. There is currently no prioritization or weighting between different sources.

How website crawling works

When you add a website URL to your Knowledge Base, Aircall automatically processes:

The page you provide
Pages it links to
Pages those linked pages reference

Linked pages are processed only if their URLs share the same prefix as the URL you provide.

Crawl depth

We crawl:

The provided page
Linked pages up to two levels deeper, but only if their URLs share the same prefix as the provided page

Example

If you add: https://website.com/depth1/

We may also crawl:

https://website.com/depth1/depth2
https://website.com/depth1/depth2/depth3

We will not crawl unrelated sections such as:

https://website.com/blog
https://website.com/contact

This ensures that only relevant sections of your website are included.
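
The crawl rules above can be illustrated with a minimal sketch. It assumes that "levels" means link hops from the page you provide and that "same prefix" is a plain string-prefix check on the URL. The function name `pages_in_scope`, the `links` graph, and the `depth4` URL are hypothetical and used only for illustration; this is not Aircall's actual crawler.

```python
from collections import deque

MAX_DEPTH = 2  # "up to two levels deeper" than the provided page


def pages_in_scope(seed: str, links: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk from `seed`, following only same-prefix links."""
    seen = {seed}
    queue = deque([(seed, 0)])
    ordered = []
    while queue:
        url, depth = queue.popleft()
        ordered.append(url)
        if depth == MAX_DEPTH:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            # Assumption: "same prefix" means the URL starts with the seed URL.
            if target.startswith(seed) and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return ordered


# Hypothetical link graph: page -> pages it links to.
links = {
    "https://website.com/depth1/": [
        "https://website.com/depth1/depth2",
        "https://website.com/blog",
    ],
    "https://website.com/depth1/depth2": [
        "https://website.com/depth1/depth2/depth3",
        "https://website.com/contact",
    ],
    "https://website.com/depth1/depth2/depth3": [
        "https://website.com/depth1/depth2/depth3/depth4",
    ],
}

crawled = pages_in_scope("https://website.com/depth1/", links)
for url in crawled:
    print(url)
```

In this sketch, `/blog` and `/contact` are skipped because they do not share the prefix, and the hypothetical `depth4` page is skipped because it sits three hops from the provided page.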

How content is extracted and cleaned

All website content goes through multi-stage processing to ensure high-quality knowledge.

Category	Details	Purpose / Impact
What is removed	Navigation menus; headers and footers; cookie banners; “Back to top” buttons; advertisements; images and videos; Base64-encoded images; scripts and malicious code; formatting noise and redundant HTML	Removes non-essential and potentially unsafe elements to ensure only relevant, clean content is processed.
What is kept	Headings; paragraphs; lists; structured article content	Preserves structured and meaningful content that contributes to accurate knowledge retrieval.
Why this matters	Improves response accuracy; prevents irrelevant content from affecting answers; reduces unnecessary processing; enhances security; helps the AI retrieve relevant information more effectively	Ensures higher-quality responses, better performance, and improved reliability of the AI Voice Agent.
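
To give a feel for what this kind of cleaning involves, here is a rough sketch using Python's standard `html.parser` module. The tag sets `SKIP_TAGS` and `KEEP_TAGS`, the class name `ContentExtractor`, and the sample HTML are assumptions made for illustration. Aircall's actual processing rules are not documented in this article and are likely more thorough. The sketch also assumes reasonably well-formed HTML with explicit closing tags.

```python
from html.parser import HTMLParser

# Assumed tag sets; the real pipeline's rules are not documented here.
SKIP_TAGS = {"nav", "header", "footer", "script", "style", "aside"}
KEEP_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li"}


class ContentExtractor(HTMLParser):
    """Collect text from headings, paragraphs, and list items only."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a removed element
        self.keep_depth = 0  # > 0 while inside a kept element
        self.blocks: list[str] = []
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in KEEP_TAGS and self.skip_depth == 0:
            self.keep_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in KEEP_TAGS and self.skip_depth == 0 and self.keep_depth > 0:
            self.keep_depth -= 1
            if self.keep_depth == 0:
                # Collapse whitespace and store the finished block.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.blocks.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self.skip_depth == 0 and self.keep_depth > 0:
            self._buffer.append(data)


html = """
<html><body>
<nav><ul><li>Home</li><li>Pricing</li></ul></nav>
<h1>Returns policy</h1>
<p>Items can be returned within <b>30 days</b>.</p>
<ul><li>Keep your receipt</li><li>Use the original packaging</li></ul>
<script>trackVisit();</script>
<footer><p>Back to top</p></footer>
</body></html>
"""

extractor = ContentExtractor()
extractor.feed(html)
extractor.close()
cleaned = extractor.blocks
print(cleaned)
```

In this sample, the navigation menu, the script, and the footer are dropped, while the heading, the paragraph, and the two list items are kept as separate text blocks.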

Processing time expectations

Processing time depends on the size of the crawl.

1 to 10 pages typically process in under one minute
Medium-sized sections may take 5 to 10 minutes
Large root-level crawls may take up to 30 minutes

If you add a top-level URL such as https://website.com/, many linked pages may be processed. You can monitor progress using the document status indicator.

Character limits explained

Voice Agent context window

Your AI Voice Agent has a total working context window of 120,000 characters. This includes:

Crawled website content
FAQ and open-text entries
All knowledge sources combined

If the total content exceeds 120,000 characters, automatic summarization is applied before the content is used by the Voice Agent.

Important: The 120,000-character limit is a technical limitation required to ensure system performance and reliability.
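
If you keep local copies of your source text, a quick estimate like the sketch below can indicate whether your combined content is likely to exceed the limit. It assumes characters are counted the way Python's `len()` counts them, which may differ from how Aircall measures content after cleaning. The function name `context_usage` and the sample sources are hypothetical.

```python
CONTEXT_LIMIT = 120_000  # characters, as stated in this article


def context_usage(sources: dict[str, str]) -> tuple[int, bool]:
    """Return the total character count and whether it exceeds the limit."""
    total = sum(len(text) for text in sources.values())
    return total, total > CONTEXT_LIMIT


# Hypothetical sources with placeholder text of a given length.
sources = {
    "faq": "x" * 20_000,
    "help_center_crawl": "x" * 90_000,
    "pricing_page": "x" * 15_000,
}

total, will_summarize = context_usage(sources)
print(f"{total} characters, summarization expected: {will_summarize}")
```

Here the three sources add up to 125,000 characters, so summarization would be expected under the rule described above.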

Best practices for website ingestion

Topic	Recommendation	Details / Examples
Start with specific URLs	Add precise, deep-linked pages instead of root domains	Instead of `https://website.com/`, use a targeted page such as `https://website.com/help/article-name`. The deeper the URL path, the more targeted the crawl.
Expand gradually	Move up one directory level at a time if broader coverage is needed	For example, move from `https://website.com/help/article-name` to `https://website.com/help/`. Avoid adding the root URL unless you truly need content from across the entire site.
Avoid over-crawling	Do not start with root-level URLs unless necessary	Root-level URLs can capture hundreds of pages, increase processing time, trigger summarization, and introduce irrelevant content.
Use structured knowledge pages	Prioritize well-organized, content-focused pages	Best-performing sources include help centers, documentation hubs, FAQ sections, and structured articles with clear headings.
Avoid unsuitable content types	Exclude pages that are dynamic, restricted, or unstructured	Avoid login-required pages, search result pages, dynamic or form-based content, news feeds, and media-heavy pages.
Review after crawling	Validate the results once processing is complete	Check the document preview to ensure the correct pages were captured, no duplicate URLs were added, and the content is structured properly. You can refresh website content later if the source page updates.
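
The "Expand gradually" recommendation amounts to trimming one path segment at a time from the URL you add. The helper below is a small illustrative sketch built on Python's standard `urllib.parse` module. The function name `parent_url` is hypothetical, and the sketch drops any query string or fragment.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_url(url: str) -> str:
    """Return the URL one directory level above `url` (root stays root)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parent_path = "/" + "/".join(segments[:-1])
    if not parent_path.endswith("/"):
        parent_path += "/"
    # Query string and fragment are intentionally dropped.
    return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))


step1 = parent_url("https://website.com/help/article-name")
step2 = parent_url(step1)
print(step1)
print(step2)
```

Starting from the article URL, the first step yields `https://website.com/help/` and the second yields the root URL `https://website.com/`, which the table above suggests adding only when you need content from across the entire site.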

Note: When not to crawl. Consider using manual FAQ or text input instead if:

• the content changes frequently, such as news or real-time data;
• pages require authentication;
• the website is primarily video or image-based;
• or the content is unstructured.
