Aircall Help Center | Using knowledge sources for your AI Messaging Agent

AI Messaging Agents can learn directly from your public-facing content to deliver accurate, consistent, and on-brand answers during customer conversations. This is made possible through Knowledge Sources, which allow your AI Messaging Agent to reference trusted information in real time. This article explains what a Knowledge Source is, how it benefits your AI Messaging Agent, and the current limitations to be aware of.

What is a knowledge source?

A knowledge source is a centralised library of information about your company, such as products, services, and frequently asked questions. It helps ensure information is easy to find, easy to reference, and easy for your AI Messaging Agent to understand.

Aircall uses the content you provide, for example public webpages, to build a knowledge source that your AI Messaging Agent can rely on during customer conversations.

How knowledge sources help your AI Messaging Agent

Once your content is added as a Knowledge Source, your AI Messaging Agent can:

Answer common questions using accurate, brand-approved information
Maintain consistent messaging across conversations
Reduce repetitive manual responses
Reference your content instantly during customer interactions

This ensures customers receive precise and helpful answers based directly on your own published information.

Supported content types

You can add new Knowledge Sources in the following ways:

Block of content: Paste any plain text you want the agent to learn from.
Webpage: Add a single public URL.
Website: Add a main public domain, with optional subpages.
Existing sources: Reuse or update content you have already added for your AI Voice Agent.

Note: All content added as a Knowledge Source must be publicly available. Sources can be shared across your AI Voice Agent and AI Messaging Agent. If a rule or piece of information applies to one channel only, create a dedicated source for it rather than adding it to a shared one.

Current limitations

To ensure the best results, be aware of the following limitations.

Gated or authentication-required pages

Knowledge Sources cannot ingest content from:

Login-required pages
Password-protected areas
Internal portals or dashboards
Pages behind paywalls

Only public URLs are supported.

Image-only content

If important information appears only as images, such as text embedded in images, diagrams, or screenshots, it may not be readable or usable by the AI Messaging Agent.

Document uploading not yet supported

You currently cannot upload files such as PDFs, Word documents, or spreadsheets.

Note: Support for document uploads is planned for a future version.

Managing website crawling in your knowledge sources

Website content you add is crawled and processed into your knowledge sources alongside your existing content. This section explains how website crawling works, how content is processed, and what limits apply.

How website crawling works

When you add a website URL to your knowledge sources, Aircall automatically processes the page you provide, the pages it links to, and the pages those linked pages reference. Linked pages are included only if their URLs share the same prefix as the URL you added.

Crawl depth

Crawling covers the page you provide and up to two levels deeper, as long as the URLs share the same prefix as that page.

Example

If you add https://website.com/depth1/, the crawler may also process:

https://website.com/depth1/depth2
https://website.com/depth1/depth2/depth3

It will not crawl unrelated sections such as:

https://website.com/blog
https://website.com/contact

This ensures only relevant sections of your website are included.
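
The scoping rule above can be sketched in a few lines of Python. This is an illustration only, not Aircall's crawler: the function names, the `MAX_DEPTH` constant, and the `links` map (a hand-written stand-in for the links found on each page) are all assumptions made for the example.

```python
from collections import deque

MAX_DEPTH = 2  # assumption: the added page plus up to two link levels, as described above


def shares_prefix(seed_url: str, candidate_url: str) -> bool:
    """Return True when the candidate URL starts with the URL that was added."""
    return candidate_url.startswith(seed_url)


def crawl_scope(seed_url: str, links: dict[str, list[str]]) -> list[str]:
    """Walk a hypothetical link map breadth-first, limited by depth and prefix."""
    seen = {seed_url}
    order = [seed_url]
    queue = deque([(seed_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == MAX_DEPTH:
            continue  # do not follow links from pages already two levels deep
        for target in links.get(url, []):
            if target in seen or not shares_prefix(seed_url, target):
                continue  # skip repeats and pages outside the prefix
            seen.add(target)
            order.append(target)
            queue.append((target, depth + 1))
    return order


# Hypothetical link map: which pages each page links to.
links = {
    "https://website.com/depth1/": [
        "https://website.com/depth1/depth2",
        "https://website.com/blog",
    ],
    "https://website.com/depth1/depth2": [
        "https://website.com/depth1/depth2/depth3",
        "https://website.com/contact",
    ],
    "https://website.com/depth1/depth2/depth3": [
        "https://website.com/depth1/depth2/depth3/depth4",
    ],
}

pages = crawl_scope("https://website.com/depth1/", links)
for page in pages:
    print(page)
```

With this link map, the sketch keeps the added page, `depth2`, and `depth3`. It skips `/blog` and `/contact` because they fall outside the prefix, and it skips `depth4` because it sits three levels below the added page.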

Review and select pages from your website URL

After adding a website URL, you can review the sitemap generated for that source and see which pages are associated with it. From there, you can select or deselect pages to control exactly what is included in your knowledge sources.

As you add pages, you can monitor the character limit indicator to see how much of the available limit your knowledge source is using.

How content is extracted and cleaned

All website content goes through multi-stage processing to ensure high-quality knowledge.

Category	Details	Purpose
What is removed	Navigation menus, headers and footers, cookie banners, advertisements, images and videos, scripts and malicious code, formatting noise and redundant HTML	Removes non-essential and potentially unsafe elements so only relevant, clean content is processed.
What is kept	Headings, paragraphs, lists, structured article content	Preserves structured and meaningful content that contributes to accurate knowledge retrieval.
Why this matters	Improves response accuracy, prevents irrelevant content from affecting answers, reduces unnecessary processing, enhances security	Ensures higher-quality responses and improved reliability of the AI Messaging Agent.
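
To give a feel for the cleaning step summarised in the table, here is a minimal sketch using Python's standard `html.parser` module. It is not Aircall's pipeline: the `ContentExtractor` class, the two tag sets, and the sample HTML are assumptions chosen for illustration, and a real multi-stage process would handle many more cases (cookie banners, advertisements, malformed markup).

```python
from html.parser import HTMLParser

# Assumed tag lists for the example; the real rules are not published in this article.
DROP_TAGS = {"nav", "header", "footer", "script", "style", "aside"}
KEEP_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li"}


class ContentExtractor(HTMLParser):
    """Collect text from headings, paragraphs, and list items outside dropped regions."""

    def __init__(self) -> None:
        super().__init__()
        self.drop_depth = 0  # greater than 0 while inside a dropped element
        self.keep_depth = 0  # greater than 0 while inside a kept element
        self.buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            self.drop_depth += 1
        elif tag in KEEP_TAGS and self.drop_depth == 0:
            self.keep_depth += 1

    def handle_endtag(self, tag):
        if tag in DROP_TAGS and self.drop_depth > 0:
            self.drop_depth -= 1
        elif tag in KEEP_TAGS and self.keep_depth > 0 and self.drop_depth == 0:
            self.keep_depth -= 1
            if self.keep_depth == 0:
                # Collapse whitespace and store the finished block.
                text = " ".join("".join(self.buffer).split())
                if text:
                    self.blocks.append(text)
                self.buffer.clear()

    def handle_data(self, data):
        if self.drop_depth == 0 and self.keep_depth > 0:
            self.buffer.append(data)


# Hypothetical page used only to exercise the sketch.
html = """
<html><body>
<nav><ul><li>Home</li><li>Pricing</li></ul></nav>
<h1>Refund policy</h1>
<p>Refunds are available within <b>30 days</b>.</p>
<script>trackVisit();</script>
<ul><li>Contact support</li><li>Share your order number</li></ul>
<footer><p>Cookie settings</p></footer>
</body></html>
"""

extractor = ContentExtractor()
extractor.feed(html)
blocks = extractor.blocks
for block in blocks:
    print(block)
```

In this sample, the navigation menu, script, and footer are discarded, while the heading, paragraph, and list items survive as clean text blocks.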

Processing time expectations

Processing time depends on the size of the crawl:

1 to 10 pages typically process in under one minute.
Medium-sized sections may take 5 to 10 minutes.
Large root-level crawls may take up to 30 minutes.

You can monitor progress using the document status indicator.

Best practices for website ingestion

Topic	Recommendation	Details
Start with specific URLs	Add precise, deep-linked pages instead of root domains.	Instead of `https://website.com/`, use a targeted page such as `https://website.com/help/article-name`. The deeper the URL path, the more targeted the crawl.
Expand gradually	Move up one directory level at a time if broader coverage is needed.	Move from `https://website.com/help/article-name` to `https://website.com/help/`. Avoid adding the root URL unless you need content from across the entire site.
Avoid over-crawling	Do not start with root-level URLs unless necessary.	Root-level URLs can capture hundreds of pages, increase processing time, trigger summarisation, and introduce irrelevant content.
Use structured knowledge pages	Prioritise well-organised, content-focused pages.	Best-performing sources include help centres, documentation hubs, FAQ sections, and structured articles with clear headings.
Avoid unsuitable content types	Exclude pages that are dynamic, restricted, or unstructured.	Avoid login-required pages, search result pages, dynamic or form-based content, news feeds, and media-heavy pages.
Review after crawling	Validate results once processing is complete.	Check the document preview to confirm the correct pages were captured, no duplicate URLs were added, and content is structured properly. You can refresh website content later if the source page updates.
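
The "Expand gradually" recommendation amounts to trimming one path segment at a time from the URL you started with. The helper below is a small sketch of that idea using Python's standard `urllib.parse` module; `parent_prefix` is a hypothetical name, not part of any Aircall tool.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_prefix(url: str) -> str:
    """Return the URL one directory level above the given URL."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parent_path = "/" + "/".join(segments[:-1])
    if not parent_path.endswith("/"):
        parent_path += "/"
    # Drop any query string or fragment; only the directory prefix matters here.
    return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))


start = "https://website.com/help/article-name"
one_up = parent_prefix(start)
two_up = parent_prefix(one_up)
print(one_up)
print(two_up)
```

Starting from `https://website.com/help/article-name`, the first call yields `https://website.com/help/` and the second yields the root URL `https://website.com/`, which is the level the table suggests avoiding unless you need content from across the entire site.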

Tip: Consider using manual FAQ or text input instead of website crawling when content changes frequently (such as news or real-time data), pages require authentication, the website is primarily video or image-based, or the content is unstructured.
