This guide explains how to write clear and actionable custom AI questions for Call Scoring. Since the AI model only receives the call transcript and has no knowledge of your company, agents, or internal policies, well-written questions are essential for accurate and consistent scoring.

For an overview of scorecards and automation rules, please see: 

Note: Custom AI questions are available with an AI Assist Pro license. The guidance in this article applies to any question with AI evaluation enabled.

The core principle

Write each question so that a person with no background in your company could answer it correctly from the transcript alone.

Effective questions must be:

  • Specific: reference concrete, observable behaviors
  • Self-contained: include all necessary context within the question
  • Scoped: clarify who is being evaluated and when the behavior should occur

The question field is the AI's primary instruction. Yes/No criteria provide supplementary details but cannot fix an unclear or overly broad question.

Important: Write the question as if the Yes/No criteria fields do not exist. The criteria should reinforce the question, not compensate for ambiguity.

Common mistakes and how to fix them

Subject ambiguity

Unclear subjects such as “they” or “the speaker” make evaluation unreliable.

Instead of: “Did they acknowledge the customer's concern?”
Use: “Did the agent acknowledge the customer's concern?”

Instead of: “Was the speaker empathetic during the call?”
Use: “Did the agent express empathy when the customer described their frustration?”

Always specify the subject, usually by beginning the question with “Did the agent...”

Vague language

Subjective terms like professional, helpful, or rude are not measurable without observable behaviors.

Instead of: “Was the agent professional?”
Use: “Did the agent maintain a calm tone and avoid interrupting the customer?”

Instead of: “Did the agent provide helpful information?”
Use: “Did the agent answer the customer's specific questions about billing or provide relevant account details?”

Instead of: “Was the agent rude?”
Use: “Did the agent use inappropriate language or dismiss the customer's concerns?”

Replace abstract qualities with explicit behaviors.

Scope ambiguity

References to “the situation” or “proper protocol” require context the AI does not have.

Instead of: “Did the agent handle the situation well?”
Use: “Did the agent resolve the customer's billing dispute by explaining the charges and offering a solution?”

Instead of: “Was proper protocol followed?”
Use: “Did the agent verify the customer's identity by asking for their account number and the last four digits of their SSN before discussing account details?”

Always specify the exact issue or moment being evaluated.

Open-ended phrasing for Yes/No questions

Phrases like “how well” or “how effectively” cannot be answered with Yes or No.

For binary scoring, use direct phrasing such as:

  • “Did the agent apologize for the customer's negative experience and offer a specific solution or escalation path?”

For ranged evaluation questions, open-ended phrasing is appropriate as long as the scoring guide defines each rating level:

  • “How well did the agent explain the product's key features and relate them to the customer's needs?”
Tip: Use a ranged evaluation question when you want nuanced scoring. For a simple pass or fail, use a Yes/No question.
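To make the distinction concrete, here is a hypothetical Python sketch (illustrative only, not the product's actual configuration format) of a Yes/No question paired with its criteria, and a ranged question whose scoring guide defines every rating level:

```python
# Hypothetical question definitions -- illustrative only, not the
# actual Call Scoring configuration format.

yes_no_question = {
    "type": "yes_no",
    "question": (
        "Did the agent apologize for the customer's negative experience "
        "and offer a specific solution or escalation path?"
    ),
    "yes_criteria": "The agent apologized and proposed a concrete next step.",
}

ranged_question = {
    "type": "ranged",
    "question": (
        "How well did the agent explain the product's key features "
        "and relate them to the customer's needs?"
    ),
    # A ranged question needs every rating level defined in the scoring guide.
    "scoring_guide": {
        "Very poor": "Features were not mentioned at all.",
        "Poor": "Features were listed with no link to the customer's needs.",
        "Average": "Some features were explained; the link to needs was partial.",
        "Good": "Key features were explained and tied to at least one stated need.",
        "Excellent": "All relevant features were clearly connected to the customer's stated needs.",
    },
}

# A five-point ranged question is only usable if all five levels are defined.
assert len(ranged_question["scoring_guide"]) == 5
```

The point of the sketch is the asymmetry: the Yes/No question carries its full meaning in one sentence, while the ranged question is incomplete without a description for each of the five levels.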

Contextual dependency

The AI has no access to scripts, escalation matrices, or internal policies.

Instead of: “Did the agent follow our standard greeting protocol?”
Use: “Did the agent introduce themselves by name and ask how they could help the customer?”

Instead of: “Did the agent follow the escalation matrix correctly?”
Use: “When the customer requested a supervisor, did the agent attempt to resolve the issue first before agreeing to escalate?”

Instead of: “Did the agent use the approved closing script?”
Use: “At the end of the call, did the agent ask if the customer had any other questions and confirm satisfaction with the resolution?”

Spell out the expected behavior in the question itself.

Implicit expectations

If a behavior is required, state it explicitly.

Instead of: “Did the agent verify the customer properly?”
Use: “Did the agent ask the customer to provide their account number and verify their identity before accessing account information?”

Instead of: “Did the agent document the call correctly?”
Use: “Did the agent summarize the customer's issue and the resolution provided during the call?”

Negation and double negatives

Complex negative phrases lead to inconsistent scoring.

Instead of: “Did the agent not fail to avoid being unprofessional?”
Use: “Did the agent maintain a professional tone throughout the call?”

Instead of: “Was there no evidence of the agent not listening?”
Use: “Did the agent demonstrate active listening by summarizing the customer's concerns or asking clarifying questions?”

Keep language direct and positive.

Misalignment between the question and Yes/No criteria

If the question is generic, the AI may evaluate unintended behaviors.

Misaligned example

  • Question: “Agent interruptions”
  • Yes criteria: “The agent interrupted the caller while they were speaking”

The AI may also look for caller interruptions because the subject is not specified.

Corrected example

  • Question: “Did the agent interrupt the caller while they were actively speaking?”
  • Yes criteria: “The agent interrupted the caller mid-sentence or before they finished speaking”

Guidelines for writing effective questions

Follow these principles to create predictable, accurate scoring:

  • Make it specific: Use observable behaviors, not subjective traits.
  • Name the subject: Typically “Did the agent [action]?”
  • Define the scope: Reference the relevant part of the call.
  • Keep it measurable: Ensure a clear Yes/No outcome or use a defined rating scale.
  • Make it self-contained: Include all needed context in the question.
  • Use direct language: Active voice, present tense, and one concept per question.

Question checklist

Before saving a question, confirm:

  • Can someone with no company knowledge answer it from the transcript alone?
  • Is it clear who is being evaluated?
  • Are behaviors specific and observable?
  • Does the format match the scoring type (Yes/No or ranged)?
  • Does it avoid internal jargon or policy references?
  • Is the language simple and direct?
  • Would it score correctly even without the Yes/No criteria fields?
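Several of these checklist items can be screened mechanically before a human review. The sketch below is a hypothetical Python linter (an illustration of the checklist, not a product feature; the product's own tool for this is the AI Scorecard Evaluator) that flags the most common phrasing problems covered above: ambiguous subjects, vague adjectives, double negatives, and open-ended phrasing in a Yes/No question.

```python
import re

# Hypothetical question linter -- illustrative only; regex lists are
# assumptions drawn from the examples in this guide, not exhaustive rules.

AMBIGUOUS_SUBJECTS = re.compile(r"\b(they|the speaker|the rep)\b", re.IGNORECASE)
VAGUE_TERMS = re.compile(
    r"\b(professional|helpful|rude|good|well|properly|correctly)\b", re.IGNORECASE
)
DOUBLE_NEGATIVES = re.compile(r"\b(not fail|not avoid|no evidence of)\b", re.IGNORECASE)
OPEN_ENDED = re.compile(r"^(how|why|what|describe)\b", re.IGNORECASE)

def lint_question(question: str, question_type: str = "yes_no") -> list[str]:
    """Return a list of warnings for common question-writing mistakes."""
    warnings = []
    if AMBIGUOUS_SUBJECTS.search(question):
        warnings.append("Ambiguous subject: name the agent explicitly.")
    if VAGUE_TERMS.search(question):
        warnings.append("Vague term: replace with an observable behavior.")
    if DOUBLE_NEGATIVES.search(question):
        warnings.append("Double negative: rephrase positively.")
    if question_type == "yes_no" and OPEN_ENDED.match(question):
        warnings.append("Open-ended phrasing: must be answerable with Yes or No.")
    return warnings

# Flags both the ambiguous subject ("they") and the vague term ("well").
print(lint_question("Did they handle the situation well?"))
```

A question that passes a screen like this still needs a human read against the full checklist; pattern matching cannot judge whether the described behavior is actually observable in a transcript.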

Scenario specific examples

Customer service

Instead of: “Did the rep provide good customer service?”
Use: “Did the agent listen to the customer's complete concern before offering a solution?”

Instead of: “Was the interaction satisfactory?”
Use: “Did the customer express satisfaction with the resolution or agree to the proposed next steps by the end of the call?”

Technical support

Instead of: “Did they troubleshoot effectively?”
Use: “Did the agent ask the customer to describe the specific error message and guide them through at least two troubleshooting steps?”

Instead of: “Was the technical explanation clear?”
Use: “Did the agent explain the solution in terms the customer could understand, avoiding technical jargon or defining any technical terms used?”

Sales calls

Instead of: “Did the agent qualify the lead properly?”
Use: “Did the agent ask about the customer's current setup, budget range, and timeline for implementation?”

Instead of: “Was the pitch compelling?”
Use: “Did the agent connect at least two product features to specific needs or pain points mentioned by the customer?”

Billing and account issues

Instead of: “Did they handle the billing dispute correctly?”
Use: “Did the agent review the specific charges in question and explain each line item the customer asked about?”

Instead of: “Was account security maintained?”
Use: “Did the agent verify the customer's identity by asking for two forms of account verification before discussing sensitive account information?”

Call management

Instead of: “Did the agent manage their time well?”
Use: “Did the agent stay focused on the customer's primary concern rather than going off on tangents?”

Instead of: “Was the call handled efficiently?”
Use: “Did the agent resolve the customer's issue or provide a clear next step within 15 minutes of identifying the problem?”
Tip: Use the AI Scorecard Evaluator tool to review your questions before publishing a scorecard. It assesses scoring difficulty and suggests improvements. Please see Understanding automated Call Scoring for details.

FAQs

Why does question phrasing matter if I've filled in the Yes/No criteria fields?

The question field is the AI's primary instruction. Criteria add helpful detail but cannot override a vague question. Broad questions may cause the AI to analyze behaviors you did not intend to evaluate.

Can I reference internal scripts or policies?

No. The AI cannot access internal documentation. Any required behavior must be described in the question itself.

What is the difference between Yes/No and ranged evaluation questions?

Yes/No questions produce a binary result. Ranged evaluation questions produce a five-point score from “Very poor” to “Excellent”. Use ranged questions when nuance is required, and define each level clearly in the scoring guide.

Can I use these question types without an AI Assist Pro license?

Custom AI questions require an AI Assist Pro license. Please see AI Assist: Understanding Call Scoring for plan details.

How do I know if my question is ready for AI evaluation?

Use the checklist above. If any item fails, revise your question. You can also use the AI Scorecard Evaluator tool for automated feedback.