How to design survey questions that get honest answers

Most bad survey data is not a sampling problem. It is a wording problem. The respondent answered honestly — the question just asked for something other than what you wanted to know.

This is the working checklist we use before any survey ships: question types, the three bias traps, scale design, the open-versus-closed decision, and ordering effects. None of it requires a research degree. All of it requires discipline.

What makes a survey question "honest"?

A definition first, because the word gets thrown around loosely. An honest question is one where the easiest answer for the respondent is also the truthful one. Every design decision below serves that single goal: lower the cost of telling the truth, and raise no cost on any particular answer.

A question fails this test when it telegraphs the answer you want, when it punishes truthful answers with extra work, or when it forces a respondent who thinks "it depends" to pick a side. Respondents do not lie out of malice. They satisfice — they pick the answer that ends the question fastest. Good design makes the truthful answer the fast one.

Which question type should you use for what?

Each type has one job. Misusing them is the most common structural mistake:

Single choice — when the answers are mutually exclusive and you need a clean segment. "Which plan are you on?"
Multiple choice — when several answers can be true at once. Always add "None of these" or you force false positives.
Rating scale — for intensity of a single attitude. Satisfaction, likelihood, agreement. One attitude per question, never two.
Open text — for the why behind a closed answer, or for discovery when you genuinely do not know the answer space yet.
Ranking — almost never. Respondents can reliably rank their top one or two items; everything below that is noise. If you need priorities, ask "pick your top two" as a multiple choice with a limit.
Matrix grids — the question type with the worst completion economics. A 10-row matrix is ten questions wearing a trench coat, and respondents straight-line it (pick the same column all the way down) once fatigue sets in. Break grids into separate screens or cut rows.

What are leading, loaded, and double-barreled questions?

The three classic bias traps, with the repair for each:

Leading questions suggest their own answer. "How much did our new dashboard improve your workflow?" presumes improvement; the respondent grades the size of a benefit they may not have felt. Repair: neutralise the verb. "How, if at all, did the new dashboard change your workflow?" The "if at all" is doing real work — it makes "it didn't" a legitimate, low-cost answer.

Loaded questions smuggle in an assumption the respondent never agreed to. "What do you dislike about your current provider's hidden fees?" assumes hidden fees exist. Anyone without that experience either abandons or invents one. Repair: split it. First ask whether the experience happened ("Have you encountered fees you didn't expect?"), then ask the follow-up only of those who said yes — this is exactly what branching logic is for.

Double-barreled questions ask two things and collect one answer. "How satisfied are you with our pricing and support?" A respondent who loves support and resents pricing has no honest option. Repair: one attitude per question, always. If the word "and" appears between two rateable things, split the question. This is the easiest trap to catch in review and the most common one we see in customer drafts.

How should you design rating scales?

Scale design questions generate religious wars; the evidence-backed answers are more boring than the debates:

5 or 7 points for agreement and satisfaction. Below five you lose sensitivity; above seven, respondents cannot reliably distinguish adjacent points and the extra granularity is fake precision. The survey-methodology literature (Krosnick's work on response scales is the standard reference) consistently finds reliability plateaus around 5–7 labelled points.
Label every point, or at least both ends. "1 = very dissatisfied, 5 = very satisfied" anchors the scale; unlabelled numbers drift between respondents. Fully labelled scales beat endpoint-only when the labels are natural.
Keep the direction consistent across the whole survey. If 5 means "good" on screen two, 5 must mean "good" on screen nine. Reversing direction mid-survey to "catch inattentive respondents" mostly catches honest people on autopilot and corrupts more data than it cleans.
Offer a genuine "not applicable" when it can be true. A respondent who never used the feature and is forced to rate it will rate it — and that rating is pure noise in your average.
Odd or even points? Odd (with a midpoint) when neutrality is a real position. Even (forced choice) only when you have a strong reason to believe the midpoint is hiding lean — and you accept that you are trading some honesty for it.

The 0–10 NPS scale is its own special case with its own rules — covered in running NPS the right way.

When should a question be open versus closed?

The trade is simple: closed questions are cheap to answer and cheap to analyse but can only confirm what you already imagined. Open questions are expensive on both ends but are the only way to learn something you did not think to ask.

The working rules:

Discovery phase → open. First survey on a new topic, keep it short and mostly open text. You are building the answer list for the closed version that follows.
Measurement phase → closed. Once you know the answer space, closed questions give you comparable, trendable data.
Always pair the key closed question with one open follow-up. "What's the main reason for your score?" after a rating is the highest-value question in most surveys — and the responses are exactly what AI summarisation across responses makes tractable at volume.
Never open-text a question you'll need to count. "Which country are you in?" as free text yields "USA", "U.S.", "the states", and "🇺🇸". Use a closed list for anything that becomes a chart.

A practical ceiling from experience: more than two required open-text questions in a ten-question survey and your completion rate drops noticeably. Make all but one optional.

Why does question order change the answers?

Ordering effects are real and well documented in survey methodology — earlier questions set the frame the respondent uses for later ones. The patterns worth designing around:

Ask overall ratings before specific ones. If a respondent rates twelve specific features first, their "overall satisfaction" answer becomes an arithmetic summary of those twelve rather than their actual felt experience. Overall first, specifics after.
Earlier questions prime later ones. Three questions about pricing followed by "would you recommend us?" gets you a recommendation score about pricing. Group topics, and put the questions you care most about before the topics that could contaminate them.
Easy and engaging first, sensitive late, demographics last. Respondents who have invested eight questions will answer the income question they would have bounced on at question one. And demographics first reads like a gate, which depresses starts.
Randomise answer options where order is arbitrary. In lists without a natural order, the first options get picked more (primacy bias on screens). Most survey tools, Formspring's included, can shuffle options per respondent.

A pre-flight checklist for every survey

The test we run before publishing, in order:

Read each question and ask: could a respondent guess which answer we are hoping for? If yes, rewrite.
Search the draft for "and" inside questions. Split every double-barrel.
Check every rating scale: same direction, both ends labelled, N/A where it can be true.
Count required open-text fields. More than one? Make the rest optional.
Take the survey yourself as the least-typical respondent you serve. Every question that does not apply to you is a question that needs a skip or branch.
Send it to five real people before five thousand. The pilot will catch the ambiguity you have gone blind to.

Honest answers are mostly a design output, not a respondent virtue. Get the wording, scales, and order right, and the data quality follows.

Related from this desk

Conditional logic in surveys without writing code — the branching mechanics behind "ask the follow-up only of those who said yes".
Running NPS the right way in 2026 — the 0–10 scale's special rules, sampling, and follow-up flows.
Survey response rates and how to improve them — wording gets honesty; distribution gets volume.
AI insights for form responses at scale — what to do with a thousand open-text answers.
Product side: surveys.

The Field Notes