Free-text answers are the most valuable submissions you collect and the least likely to be read. A team of three with a 5% response rate on a 50,000-customer base ends up with 2,500 paragraphs nobody opens. AI insights solve the reading problem. They do not solve every problem.

Here is what AI summarisation of form responses is good at, what it is not, and how to wire it into a workflow that produces decisions instead of dashboards.

What an AI insights pass actually does

Given a batch of free-text responses, an insights pass produces:

  • Themes: clusters of semantically similar responses ("the editor is slow", "loading state is unclear", "I want offline mode"). Each theme has a count, a sample quote, and a representative phrasing.
  • Sentiment: per-response and per-theme positive/negative/neutral split.
  • Outliers: responses that don't fit any cluster — usually the most interesting ones.
  • Trends over time: which themes are growing, which are shrinking, and where the change started.

A good pass turns 2,500 paragraphs into ten themes, each with a count and example quotes. A weekly review of those ten themes replaces the impossible "read every response" expectation.
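The output described above can be pictured as a small data structure. This is a minimal sketch, not any particular product's schema: it assumes each response already arrives with a theme label and a sentiment label from the model, and the names Response, ThemeSummary, and summarise are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Response:
    text: str
    theme: Optional[str]  # hypothetical: label assigned upstream; None means "fits no cluster"
    sentiment: str        # hypothetical: "positive", "negative", or "neutral"


@dataclass
class ThemeSummary:
    name: str
    count: int = 0
    sample_quote: str = ""
    sentiment: Counter = field(default_factory=Counter)


def summarise(responses):
    """Roll labelled responses up into per-theme summaries plus an outlier list."""
    themes = {}
    outliers = []
    for r in responses:
        if r.theme is None:
            outliers.append(r.text)
            continue
        # The first response seen for a theme doubles as its sample quote.
        summary = themes.setdefault(r.theme, ThemeSummary(name=r.theme, sample_quote=r.text))
        summary.count += 1
        summary.sentiment[r.sentiment] += 1
    # Largest themes first, so the weekly review starts at the top.
    ranked = sorted(themes.values(), key=lambda t: t.count, reverse=True)
    return ranked, outliers


batch = [
    Response("I want offline mode", "offline mode", "neutral"),
    Response("The editor is slow", "editor speed", "negative"),
    Response("Editor lags on long documents", "editor speed", "negative"),
    Response("My invoice shows the wrong company name", None, "negative"),
]
ranked, outliers = summarise(batch)
for theme in ranked:
    print(theme.name, theme.count, dict(theme.sentiment), repr(theme.sample_quote))
print("outliers:", outliers)
```

Trends over time are left out here; they fall out of comparing two such summaries from different periods.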

What it gets right

For high-volume, low-stakes feedback, AI summarisation now beats a human reader for two reasons:

  1. Consistency. A human reading the 800th response is not paying the same attention as on the 5th. The model is.
  2. Recall. A human remembers the responses that stood out. The model remembers all of them, including the boring middle that contains most of the signal.

For NPS follow-ups, support post-mortems, "what would you fix first?" surveys, and any feedback you collect at volume, AI clustering is the right starting point.

What it gets wrong

Two failure modes show up often enough to plan around:

Theme dilution. When two genuinely different complaints share vocabulary, the model groups them. "Slow" can mean slow page loads, slow customer support replies, or slow feature shipping cadence. A cluster called "speed" hides which of those is driving the complaint.

Sarcasm and inversion. "Yeah, the new pricing was great." Sentiment models get this wrong about 30% of the time — a finding that holds up across the survey literature on irony detection in the ACL anthology. For sentiment to be trustworthy, the model needs to see enough context — usually a full follow-up answer, not just a rating.

The fix for both: spot-check the themes against five raw responses each. If the samples match the cluster label, you can trust the count. If they don't, the cluster needs a re-prompt or a manual split.
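The spot-check is easy to make routine. A minimal sketch, assuming responses are already grouped by theme in a plain dictionary; the function names and the min_matches threshold are illustrative assumptions, since the right bar for "the samples match" is a judgment call.

```python
import random


def spot_check_sample(responses_by_theme, k=5, seed=0):
    """Pick up to k raw responses per theme for a human to compare against the label."""
    rng = random.Random(seed)  # fixed seed so a second reviewer sees the same sample
    return {
        theme: rng.sample(texts, min(k, len(texts)))
        for theme, texts in responses_by_theme.items()
    }


def themes_needing_split(verdicts, min_matches=4):
    """verdicts maps theme -> list of booleans, one per sampled response
    (True if the reviewer agreed it fits the label). min_matches is an assumed bar."""
    return sorted(theme for theme, marks in verdicts.items() if sum(marks) < min_matches)


grouped = {
    "speed": [f"response {i}" for i in range(40)],
    "offline mode": ["I want offline mode", "Needs to work on a plane"],
}
samples = spot_check_sample(grouped)

# A reviewer reads the samples and records whether each one fits its label.
verdicts = {
    "speed": [True, False, False, True, False],
    "offline mode": [True, True],
}
print(themes_needing_split(verdicts, min_matches=2))
```

Themes returned by themes_needing_split are the ones to re-prompt or split by hand before anyone trusts their counts.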

Wiring it into a workflow

Insights without a workflow are wallpaper. The shape that works:

  1. Collect: every response gets indexed automatically as it lands.
  2. Cluster on cadence: weekly for high-volume, monthly for relationship surveys. Real-time clustering produces too much noise to act on.
  3. Diff against last period: the interesting thing is not "what are the top themes", it is "what changed". A theme that doubled in two weeks is a stronger signal than a theme that has been at the top for a year.
  4. Route by theme: each theme has an owner. Product themes go to product, support themes go to support, pricing themes go to ops. If a theme has no owner, it is not actionable, so park it.
  5. Close the loop: anyone who wrote a response in a top theme gets an email a month later: "Here's what we did about this." That single email is the difference between a feedback program and feedback theatre.
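Steps 3 and 4 above are simple enough to sketch. This is an illustration under stated assumptions: theme counts per period are plain dictionaries, and the theme names, the OWNERS mapping, and both function names are hypothetical.

```python
def diff_themes(previous, current):
    """Compare theme counts between two periods and rank them by relative change."""
    changes = []
    for theme in set(previous) | set(current):
        before = previous.get(theme, 0)
        after = current.get(theme, 0)
        # A brand-new theme has no baseline, so growth is None rather than infinity.
        growth = after / before if before else None
        changes.append({"theme": theme, "before": before, "after": after, "growth": growth})
    # New themes first, then largest growth; theme name breaks ties deterministically.
    changes.sort(key=lambda c: (c["growth"] is not None, -(c["growth"] or 0), c["theme"]))
    return changes


def route(changes, owners):
    """Group themes by owning team; themes without an owner are parked."""
    routed, parked = {}, []
    for change in changes:
        owner = owners.get(change["theme"])
        if owner is None:
            parked.append(change["theme"])
        else:
            routed.setdefault(owner, []).append(change["theme"])
    return routed, parked


# Hypothetical ownership map and counts for two consecutive periods.
OWNERS = {"editor speed": "product", "reply time": "support", "pricing": "ops"}
last_period = {"editor speed": 40, "pricing": 100, "reply time": 30}
this_period = {"editor speed": 80, "pricing": 95, "reply time": 30, "offline mode": 12}

changes = diff_themes(last_period, this_period)
routed, parked = route(changes, OWNERS)
print([c["theme"] for c in changes])
print(routed, parked)
```

Ranking by relative change rather than raw count is what surfaces the theme that doubled ahead of the one that has led the list for a year.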

Privacy: what you can and can't send to a model

Submissions often contain PII — email addresses, names, free text that mentions third parties. Before any AI insights pass, three things have to be true:

  • Submitters were told their response might be analysed automatically. Update your privacy policy — the transparency obligation comes from GDPR Article 13.
  • PII is stripped before being sent to the model, not just before being shown in a dashboard. A redaction pass on email, phone, and obvious names is enough for most use cases.
  • The model provider has a no-training agreement on the data. For EU customers this also means a data processing agreement (DPA) and, where data leaves the EEA, standard contractual clauses (SCCs) in place; see the EDPB guidelines on international transfers.
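A redaction pass of the kind described above can start as two regular expressions. This is a rough sketch, not a complete PII filter: the patterns are assumptions that will miss some formats and over-redact others (long order numbers can look like phone numbers), and it does not attempt names, which usually need a dedicated entity-recognition step.

```python
import re

# Deliberately loose patterns: better to over-redact than to leak.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text):
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


raw = "Mail me at jo@example.com or call +44 20 7946 0958."
print(redact(raw))
```

The important design choice is where this runs: on the text before it leaves your systems, so the model provider never receives the unredacted version.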

The shortcut some teams take is "just send everything to the model, we'll worry about it later". That shortcut is a GDPR finding waiting to happen, and an avoidable one.

When to skip AI insights entirely

Three cases where a human read is still right:

  • Volume below ~100 responses per period. Cluster sizes are too small to be meaningful, and a 30-minute read is faster than configuring the pass.
  • High-stakes decisions (firing a customer, shipping a major feature). The cluster summary is a starting point, not a source. Read the underlying responses before the meeting.
  • Legal or compliance feedback. Don't summarise complaints that might become evidence. Treat them as individual records.

For everything in between — onboarding NPS, CSAT after a release, "what would you fix first?" — AI insights are a better default than a spreadsheet.

The honest pitch

AI insights are not a replacement for reading your customers. They are a way to read all of them instead of the loud ones. The teams that get the most out of this treat the cluster view as a reading list, not a report. The themes tell you which 50 responses are worth your hour. You still read those 50.

That is the work. The tool just makes the reading list possible.