Multimodal search and what SEO professionals should do now
The way people search is changing so quickly that the familiar marketing funnel is beginning to look almost unrecognisable. For years, we could map a neat path from awareness to consideration to conversion, and then sprinkle in retention or advocacy at the end. But today, search is not a line; it’s a loop. People don’t simply move forward; they circle back, validate their choices, and bring new information into the process, often influenced by content formats we once didn’t even consider “searchable.”
Imagine someone spots a product on TikTok, saves it to a Pinterest board, checks Reddit for honest opinions, asks Google for a local stockist, and finally validates their choice through an AI overview summarising different reviews. That’s not a funnel; it’s a swirling journey of discovery, validation, decision, and advocacy happening simultaneously across multiple channels. For SEO and marketing professionals, the challenge is not just being present in one stage but ensuring visibility and credibility at every possible touchpoint.
What is multimodal search and why does it matter?
Multimodal search is the combination of text, voice, image and video in a single query. It matters because it reshapes discovery, makes results more entity-driven, and requires brands to optimise content in new ways.
Someone might take a photo of a chair, circle it on their phone screen, and ask, “Where can I buy this under £300?” They might then follow up with a voice query: “Is it available near Brighton?” The response isn’t a list of blue links but a blend of AI-generated summaries, shopping carousels, community posts, videos, and location data.
Search is no longer just about optimising a web page for keywords; it’s about making sure your product information, imagery, and reputation are machine-readable and credible enough to surface across varied contexts.
Google has reported that people are now using Google Lens for over 12 billion visual searches each month, and Pinterest users carry out more than 600 million visual searches every month. These are no longer fringe behaviours; they are mainstream discovery tools.
How is Gen Z changing discovery?
Around 40% of Gen Z now start their product searches on TikTok or Instagram instead of Google, often moving later to YouTube, Reddit and AI tools for validation. This forces brands to spread content across multiple platforms.
A search journey might start as a 15-second TikTok, jump to a 10-minute YouTube tutorial, pause for validation in a Reddit thread, and end in an instant checkout via social commerce. YouTube remains critical: with over 2.5 billion monthly users, it is often described as the second-largest search engine after Google. TikTok and Instagram are increasingly becoming the first stop for inspiration, particularly for lifestyle, fashion, food and home categories.
This layered behaviour forces brands to adapt. Content must travel. It’s not enough to rank in Google; you also need the hook on TikTok, the explainer on YouTube, the proof point in a blog, the evidence in schema markup, and the credibility in community spaces.
Why is tracking harder in multimodal search?
Tracking is harder because journeys now span TikTok, Reddit, YouTube, AI assistants and visual search, many of which don’t provide standard analytics data. As a result, attribution becomes fragmented.
Traditional analytics models leaned heavily on “last click” attribution, giving credit to the final step before conversion. But in today’s multimodal world, last click rarely tells the whole story. How do you measure the value of a TikTok save, a Reddit upvote, or a mention in an AI overview? These signals influence decisions, but they don’t always show up in Google Analytics or standard dashboards.
Early studies suggest that Google’s AI Overviews can reduce click-through rates from traditional organic listings by as much as 20–30% in some industries, while brands cited in AI summaries may see exposure and authority gains. This creates a tension: traffic might drop, but presence in AI results is now part of brand visibility.
The reality is that assists now matter as much as conversions. Pinterest saves or TikTok watch time may not drive instant sales, but they shape the choices people make later. Businesses need to treat these as valuable leading indicators, even if they don’t tie neatly to revenue in the short term.
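To make the difference between last-click credit and assists concrete, here is a minimal sketch comparing last-click attribution with a simple linear (equal-share) model. The journeys, channel names and order values are invented for illustration, and the sketch assumes you can observe each touchpoint at all (for example through post-purchase surveys), which is often not the case in practice.

```python
from collections import defaultdict

# Hypothetical journeys: ordered touchpoints ending in a conversion.
# Channel names and order values are illustrative, not real data.
journeys = [
    (["tiktok", "youtube", "reddit", "google_search"], 120.0),
    (["pinterest", "google_lens", "google_search"], 80.0),
    (["tiktok", "google_search"], 60.0),
]


def last_click(journeys):
    """Give all of each conversion's value to the final touchpoint."""
    credit = defaultdict(float)
    for touchpoints, value in journeys:
        credit[touchpoints[-1]] += value
    return dict(credit)


def linear(journeys):
    """Split each conversion's value equally across every touchpoint."""
    credit = defaultdict(float)
    for touchpoints, value in journeys:
        share = value / len(touchpoints)
        for channel in touchpoints:
            credit[channel] += share
    return dict(credit)


last_click_credit = last_click(journeys)
linear_credit = linear(journeys)

# Print both models side by side so the assists become visible.
for channel in sorted(linear_credit):
    print(f"{channel:14s} last-click={last_click_credit.get(channel, 0.0):7.2f} "
          f"linear={linear_credit[channel]:7.2f}")
```

Under last click, Google Search receives all of the revenue in this toy data set. Under the linear model, TikTok, YouTube, Reddit, Pinterest and Google Lens each pick up a share. Neither model is "correct"; the point is that the choice of model decides which channels look valuable.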
Real-world examples
This shift is not abstract. We can already see it affecting the sectors our clients work in:
- Retail: A customer may snap a photo of a stylish chair in a café, use Google Lens to find similar products, and filter by price or material. They might then turn to TikTok for styling tips, and finally check Google Maps for a nearby showroom before buying. Without high-quality images, structured product data, and engaging video content, a retailer risks disappearing from the journey.
- Speciality Foods: A Gen Z shopper sees a viral TikTok about a new dish. They save it, then head to YouTube for recipes, Reddit for reviews on taste and packaging, and Google to compare delivery options. If a food brand hasn’t created short videos, optimised schema for recipes and products, and built presence in communities, it will lose visibility in the decision-making loop.
- B2B Services: Even trades are part of this shift. A homeowner might start with Instagram before-and-after videos of renovations, check YouTube for “what to ask your electrician,” and then search Google or AI assistants for “qualified electrician near me with good reviews.” Here, credibility comes not just from rankings but from video case studies, verified Google Business Profiles, and strong local reviews that AI systems can confidently cite.
What businesses need to consider, do or plan
One critical element across all industries is the importance of a well-constructed website. However strong your social presence, community validation or AI mentions, your website remains the foundation of credibility and conversion. It must stand out for audiences and be fully optimised for search engines.
- E-commerce (Shopify): For online retailers, especially those on Shopify, a well-built site ensures product data, reviews, and imagery are easily pulled into Google’s shopping feeds and AI answers. Clean site structure, mobile-first design, and technical SEO best practices can make the difference between being surfaced in multimodal results or being overlooked.
- B2B (WordPress): For service-led businesses, often built on bespoke WordPress sites, the website serves as the anchor of trust. It needs clear service pages, optimised case studies, and well-structured thought leadership articles that demonstrate authority. Schema, fast performance, and consistent branding all help reinforce credibility in the eyes of both human users and search engines.
In both cases, the website should not just look professional but work as the single source of truth for accurate, structured information: fuel for discovery, validation, and conversion across the multimodal journey.
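As an illustration of what "structured information" can look like in practice, the sketch below builds schema.org Product markup (JSON-LD) from a single product record. The product details, SKU and example.com URLs are hypothetical, and platforms such as Shopify or WordPress plugins may already generate similar markup for you, so treat this as a picture of the output rather than a recommended implementation.

```python
import json

# Hypothetical product record; in practice this would come from the
# store's own catalogue, i.e. the single source of truth.
product = {
    "name": "Oak Lounge Chair",
    "description": "Solid oak lounge chair with a wool seat cushion.",
    "image": "https://example.com/images/oak-lounge-chair.jpg",
    "sku": "CHAIR-001",
    "price": "279.00",
    "currency": "GBP",
    "in_stock": True,
}


def product_json_ld(p):
    """Build a schema.org Product object with a nested Offer."""
    availability = "InStock" if p["in_stock"] else "OutOfStock"
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": p["name"],
        "description": p["description"],
        "image": p["image"],
        "sku": p["sku"],
        "offers": {
            "@type": "Offer",
            "price": p["price"],
            "priceCurrency": p["currency"],
            "availability": f"https://schema.org/{availability}",
        },
    }


# Wrap the JSON in the script tag that would sit in the page's HTML.
markup = json.dumps(product_json_ld(product), indent=2)
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Because the markup is generated from the same record that drives the product page, the price and availability that machines read stay in step with what shoppers see.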
Practical checklist for businesses
To stay visible and credible in a multimodal world, businesses should take a long, hard look at their brand’s presence on Google, YouTube, TikTok and AI assistants.
We suggest:
- Create at least one multi-format content cluster (video, article, image set, community post) each quarter.
- Verify product feeds and validate schema for products, services, FAQs and how-to guides.
- Benchmark discovery and validation KPIs (impressions, saves, watch time, citations) alongside conversions.
- Refresh websites for speed, UX, and mobile-first design, which are critical factors in both SEO and AI summaries.
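The feed check in the list above can start very simply. The following sketch flags feed items with missing or blank attributes. The attribute names follow those commonly required in Google Merchant Center feeds, but treat the list as an assumption and confirm it against the current specification. The feed rows themselves are made up.

```python
# Attributes Google Merchant Center generally expects on every product;
# this list is an assumption, so check the current specification.
REQUIRED = ["id", "title", "description", "link", "image_link", "availability", "price"]

# Hypothetical feed rows, e.g. parsed from a CSV or XML export.
feed = [
    {"id": "CHAIR-001", "title": "Oak Lounge Chair", "description": "Solid oak chair.",
     "link": "https://example.com/p/chair-001",
     "image_link": "https://example.com/img/chair-001.jpg",
     "availability": "in_stock", "price": "279.00 GBP"},
    {"id": "CHAIR-002", "title": "Walnut Side Chair", "description": "",
     "link": "https://example.com/p/chair-002",
     "availability": "in_stock", "price": "199.00 GBP"},
]


def missing_fields(item):
    """Return required attributes that are absent or blank for one item."""
    return [f for f in REQUIRED if not str(item.get(f, "")).strip()]


# Keep only the items that have at least one problem.
problems = {item.get("id", "<no id>"): missing_fields(item) for item in feed}
problems = {pid: fields for pid, fields in problems.items() if fields}

for pid, fields in problems.items():
    print(f"{pid}: missing {', '.join(fields)}")
```

Here the second item is flagged for a blank description and a missing image link, the kind of gap that can keep a product out of shopping and visual results. A check like this complements, rather than replaces, the validation tools the platforms themselves provide.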
What are the risks of inaction?
Businesses that don’t adapt risk becoming invisible in AI summaries, losing younger audiences, wasting ad spend, and undermining their websites as the foundation of trust.
What SEO professionals must do
For SEO teams, the task is no longer just about rankings. It’s about crafting answers that travel across formats, structuring data so machines can read it, and ensuring presence where people validate their choices. That means creating multi-format content (short videos, long explainers, skimmable guides, image sets), all tied back to a single source of truth. It means implementing schema properly, keeping product feeds clean and accurate, and ensuring every image, caption, and filename tells the right story.
And critically, it means expanding measurement models. Beyond traffic and conversions, we must capture mid-funnel behaviours: video watch time, saves, shares, AI citation presence, and community engagement. These are not vanity metrics; they are early signals of advocacy and long-term growth.
The bottom line
Multimodal search has fractured the funnel and blurred the boundaries of discovery, validation, and conversion. For clients, this means marketing success is no longer measured just by where they rank in Google but by how consistently their brand shows up across AI overviews, short-form video feeds, visual search tools, and community spaces. For SEO marketers, the challenge is to build answers that are helpful, verifiable, and trackable, even in a world where attribution is imperfect.
In short: search has become a story. The question is, are you telling it across every channel where your audience listens, watches, and asks?