Vision-Language Analytics

Video analytics that understand subtle behavioural nuances across movements and interactions, transforming vision into metrics.

From Vision to Behavioural Understanding

Traditional video analytics measure where people are and how long they stay. These are useful signals, but what they mean is still left to interpretation. A person standing in front of a shelf for twenty seconds may be browsing, waiting, distracted, or not engaged at all.

FootfallCam introduces a new generation of video analytics that understands context: what people are doing, how they interact, and what it means for your business.

Business Usefulness in Retail

Instead of relying on time thresholds or proximity alone, the system interprets behaviour: body orientation, hand movement, interaction patterns, and situational context. It distinguishes a shopper actively considering a product from one giving a casual glance, a staff member restocking, or a person simply passing time. The result is not a guess, but a classified outcome aligned to real-world behaviour.

Area	Applications	Value to Business
Customer Engagement	Detects real interactions between customers and staff	Evaluate service quality and staff responsiveness
Merchandising Impact	Understands browsing patterns and dwell intensity	Measure true product engagement beyond footfall
Checkout Service	Classifies greeting, serving, upselling, and handover moments	Measure service quality in a consistent, auditable way
SOP Adherence	Monitors defined service behaviours and process steps	Improve operational discipline and training
Validation & Coaching	Stores reviewable event clips and classified outcomes	Turn observations into evidence, coaching, and improvement

How It Works in Practice

Step 1

A trigger happens

The system detects predefined events such as lingering at shelves or interacting with staff.

Step 2

Tracking narrows the focus

Movement tracking locates likely attention and intent, serving as a guardrail to isolate meaningful moments.

Step 3

Understand and classify

The system interprets the context and categorises the action into business-defined levels.

Step 4

Output to Metrics

The classified outcome is converted into structured, countable, and reviewable results.
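The four steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not FootfallCam's implementation: the `Track` fields, the 5-second trigger threshold, the `facing_zone` guardrail, and the `stub_classifier` that stands in for the model call are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Track:
    """Hypothetical tracker output for one person in one zone."""
    track_id: int
    zone: str             # e.g. "shelf" or "checkout"
    dwell_seconds: float
    facing_zone: bool     # crude stand-in for attention from movement tracking


def is_trigger(track: Track, min_dwell: float = 5.0) -> bool:
    # Step 1: a predefined event fires (here: lingering in a zone).
    return track.dwell_seconds >= min_dwell


def passes_guardrail(track: Track) -> bool:
    # Step 2: tracking narrows the focus to moments likely to carry intent.
    return track.facing_zone


def run_pipeline(tracks: Iterable[Track],
                 classify: Callable[[Track], str]) -> Counter:
    # Step 3 classifies each surviving moment; step 4 counts by category.
    counts: Counter = Counter()
    for track in tracks:
        if is_trigger(track) and passes_guardrail(track):
            counts[classify(track)] += 1
    return counts


def stub_classifier(track: Track) -> str:
    # Placeholder for the model call; the 15-second cut-off is invented.
    return "browse" if track.dwell_seconds < 15 else "active product interest"


tracks = [
    Track(1, "shelf", 3.0, True),    # too short: no trigger
    Track(2, "shelf", 20.0, False),  # lingers but faces away: dropped in step 2
    Track(3, "shelf", 8.0, True),
    Track(4, "shelf", 25.0, True),
]
result = run_pipeline(tracks, stub_classifier)
```

Passing the classifier in as a function keeps the trigger and guardrail logic testable without a model, which is the main reason for this shape.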

Metrics: Contextual, Not Just Movement

Contextual Categorisation

The vision-language model (VLM) refines the description of what a person is doing within a detected zone or interaction:

Shelf zone: glance, browse, active product interest
Staff interaction: asking, being served, upsell interaction
Checkout: payment, completion, handover

This turns a simple “20 seconds in front of a shelf” into a defined behaviour category.
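One way to picture zone-specific categorisation is to give the model a closed set of labels per zone and reject anything outside it. The sketch below uses the categories listed above; the prompt wording and the `build_prompt` and `parse_label` helpers are assumptions made for illustration, not a documented FootfallCam interface.

```python
from typing import Optional

# Category sets taken from the zone list above.
ZONE_CATEGORIES = {
    "shelf": ("glance", "browse", "active product interest"),
    "staff interaction": ("asking", "being served", "upsell interaction"),
    "checkout": ("payment", "completion", "handover"),
}


def build_prompt(zone: str) -> str:
    # Hypothetical prompt: constrain the answer to this zone's labels.
    options = ", ".join(ZONE_CATEGORIES[zone])
    return (f"The person is in the '{zone}' zone. "
            f"Answer with exactly one of: {options}.")


def parse_label(zone: str, raw_answer: str) -> Optional[str]:
    # Normalise the free-text answer and accept it only if it is a
    # defined category for this zone; otherwise return None.
    cleaned = raw_answer.strip().lower().rstrip(".")
    return cleaned if cleaned in ZONE_CATEGORIES[zone] else None
```

Returning `None` for an out-of-set answer means an unexpected model response is surfaced for review rather than silently counted under the wrong category.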

Built-in Validation

The same mechanism verifies whether an observed action is genuinely meaningful:

Distinguishes shopper engagement from staff restocking
Filters out idle or irrelevant presence
Confirms whether interaction actually occurred

This removes ambiguity from traditional metrics and increases trust.
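The effect of the validation checks above can be shown by comparing a raw presence count with a validated one. This is a sketch only: the `Observation` fields and the rule in `is_valid_engagement` are hypothetical simplifications of the three checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical classified observation for one person in a zone."""
    role: str         # "shopper" or "staff"
    activity: str     # e.g. "browsing", "restocking", "idle"
    interacted: bool  # whether an interaction was confirmed


def is_valid_engagement(obs: Observation) -> bool:
    # Keep only shoppers (not staff), who are not idle,
    # and whose interaction was actually confirmed.
    return obs.role == "shopper" and obs.activity != "idle" and obs.interacted


observations = [
    Observation("shopper", "browsing", True),
    Observation("staff", "restocking", True),   # staff, not shopper engagement
    Observation("shopper", "idle", False),      # idle presence
    Observation("shopper", "browsing", False),  # no confirmed interaction
]

raw_count = len(observations)
validated_count = sum(is_valid_engagement(o) for o in observations)
```

In this toy data, four people were present in the zone but only one counts as genuine shopper engagement, which is the gap a dwell-only metric would miss.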

Why This Matters

Moves from assumption → evidence

Reduces false interpretation of dwell time and presence

Provides verifiable context, not just positional data

Enables consistent measurement of real-world behaviour

Classifying Shopper Behaviour

The system identifies and distinguishes subtle differences in posture, attention, and interaction to classify behaviour types, from non-engagement to active product interaction and staff engagement, enabling structured interpretation of shopper movement and intent.

AI-generated illustrations only; no real customer data, surveillance footage, or personal information is used or captured.

Engagement metrics for each illustrated behaviour:

Not engaged: Phone Distraction
Level 1 (Light engagement): Casual Glance
Level 2 (Browsing): Active Browsing
Level 3 (High engagement): Deep Product Interaction
Not engaged: Talking to Friend
Level 4 (Assisted Engagement): Talking About Product
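The six illustrated behaviours and their engagement levels can be turned into countable results along the following lines. The level names and behaviour labels come from the examples above; encoding them as integers, using 0 for "Not engaged", and the `summarise` helper are assumptions made for this sketch.

```python
from collections import Counter
from enum import IntEnum
from typing import Iterable, Tuple


class Engagement(IntEnum):
    NOT_ENGAGED = 0  # numeric value is an assumption for illustration
    LIGHT = 1        # Level 1: Light engagement
    BROWSING = 2     # Level 2: Browsing
    HIGH = 3         # Level 3: High engagement
    ASSISTED = 4     # Level 4: Assisted Engagement


BEHAVIOUR_LEVEL = {
    "Phone Distraction": Engagement.NOT_ENGAGED,
    "Casual Glance": Engagement.LIGHT,
    "Active Browsing": Engagement.BROWSING,
    "Deep Product Interaction": Engagement.HIGH,
    "Talking to Friend": Engagement.NOT_ENGAGED,
    "Talking About Product": Engagement.ASSISTED,
}


def summarise(behaviours: Iterable[str]) -> Tuple[Counter, float]:
    # Count observations per level and compute the share that
    # reached any engagement level at all.
    levels = Counter(BEHAVIOUR_LEVEL[b] for b in behaviours)
    total = sum(levels.values())
    engaged = total - levels[Engagement.NOT_ENGAGED]
    return levels, (engaged / total if total else 0.0)


levels, engaged_share = summarise(
    ["Phone Distraction", "Casual Glance", "Active Browsing", "Talking to Friend"]
)
```

Note that "Phone Distraction" and "Talking to Friend" both map to "Not engaged", so two visually different behaviours collapse into one metric value.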

The Future of In‑Store Understanding

After visitors enter, zone counting shows where people actually go inside the mall.

Zones are commonly defined as:

Major wings (for example, East / West)
Atriums
Key corridors between anchors


