Vision-Language Analytics

Video analytics that understand subtle behavioural nuances across movements and interactions, transforming vision into metrics.

Understand the Moment

Move beyond movement analytics. A vision-language model (VLM) interprets behaviour and context, transforming observed activity into defined, verifiable metrics that reflect real customer engagement and service interactions.

Why This Matters

Traditional analytics measure presence and duration. VLM introduces interpretation.

It distinguishes between types of behaviour within the same space, allowing identical movements to be understood differently based on context, posture, and interaction patterns.

Contextual Metrics

VLM converts observed activity into structured classifications. A person in front of a shelf is no longer just a dwell event, but a defined behaviour, ranging from brief attention to active product evaluation. The same applies to staff interaction, service flow, and completion stages, forming measurable operational indicators.

Contextual Categorisation

VLM refines what a person is doing within a detected zone or interaction:

  • Shelf zone: glance, browse, active product interest
  • Staff interaction: asking, being served, upsell interaction
  • Checkout: payment, completion, handover

This turns a simple “20 seconds in front of shelf” into a defined behaviour category.
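As an illustration only, the zone-to-category mapping listed above could be held in a small lookup structure. The sketch below is not FootfallCam's implementation: the names `Zone`, `CATEGORIES`, and `categorise` are hypothetical, and it assumes the model's output has already been reduced to a short text label.

```python
from enum import Enum


class Zone(Enum):
    """Detection zones taken from the list above (enum names are hypothetical)."""
    SHELF = "shelf"
    STAFF_INTERACTION = "staff_interaction"
    CHECKOUT = "checkout"


# Behaviour categories per zone, copied from the list above.
CATEGORIES = {
    Zone.SHELF: ("glance", "browse", "active product interest"),
    Zone.STAFF_INTERACTION: ("asking", "being served", "upsell interaction"),
    Zone.CHECKOUT: ("payment", "completion", "handover"),
}


def categorise(zone: Zone, label: str) -> str:
    """Return the label if it is a known category for the zone, else 'uncategorised'."""
    normalised = label.strip().lower()
    return normalised if normalised in CATEGORIES[zone] else "uncategorised"
```

Under these assumptions, `categorise(Zone.SHELF, "Browse")` yields a shelf category, while the same label in the checkout zone falls through to `"uncategorised"`, so a label is only accepted in the zone where it is defined.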

Built-in Validations

The same mechanism verifies whether an observed action is genuinely meaningful:

  • Distinguishes shopper engagement from staff restocking
  • Filters out idle or irrelevant presence
  • Confirms whether interaction actually occurred

This removes ambiguity from traditional metrics and increases trust in the reported figures.
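The three checks above can be pictured as a simple filter applied before an observation is counted. This is a minimal sketch under stated assumptions: the `Observation` fields, the `IGNORED_ACTIVITIES` set, and the `is_meaningful` function are hypothetical names chosen for illustration, not part of any FootfallCam API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One interpreted observation (all field names are assumptions)."""
    role: str                    # "shopper" or "staff"
    activity: str                # e.g. "browse", "restocking", "idle"
    interaction_confirmed: bool  # whether an interaction actually occurred


# Activities that should never count towards engagement metrics (assumed set).
IGNORED_ACTIVITIES = {"idle", "restocking"}


def is_meaningful(obs: Observation) -> bool:
    """Apply the three validations: role, relevance, confirmed interaction."""
    if obs.role != "shopper":
        return False  # staff activity is not shopper engagement
    if obs.activity in IGNORED_ACTIVITIES:
        return False  # idle or irrelevant presence is filtered out
    return obs.interaction_confirmed
```

With this filter, a staff member restocking a shelf and a shopper standing idle are both excluded, while a shopper with a confirmed interaction is kept.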

Same Moment, Different Meaning

A single location can produce multiple outcomes. A stationary person may be distracted, browsing, or engaged in product evaluation. VLM resolves this ambiguity by interpreting behaviour rather than relying on duration thresholds, enabling consistent categorisation across varying real-world conditions.

Phone Distraction

VLM Output: Person stationary, attention directed to mobile device. No product interaction.

Not Engaged

Casual Glance

VLM Output: Brief visual attention towards products, no hand interaction.

Lvl1: Light Engagement

Active Browsing

VLM Output: Sustained visual focus on products with exploratory behaviour.

Lvl2: Browsing

Deep Product Interaction

VLM Output: Direct product handling and evaluation behaviour.

Lvl3: High Engagement

Talking to Friend

VLM Output: Conversation detected, not related to product interaction.

Not Engaged

Talking About Product

VLM Output: Shared attention towards product with discussion behaviour.

Lvl4: Staff Engagement

AI-generated illustrations only; no real customer data, surveillance footage, or personal information is used or captured.

Retail Applications

Across store environments, VLM enables consistent interpretation of key activities. Product engagement, staff interaction, and checkout processes can be measured as structured behaviours rather than inferred events, providing a clearer view of the customer journey and operational execution.

Shelf Engagement

Staff Interaction

Checkout Completion

Verifiable Outcomes

Each classification is derived from observable behaviour and can be reviewed as a discrete event. This enables validation workflows where outcomes are inspected and refined, ensuring that metrics remain aligned with actual store activity rather than assumed patterns.
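One way to picture such a validation workflow is to store each classification as a reviewable event and compare it with a reviewer's label. The sketch below is illustrative only: `ClassifiedEvent`, its fields, and `agreement_rate` are hypothetical names, and the actual review process is not described in this page.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ClassifiedEvent:
    """A single classification kept as a discrete, reviewable event (assumed shape)."""
    event_id: str
    predicted: str                  # category assigned by the model
    reviewed: Optional[str] = None  # category assigned by a human reviewer, if any


def agreement_rate(events: Sequence[ClassifiedEvent]) -> Optional[float]:
    """Share of reviewed events where the reviewer agreed with the model.

    Returns None when nothing has been reviewed yet.
    """
    reviewed = [e for e in events if e.reviewed is not None]
    if not reviewed:
        return None
    agreed = sum(1 for e in reviewed if e.predicted == e.reviewed)
    return agreed / len(reviewed)
```

A falling agreement rate for one category would be a signal to revisit how that category is defined.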

Operational Definition

Behaviour categories are not fixed. Retailers define what constitutes meaningful engagement or service quality within their own context. This allows consistent measurement across different formats, from high-touch retail environments to transaction-focused stores.
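To make the idea of retailer-defined engagement concrete, the sketch below treats the definition as a configurable threshold over behaviour levels. It is a hypothetical illustration: the `DEFAULT_LEVELS` mapping reuses behaviour names from the examples earlier on this page, while the numeric levels, the `engaged_share` function, and the threshold parameter are assumptions.

```python
from typing import Iterable, Mapping

# Assumed numeric levels for behaviours named earlier on this page.
DEFAULT_LEVELS: dict[str, int] = {
    "phone distraction": 0,
    "casual glance": 1,
    "active browsing": 2,
    "deep product interaction": 3,
}


def engaged_share(
    labels: Iterable[str], levels: Mapping[str, int], min_level: int
) -> float:
    """Fraction of observations at or above the retailer's engagement threshold.

    Unknown labels are treated as level 0 (not engaged).
    """
    items = list(labels)
    if not items:
        return 0.0
    engaged = sum(1 for label in items if levels.get(label, 0) >= min_level)
    return engaged / len(items)


observed = [
    "casual glance",
    "active browsing",
    "phone distraction",
    "deep product interaction",
]
# A transaction-focused store might accept a casual glance as engagement ...
lenient = engaged_share(observed, DEFAULT_LEVELS, min_level=1)
# ... while a high-touch store might require active browsing or more.
strict = engaged_share(observed, DEFAULT_LEVELS, min_level=2)
```

The same observations produce different engagement figures depending on the threshold, which is why the definition needs to be fixed per retailer before stores are compared.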

Built for Practice

The system is designed to operate within defined conditions, focusing on meaningful interactions rather than continuous interpretation. This ensures that analysis remains relevant, controlled, and aligned with operational use rather than theoretical modelling.