“Vision-Language Analytics for Behaviour Understanding”

Vision-Language Models (VLMs) represent the next advancement in video analytics. Rather than measuring only positional changes or dwell time, a VLM enables a system to interpret actions and sequences of behaviour within short video clips. This provides businesses with a higher-order understanding of how customers and staff interact within a space, without increasing operational complexity or compromising privacy.

FootfallCam applies VLM in a controlled, domain-specific way. The model operates entirely on-device, analysing brief video segments and converting them into clear, high-level behaviour categories. These categories are designed for aggregation, statistical analysis, and decision-making, not for identifying individuals or generating personal profiles. The output is structured behavioural data that enhances traditional people counting, queue analytics, and operational performance metrics.

The system is built around four design principles:

#1: Behaviour-Level Understanding

The model recognises sequences such as approaching, waiting, browsing, interacting with staff, or receiving assistance. It moves beyond simple coordinates to interpret the intent behind customer actions, providing richer insight into service quality, engagement, and operational flow.

#2: User-Defined Behaviour Categories

Businesses can specify, in plain language, the types of behaviours they want to monitor, such as “customer waiting for service” or “staff using handheld PoS.” The system classifies activities into these categories, producing structured, high-level statistics tailored to operational goals.
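As an illustration only, plain-language categories like these could be represented as simple records and rendered into an instruction for a classifier. This is a minimal sketch, not FootfallCam's configuration format: the `BehaviourCategory` type, the `build_prompt` helper, the category keys, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviourCategory:
    key: str          # short identifier used in reports (hypothetical)
    description: str  # plain-language definition supplied by the business


def build_prompt(categories):
    """Render the categories as a numbered list a classifier could choose from."""
    lines = [
        f"{i}. {c.key}: {c.description}"
        for i, c in enumerate(categories, start=1)
    ]
    return "Classify the clip into exactly one category:\n" + "\n".join(lines)


# The two example behaviours quoted in the text; the keys are assumptions.
CATEGORIES = [
    BehaviourCategory("waiting_for_service", "customer waiting for service"),
    BehaviourCategory("staff_handheld_pos", "staff using handheld PoS"),
]

prompt = build_prompt(CATEGORIES)
print(prompt)
```

Keeping the description as free text and the key as a stable identifier means the wording of a definition can be refined without changing how its statistics are labelled in reports.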

#3: Explainable Classification

Each behavioural classification is accompanied by a short, plain-English explanation describing the cues that led to the decision. This supports transparency, validation, and alignment with internal operational definitions.

#4: Guardrailed and Privacy-Centric by Design

The model analyses only the behaviours explicitly requested. It cannot infer identities, personal attributes, or any information outside the defined scope. All processing occurs locally on the device, and only aggregated behaviour counts are transmitted for reporting.
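The idea that only aggregated behaviour counts leave the device can be sketched as follows. This is an assumption-laden illustration rather than the product's actual mechanism: the `Classification` record, the `ALLOWED` category keys, the `aggregate` function, and the sample results are hypothetical, and the scope check is shown here as a simple filter applied after classification.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    category: str     # label returned for one clip
    explanation: str  # short plain-English reason; stays on-device in this sketch


# Hypothetical keys for the behaviours the business has requested.
ALLOWED = {"waiting_for_service", "staff_handheld_pos"}


def aggregate(results, allowed=ALLOWED):
    """Count in-scope categories and discard anything outside the requested set."""
    counts = Counter(r.category for r in results if r.category in allowed)
    # Only category totals are returned: no explanations and no per-clip rows.
    return {key: counts.get(key, 0) for key in sorted(allowed)}


# Made-up sample results for illustration.
results = [
    Classification("waiting_for_service", "Person stands facing the counter with no staff present."),
    Classification("waiting_for_service", "Person remains in the queue area without moving forward."),
    Classification("staff_handheld_pos", "Staff member holds a handheld device beside a customer."),
    Classification("age_estimate", "Out-of-scope label, dropped by the filter."),
]

payload = aggregate(results)
print(payload)
```

In this sketch the per-clip explanations remain available locally for validation, while the reporting payload contains nothing but totals per requested category.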

By integrating VLM with traditional footfall and queue analytics, FootfallCam enables organisations to measure aspects of the customer journey that were previously unobservable. This helps validate staffing models, evaluate service responsiveness, optimise mobile PoS deployment, and improve overall visitor experience, all through structured, anonymised behavioural insight.
