Vision

A replicant must see the world. Mia looks through a camera — and understands what she sees.

How Mia sees

📷 Camera captures → 👁 Detects faces → 👤 Recognizes → 📏 Estimates distance → 🧠 Brain reacts

This cycle repeats continuously, frame by frame.

📷 A camera as an eye

Mia uses a camera as her eye. The image is captured continuously and sent to software that analyzes it in real time. This is the first step: without vision, Mia doesn't know what surrounds her.

👁 Face detection

The software automatically spots faces in the image. It knows how many people are present, where they are in the field of view, and approximately how far away each one is.
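Face detectors typically emit candidate boxes with a confidence score; a small post-processing step keeps the confident ones and records where each face sits in the field of view. A minimal sketch in Python (the function name, threshold, and field names are illustrative, not Mia's actual code):

```python
# Hypothetical post-processing of raw detector output. The 0.6 confidence
# threshold and the dict field names are assumptions for illustration.

def filter_detections(raw_boxes, frame_w, frame_h, min_conf=0.6):
    """Keep confident detections and normalize face positions to [0, 1]."""
    faces = []
    for (x, y, w, h, conf) in raw_boxes:
        if conf < min_conf:
            continue  # discard weak candidates (likely false positives)
        faces.append({
            "cx": (x + w / 2) / frame_w,  # horizontal position in the field of view
            "cy": (y + h / 2) / frame_h,  # vertical position
            "w": w, "h": h,               # bounding box size in pixels
            "confidence": conf,
        })
    return faces
```

The normalized center makes "where in the field of view" independent of the camera resolution.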

👤 Recognition

Mia doesn't just see faces — she can recognize them. If she has seen you before, she knows it's you. This recognition influences her behavior: she doesn't react the same way to a stranger versus someone familiar.
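Recognition of this kind is commonly done by comparing a numeric embedding of the detected face against a database of known faces, identifying the best match by similarity. A minimal sketch, assuming cosine similarity and an illustrative threshold (neither is confirmed to be what Mia uses):

```python
import math

def cosine_similarity(a, b):
    """Similarity between two face embeddings (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify(embedding, known_faces, threshold=0.8):
    """Return the best-matching known identity, or None for a stranger."""
    best_name, best_score = None, threshold
    for name, reference in known_faces.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

A `None` result is exactly the "stranger" case that triggers different behavior than a familiar face.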

📏 Distance estimation

By analyzing the size of the face in the image, Mia estimates how far away you are. Close up, she'll be more attentive. Far away, she may simply observe you. This information directly feeds her decisions.
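This size-to-distance relation follows the pinhole camera model: apparent width shrinks in proportion to distance. A sketch with assumed calibration constants (the real focal length and face-width values depend on Mia's camera):

```python
# Pinhole-camera sketch: the two constants below are assumptions,
# not Mia's actual calibration.

AVG_FACE_WIDTH_M = 0.15   # typical human face width in meters
FOCAL_LENGTH_PX = 600.0   # camera focal length expressed in pixels

def estimate_distance(face_width_px):
    """The smaller a face appears, the farther away it is."""
    return AVG_FACE_WIDTH_M * FOCAL_LENGTH_PX / face_width_px
```

With these constants, a face 90 pixels wide is about 1 meter away; at 45 pixels it is about 2 meters away.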

Real time

The analysis happens continuously, frame by frame. Mia doesn't take photos — she watches constantly. Each new image updates her understanding of the scene, like our eyes continuously send information to our brain.

💡 Vision → Decision

What Mia sees directly feeds her brain. A detected face can trigger curiosity, a recognized face can provoke a social reaction, the absence of faces can lead to dream mode. Vision is the starting point of all behavior.
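That mapping from scene to behavior could look like the following sketch; the mode names and the proximity threshold are assumptions for illustration, not Mia's actual cognitive rules:

```python
# Illustrative scene-to-behavior mapping. "dream", "social", "attentive",
# "curious" and the 1.5 m threshold are assumed labels, not Mia's real modes.

def choose_behavior(faces):
    """faces: list of dicts with 'identity' (or None) and 'distance_m'."""
    if not faces:
        return "dream"        # nobody around: drift into dream mode
    closest = min(faces, key=lambda f: f["distance_m"])
    if closest["identity"] is not None:
        return "social"       # a recognized face provokes a social reaction
    if closest["distance_m"] < 1.5:
        return "attentive"    # a close stranger gets full attention
    return "curious"          # a distant, unknown face triggers curiosity
```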

Vision Pipeline

  • Camera — continuous video capture; the stream is sent to the Python service
  • Python Service — face detection and facial recognition; bounding box calculation and distance estimation
  • REST API — results exposed to the cognitive engine: face count, positions, identities, distances
  • Scene Engine — integrates vision data into a unified representation, available to all cognitive agents
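The stages above can be chained as in this hypothetical glue function; the stage callables are placeholders for the real service implementations, and the result dict mirrors what the REST API would expose:

```python
# Hypothetical pipeline glue. The stage functions are injected because
# their real implementations live inside the Python vision service.

def vision_pipeline(frame, detect, recognize, estimate_distance):
    faces = detect(frame)                              # stage 2: face detection
    for face in faces:
        face["identity"] = recognize(face)             # stage 3: recognition
        face["distance_m"] = estimate_distance(face)   # stage 4: distance
    return {"count": len(faces), "faces": faces}       # exposed via the REST API
```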

Technical Architecture

  • Dedicated service — independent module communicating with the cognitive engine
  • Detection — real-time facial detection algorithms
  • Recognition — comparison against a known faces database, identification by similarity
  • Distance — estimation based on relative face size in the frame
  • Cognitive integration — vision data feeds the cognitive loop at each cycle
  • Impacted agents — presence, proximity, sociality and curiosity agents react to vision data

Data transmitted per frame

Per detected face

  • X, Y position in the image
  • Bounding box width and height
  • Estimated distance (meters)
  • Identity (if recognized)
  • Detection confidence

Global data

  • Total number of faces
  • Closest face
  • Changes since previous frame
  • Capture timestamp
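Put together, one frame's payload might be assembled like this sketch; the field names are assumptions based on the lists above, not the service's actual schema:

```python
import time

# Sketch of a per-frame payload. Field names are illustrative guesses
# matching the lists above, not the real wire format.

def build_frame_payload(faces, previous_count=0):
    """faces: per-face dicts (position, box, distance, identity, confidence)."""
    closest = min(faces, key=lambda f: f["distance_m"], default=None)
    return {
        "faces": faces,                               # per-face data
        "total_faces": len(faces),
        "closest_face": closest,
        "count_delta": len(faces) - previous_count,   # change since previous frame
        "timestamp": time.time(),                     # capture timestamp
    }
```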
Next: how Mia thinks → The Brain