Vision

A replicant must see the world. Mia looks through a camera — and understands what she sees.

How Mia sees

📷 Camera captures → 👁 Detects faces → 👤 Recognizes → 📏 Estimates distance → 🧠 Brain reacts

This cycle repeats continuously, frame by frame.

📷 A camera as an eye

Mia uses a camera as her eye. The image is captured continuously and sent to software that analyzes it in real time. This is the first step: without vision, Mia doesn't know what surrounds her.

👁 Face detection

The software automatically spots faces in the image. It knows how many people are present, where they are in the field of view, and approximately how far away each one is.
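Face detectors typically emit candidate boxes with a confidence score; a small post-processing step keeps the confident ones and records where each face sits in the field of view. A minimal sketch in Python (the function name, threshold, and field names are illustrative, not Mia's actual code):

```python
# Hypothetical post-processing of raw detector output. The 0.6 confidence
# threshold and the dict field names are assumptions for illustration.

def filter_detections(raw_boxes, frame_w, frame_h, min_conf=0.6):
    """Keep confident detections and normalize face positions to [0, 1]."""
    faces = []
    for (x, y, w, h, conf) in raw_boxes:
        if conf < min_conf:
            continue  # discard weak candidates (likely false positives)
        faces.append({
            "cx": (x + w / 2) / frame_w,  # horizontal position in the field of view
            "cy": (y + h / 2) / frame_h,  # vertical position
            "w": w, "h": h,               # bounding box size in pixels
            "confidence": conf,
        })
    return faces
```

The normalized center makes "where in the field of view" independent of the camera resolution.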

👤 Recognition

Mia doesn't just see faces — she can recognize them. If she has seen you before, she knows it's you. This recognition influences her behavior: she doesn't react the same way to a stranger versus someone familiar.
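Recognition of this kind is commonly done by comparing a numeric embedding of the detected face against a database of known faces, identifying the best match by similarity. A minimal sketch, assuming cosine similarity and an illustrative threshold (neither is confirmed to be what Mia uses):

```python
import math

def cosine_similarity(a, b):
    """Similarity between two face embeddings (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify(embedding, known_faces, threshold=0.8):
    """Return the best-matching known identity, or None for a stranger."""
    best_name, best_score = None, threshold
    for name, reference in known_faces.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

A `None` result is exactly the "stranger" case that triggers different behavior than a familiar face.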

📏 Distance estimation

By analyzing the size of the face in the image, Mia estimates how far away you are. Close up, she'll be more attentive. Far away, she may simply observe you. This information directly feeds her decisions.
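This size-to-distance relation follows the pinhole camera model: apparent width shrinks in proportion to distance. A sketch with assumed calibration constants (the real focal length and face-width values depend on Mia's camera):

```python
# Pinhole-camera sketch: the two constants below are assumptions,
# not Mia's actual calibration.

AVG_FACE_WIDTH_M = 0.15   # typical human face width in meters
FOCAL_LENGTH_PX = 600.0   # camera focal length expressed in pixels

def estimate_distance(face_width_px):
    """The smaller a face appears, the farther away it is."""
    return AVG_FACE_WIDTH_M * FOCAL_LENGTH_PX / face_width_px
```

With these constants, a face 90 pixels wide is about 1 meter away; at 45 pixels it is about 2 meters away.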

Real time

The analysis happens continuously, frame by frame. Mia doesn't take photos — she watches constantly. Each new image updates her understanding of the scene, like our eyes continuously send information to our brain.

💡 Vision → Decision

What Mia sees directly feeds her brain. A detected face can trigger curiosity, a recognized face can provoke a social reaction, the absence of faces can lead to dream mode. Vision is the starting point of all behavior.
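That mapping from scene to behavior could look like the following sketch; the mode names and the proximity threshold are assumptions for illustration, not Mia's actual cognitive rules:

```python
# Illustrative scene-to-behavior mapping. "dream", "social", "attentive",
# "curious" and the 1.5 m threshold are assumed labels, not Mia's real modes.

def choose_behavior(faces):
    """faces: list of dicts with 'identity' (or None) and 'distance_m'."""
    if not faces:
        return "dream"        # nobody around: drift into dream mode
    closest = min(faces, key=lambda f: f["distance_m"])
    if closest["identity"] is not None:
        return "social"       # a recognized face provokes a social reaction
    if closest["distance_m"] < 1.5:
        return "attentive"    # a close stranger gets full attention
    return "curious"          # a distant, unknown face triggers curiosity
```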

Vision Pipeline

  • Camera — continuous video capture; the stream is sent to the Python service
  • Python Service — face detection and facial recognition; bounding box calculation and distance estimation
  • REST API — results exposed to the cognitive engine: face count, positions, identities, distances
  • Scene Engine — integrates vision data into a unified representation, available to all cognitive agents
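The stages above can be chained as in this hypothetical glue function; the stage callables are placeholders for the real service implementations, and the result dict mirrors what the REST API would expose:

```python
# Hypothetical pipeline glue. The stage functions are injected because
# their real implementations live inside the Python vision service.

def vision_pipeline(frame, detect, recognize, estimate_distance):
    faces = detect(frame)                              # stage 2: face detection
    for face in faces:
        face["identity"] = recognize(face)             # stage 3: recognition
        face["distance_m"] = estimate_distance(face)   # stage 4: distance
    return {"count": len(faces), "faces": faces}       # exposed via the REST API
```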

Technical Architecture

  • Dedicated service — independent module communicating with the cognitive engine
  • Detection — real-time facial detection algorithms
  • Recognition — comparison against a known faces database, identification by similarity
  • Distance — estimation based on relative face size in the frame
  • Cognitive integration — vision data feeds the cognitive loop at each cycle
  • Impacted agents — presence, proximity, sociality and curiosity agents react to vision data

Data transmitted per frame

Per detected face

  • X, Y position in the image
  • Bounding box width and height
  • Estimated distance (meters)
  • Identity (if recognized)
  • Detection confidence

Global data

  • Total number of faces
  • Closest face
  • Changes since previous frame
  • Capture timestamp
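Put together, one frame's payload might be assembled like this sketch; the field names are assumptions based on the lists above, not the service's actual schema:

```python
import time

# Sketch of a per-frame payload. Field names are illustrative guesses
# matching the lists above, not the real wire format.

def build_frame_payload(faces, previous_count=0):
    """faces: per-face dicts (position, box, distance, identity, confidence)."""
    closest = min(faces, key=lambda f: f["distance_m"], default=None)
    return {
        "faces": faces,                               # per-face data
        "total_faces": len(faces),
        "closest_face": closest,
        "count_delta": len(faces) - previous_count,   # change since previous frame
        "timestamp": time.time(),                     # capture timestamp
    }
```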
Next: how Mia thinks → The Brain