/// FRESHLY SERVED >>>

PINK SOUP
VISUAL INTELLIGENCE AT THE SPEED OF SIGHT

Our first lightweight visual understanding model built for real hardware. Point, detect, segment, and understand scenes at a speed and cost that makes continuous vision processing finally practical.

Brewed in Lithuania (yes, like the beet soup šaltibarščiai, but way more powerful). Making visual AI accessible, fast, and actually deployable for developers worldwide.

/// CAPABILITIES >>>

One Model, Many Capabilities

Pink Soup supports a growing set of visual capabilities, all accessible through natural-language prompts.

Image Captioning

Generate detailed descriptions of complex scenes and objects

Manufacturing, Compliance, Synthetic Data

Caption:

The image shows a man in a blue jumpsuit and yellow hard hat standing in a large industrial setting. He is wearing safety glasses and ear protection, and is holding a clipboard and a pen, appearing to be taking notes...

Visual Question Answering

Ask natural language questions about any image

Transportation, Security, Agentic AI

Query: Is any vehicle unsecured? Describe.

Yes, there is an unsecured truck parked in the area. The truck is filled with boxes, and it appears to be a delivery truck. The presence of the unsecured truck and the boxes suggests that it might be a delivery service or a delivery truck parked in a public area.

Object Detection

Locate and identify objects with precise bounding boxes

Retail, Inventory, Transportation, Robotics

Detect: License plate

(x=0.431, y=0.713, x2=0.569, y2=0.921)
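
To draw or crop a detection, the box has to be mapped onto the actual image. The sketch below assumes the four values are normalized to the 0 to 1 range with the origin at the top-left corner, which the sample output suggests but this page does not state. The function name `bbox_to_pixels` and the 1920x1080 frame size are illustrative, not part of any Pink Soup client.

```python
def bbox_to_pixels(box, width, height):
    """Scale a normalized (x, y, x2, y2) box to integer pixel corners.

    Assumes coordinates are fractions of the image size, measured
    from the top-left corner (an assumption, not a documented fact).
    """
    x, y, x2, y2 = box
    left = round(x * width)
    top = round(y * height)
    right = round(x2 * width)
    bottom = round(y2 * height)
    return left, top, right, bottom


# The license plate box from the example above, on a hypothetical 1920x1080 frame.
plate = (0.431, 0.713, 0.569, 0.921)
print(bbox_to_pixels(plate, 1920, 1080))
```

Normalized coordinates have the advantage that the same box stays valid when the image is resized, so the conversion only needs to happen at the point where pixels matter.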

Pointing (x, y)

Pinpoint exact locations in images with coordinate precision

Quality Control, Compliance, Transportation, Defense, Surveillance

Point: Defect in train tracks.

(x=0.431, y=0.505)
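
If a point arrives as text in the form shown above, it can be parsed and mapped to a pixel index in a few lines. This is a minimal sketch that assumes the `(x=..., y=...)` text format and normalized 0 to 1 coordinates; a real client may well return structured data instead. The names `parse_point` and `point_to_pixels` and the 1280x720 frame size are hypothetical.

```python
import re

# Matches text such as "(x=0.431, y=0.505)"; the format is assumed from the example.
POINT_RE = re.compile(r"\(\s*x=([0-9.]+)\s*,\s*y=([0-9.]+)\s*\)")


def parse_point(text):
    """Extract a normalized (x, y) pair from a '(x=..., y=...)' string."""
    match = POINT_RE.search(text)
    if match is None:
        raise ValueError(f"no point found in {text!r}")
    return float(match.group(1)), float(match.group(2))


def point_to_pixels(point, width, height):
    """Map a normalized point to a valid pixel index inside the image."""
    x, y = point
    # Clamp so that a coordinate of exactly 1.0 maps to the last pixel.
    px = min(int(x * width), width - 1)
    py = min(int(y * height), height - 1)
    return px, py


point = parse_point("(x=0.431, y=0.505)")
print(point_to_pixels(point, 1280, 720))
```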

Gaze Detection

Track and analyze where people are looking in real-time

Manufacturing, Safety, Transportation, Retail, Real-world Agentic AI

Detect Gaze:

The operator is looking at the bottom-right section of the control panel, near the red warning light.

OCR & Document Understanding

Extract and comprehend text from documents and images

Logistics, Office Automation, Legal

Query: Transcribe the text in natural reading order.

"Preface, The computing world has undergone a revolution since the publication of The C Programming Language in 1978. Big computers are much bigger, and personal computers have capabilities..."

/// GET STARTED >>>

Get Running in Minutes.

Pink Soup is open source: install and run it anywhere, for free. Get it running on your own computer or in our cloud in minutes.

Run It Yourself

  • Pink Soup Station is free
  • Works with our Python and Node clients
  • Works offline, fully under your control
  • CPU or GPU compatible

Run in the Cloud

  • Spin up instantly - no downloads or DevOps
  • $5 in free monthly credits, no card required
  • Predictable pay-as-you-go pricing
  • 2 RPS on free tier, scales to 10 RPS or more with paid credits
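
To stay under a requests-per-second cap such as the 2 RPS free tier, a client can space its own calls. The sketch below is a generic client-side throttle, not part of any Pink Soup SDK; the class name `Throttle` and its parameters are hypothetical, and how the cloud service actually enforces its limit is not described here.

```python
import time


class Throttle:
    """Space calls so that at most `max_per_second` start per second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._next_allowed = None

    def wait(self):
        """Block until the next call is allowed, then reserve the next slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._clock()
        self._next_allowed = now + self.min_interval


# Free tier from the list above: 2 requests per second.
throttle = Throttle(max_per_second=2)
```

Calling `throttle.wait()` immediately before each request keeps calls at least half a second apart. Moving to a higher paid limit only changes the constructor argument.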
/// RIDICULOUSLY SMART AI >>>

It's Not Just Pink.
It's GENIUS Pink.

Our AI is so smart, it probably knows what you're thinking right now. (Spoiler: You're thinking "wow, this really IS pink soup!")

Sees better than your ex saw your red flags
Processes images faster than you process emotions
More layers than your therapist's notes
Pink because... why not? Life's too short for boring colors

"Yes, we named our AI after soup. Deal with it." 🥣✨