GET /v1/inventory · 847ms · 12,440 results · LIVE POST /v1/booking · 1.2s · LH4478 BER→AMS · CONFIRMED AGENT disruption-watch · CDG delay +3h · ALT DRAFTED GET /v1/profile/ctx · 312ms · seat 14A · HIT PATCH /v1/booking/mgmt · upgrade exec · row 2C · DONE

PM of the Future

The PM at Raile is not what you think.

The execution governor is gone. Capacity is infinite. The only bottleneck left is taste — and judgment.

The thesis

"The Raile PM defines a sharp strategy, battle tests it, and defines success."

A 1:3 PM-to-engineer ratio. PMs fluent in APIs, security, and scalability. The focus is strategy, prototyping, and testing — not coordination. Not documentation. Not waiting.

A team running 13 times more experiments per quarter compounds its learning gap until it becomes insurmountable. That's not a prediction. It's already happening.

"Most PMs were never actually bottlenecked by execution. They were bottlenecked by taste and judgment. Team capacity functioned as a governor that prevented bad ideas from shipping. Remove that governor and you discover who was driving and who was just steering."
— Gemini Head of Product

Open role

Role title
Product Builder — Profile Management
Metric owned
bookings/day
Throughput. Not engagement. Not sessions. Not NPS. The number that matters is how many trips complete.

We hire operators. They build product architecture and define API SLOs before writing a line of copy. They ship production code in Cursor or Claude Code on Monday and have an experiment running by Wednesday. They write their own eval suites in Braintrust. They read a LangSmith trace without asking for help. They reverse-engineer product features from business impact and customer expectations — fast. Above all, they have taste — the judgment to know what's worth shipping when capacity is infinite, and the discipline to kill what isn't.

"They own one number: bookings per day. The dashboard is the review. The throughput is the strategy."

A week in the life

Prototype → evals → ship → review. Every week.

Monday
Prototype

Prototypes the rebooking agent v2 in Claude Code. Builds a working demo of the new disruption threshold logic — no PRD, no Figma review. Evidence first.

Tuesday
Evals

Writes 20 evals in Braintrust against last week's failure logs. Tags the failure modes — wrong carrier preference, missed hotel loyalty match. Defines what "passing" looks like before shipping.
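One way to picture Tuesday's artifact: an eval is just a logged failure turned into a fixture — an input, an expected output, and a failure-mode tag. A minimal plain-Python sketch (the case data, tags, and stub agent are invented for illustration; the real suite lives in Braintrust):

```python
from dataclasses import dataclass, field

@dataclass
class EvalCase:
    """One eval built from a logged failure: input, expected output, failure-mode tags."""
    query: dict
    expected: dict
    tags: list = field(default_factory=list)

def run_evals(cases, agent):
    """Run the agent on every case; return (passed, total) counts per failure-mode tag."""
    results = {}
    for case in cases:
        passed = agent(case.query) == case.expected
        for tag in case.tags:
            hit, total = results.get(tag, (0, 0))
            results[tag] = (hit + passed, total + 1)
    return results

# Two of Tuesday's tagged failure modes, as hypothetical cases:
cases = [
    EvalCase({"pax": "A", "carriers": ["LH", "AF"]}, {"carrier": "LH"},
             tags=["carrier-preference"]),
    EvalCase({"pax": "B", "loyalty": "Accor"}, {"hotel_chain": "Accor"},
             tags=["hotel-loyalty"]),
]

def naive_agent(query):
    # Stub agent that ignores loyalty data — reproduces the "missed hotel loyalty match" failure.
    if "carriers" in query:
        return {"carrier": query["carriers"][0]}
    return {"hotel_chain": None}

report = run_evals(cases, naive_agent)
# carrier-preference passes 1/1; hotel-loyalty passes 0/1
```

The point of the tags is that "passing" is defined per failure mode before shipping, so Thursday's review can say which mode regressed, not just that something did.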

Wednesday
Ship

Ships the experiment to 10% of traffic. All evals pass. Disruption watch active on the new cohort. No ceremony — a PR, a deploy, a Loom for context.
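The 10% gate itself is a small piece of engineering: hash-based bucketing keeps each user deterministically in or out of the cohort across requests. A sketch of one common approach (the experiment name and percentage come from the example above; the hashing scheme is an assumption, not Raile's implementation):

```python
import hashlib

def in_experiment(user_id: str, experiment: str, rollout_pct: float) -> bool:
    """Deterministic bucketing: the same user gets the same answer on every request."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map the hash into [0, 1]
    return bucket < rollout_pct / 100.0

# Roughly 10% of users land in the cohort, and membership is stable:
cohort = [u for u in range(10_000) if in_experiment(str(u), "rebooking-v2", 10)]
```

Hashing `experiment:user_id` rather than `user_id` alone means cohorts for different experiments are independent — a user in the rebooking test isn't automatically in the next one.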

Thursday
Review

Reviews the eval deltas — one branch regressed on multi-currency bookings. Kills that branch. Opens LangSmith traces to find where the profile agent dropped the currency preference.

Friday
Users

Talks to 3 users who hit the failure mode. Records with a phone. No UX researcher in the loop — they listen, they take notes, they write the next round of evals on Monday.

What they refuse to do

Four rituals this role has killed.

Each one was a governor on bad ideas. Remove the governor — find out if you have taste.

No traditional PRDs
Replaced by

Prototype in Claude Code instead. Evidence precedes documentation. If you can't build a rough version in an afternoon, the idea isn't clear enough to document.

No quarterly business reviews
Replaced by

Throughput metrics are live. The dashboard is the review. If someone needs a quarterly meeting to understand how the product is performing, the product isn't instrumented correctly.

No monthly business reviews
Replaced by

Real-time data kills the ritual. Bookings per day is on the screen right now. The ritual existed to compensate for latency in reporting — remove the latency, remove the ritual.

No 1-year roadmaps
Replaced by

Strategy is 90-day bets, battle-tested and re-set. A roadmap that spans 12 months is a fiction dressed as a plan. Ship, measure, reset. Three times a year beats one plan across twelve months.

The stack

Not Jira. Not Confluence. Not a roadmap deck.

The tools in the stack are the tools of a builder — not a coordinator.

Claude Code / Cursor

Prototyping features directly in the codebase. The PM builds the first version — not a spec for someone else to build.

Bolt / v0

Spinning up UI and marketing demos in hours. Ships a working front-end to test with users before writing a line of backend code.

Braintrust / Arize

Writing evals, catching hallucinations before they reach production. Evals are a product metric — built during development, not bolted on after launch.

LangSmith

Observability on agent traces. Knows where the AI burns budget and fails — reads traces without needing an ML engineer to interpret them.

Linear

Issues — not sprints. No velocity tracking. No burndown. Work exists or it doesn't. The only cycle that matters is prototype → ship → eval.

Loom + a phone

User research artifacts. Talks to three users on Friday, records it, reviews it Monday morning before writing the next round of evals. No UX research team in the loop.

Not in this stack: Jira · Confluence · Roadmunk · PowerPoint · quarterly planning decks

How they work with agents

One day. A full build cycle.

The pricing agent shipped a 3% markup experiment overnight. Here's what happened next.

09:00
Opens the eval suite in Braintrust
Spots a regression: the pricing agent has miscalculated for bookings where the traveller pays in a second currency. 12 failures out of 140 evals. Not in production yet — caught before it got there.
10:30
Writes 4 new evals capturing the failure mode
Three currency combinations the suite wasn't covering. One edge case from a LangSmith trace spotted at 09:45. The failure mode is now defined — not assumed.
12:00
Ships a fix in Cursor. All evals pass.
The PM writes the fix — not a ticket, not a handoff. Opens a PR; it's in review by 12:20 and deployed by 12:45.
14:00
Re-runs traffic on the 3% markup experiment
Experiment was paused during the fix. Now back to 10% of traffic. Throughput numbers in the dashboard — bookings per day ticking up as expected.
16:00
Reviews LangSmith traces. Regression confirmed gone.
All currency combinations passing. Zero anomalies in the trace. Communicates the result clearly in Loom: "This feature is right 100% of the time now. It was 91% this morning." Ships the Loom to the team. No meeting needed.

"This feature is right 90% of the time. Here's what happens the other 10%." — The PM communicates uncertainty clearly. It builds trust. It's also how you write the next round of evals.