GET /v1/inventory · 847ms · 12,440 results · LIVE POST /v1/booking · 1.2s · LH4478 BER→AMS · CONFIRMED AGENT disruption-watch · CDG delay +3h · ALT DRAFTED GET /v1/profile/ctx · 312ms · seat 14A · HIT PATCH /v1/booking/mgmt · upgrade exec · row 2C · DONE

PM of the Future

The PM at Raile is not what you think.

The execution governor is gone. Capacity is infinite. The only bottleneck left is taste — and judgment.

The thesis

"The Raile PM defines a sharp strategy, battle tests it, and defines success."

A 1:3 PM-to-engineer ratio. PMs fluent in APIs, security, and scalability. The focus is strategy, prototyping, and testing — not coordination. Not documentation. Not waiting.

A team running 13 times more experiments per quarter compounds its learning gap until it becomes insurmountable. That's not a prediction. It's already happening.

"Most PMs were never actually bottlenecked by execution. They were bottlenecked by taste and judgment. Team capacity functioned as a governor that prevented bad ideas from shipping. Remove that governor and you discover who was driving and who was just steering."
— Gemini Head of Product

Open role

Role title
Product Builder — Profile Management
Metric owned
bookings/day
Throughput. Not engagement. Not sessions. Not NPS. The number that matters is how many trips complete.

We hire operators. They build product architecture and define API SLOs before writing a line of copy. They ship production code in Cursor or Claude Code on Monday and have an experiment running by Wednesday. They write their own eval suites in Braintrust. They read a LangSmith trace without asking for help. They reverse-engineer product features from business impact and customer expectations — fast. Above all, they have taste — the judgment to know what's worth shipping when capacity is infinite, and the discipline to kill what isn't.

"They own one number: bookings per day. The dashboard is the review. The throughput is the strategy."

A week in the life

Prototype → evals → ship → review. Every week.

Monday
Prototype

Prototypes the rebooking agent v2 in Claude Code. Builds a working demo of the new disruption threshold logic — no PRD, no Figma review. Evidence first.

Tuesday
Evals

Writes 20 evals in Braintrust against last week's failure logs. Tags the failure modes — wrong carrier preference, missed hotel loyalty match. Defines what "passing" looks like before shipping.
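One way to picture Tuesday's artifact: an eval is just a logged failure turned into a fixture — an input, an expected output, and a failure-mode tag. A minimal plain-Python sketch (the case data, tags, and stub agent are invented for illustration; the real suite lives in Braintrust):

```python
from dataclasses import dataclass, field

@dataclass
class EvalCase:
    """One eval built from a logged failure: input, expected output, failure-mode tags."""
    query: dict
    expected: dict
    tags: list = field(default_factory=list)

def run_evals(cases, agent):
    """Run the agent on every case; return (passed, total) counts per failure-mode tag."""
    results = {}
    for case in cases:
        passed = agent(case.query) == case.expected
        for tag in case.tags:
            hit, total = results.get(tag, (0, 0))
            results[tag] = (hit + passed, total + 1)
    return results

# Two of Tuesday's tagged failure modes, as hypothetical cases:
cases = [
    EvalCase({"pax": "A", "carriers": ["LH", "AF"]}, {"carrier": "LH"},
             tags=["carrier-preference"]),
    EvalCase({"pax": "B", "loyalty": "Accor"}, {"hotel_chain": "Accor"},
             tags=["hotel-loyalty"]),
]

def naive_agent(query):
    # Stub agent that ignores loyalty data — reproduces the "missed hotel loyalty match" failure.
    if "carriers" in query:
        return {"carrier": query["carriers"][0]}
    return {"hotel_chain": None}

report = run_evals(cases, naive_agent)
# carrier-preference passes 1/1; hotel-loyalty passes 0/1
```

The point of the tags is that "passing" is defined per failure mode before shipping, so Thursday's review can say which mode regressed, not just that something did.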

Wednesday
Ship

Ships the experiment to 10% of traffic. All evals pass. Disruption watch active on the new cohort. No ceremony — a PR, a deploy, a Loom for context.
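The 10% gate itself is a small piece of engineering: hash-based bucketing keeps each user deterministically in or out of the cohort across requests. A sketch of one common approach (the experiment name and percentage come from the example above; the hashing scheme is an assumption, not Raile's implementation):

```python
import hashlib

def in_experiment(user_id: str, experiment: str, rollout_pct: float) -> bool:
    """Deterministic bucketing: the same user gets the same answer on every request."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map the hash into [0, 1]
    return bucket < rollout_pct / 100.0

# Roughly 10% of users land in the cohort, and membership is stable:
cohort = [u for u in range(10_000) if in_experiment(str(u), "rebooking-v2", 10)]
```

Hashing `experiment:user_id` rather than `user_id` alone means cohorts for different experiments are independent — a user in the rebooking test isn't automatically in the next one.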

Thursday
Review

Reviews the eval deltas — one branch regressed on multi-currency bookings. Kills that branch. Opens LangSmith traces to find where the profile agent dropped the currency preference.

Friday
Users

Talks to 3 users who hit the failure mode. Records with a phone. No UX researcher in the loop — they listen, they take notes, they write the next round of evals on Monday.

What they refuse to do

Four rituals this role has killed.

Each one was a governor on bad ideas. Remove the governor — find out if you have taste.

No traditional PRDs
Replaced by

Prototype in Claude Code instead. Evidence precedes documentation. If you can't build a rough version in an afternoon, the idea isn't clear enough to document.

No quarterly business reviews
Replaced by

Throughput metrics are live. The dashboard is the review. If someone needs a quarterly meeting to understand how the product is performing, the product isn't instrumented correctly.

No monthly business reviews
Replaced by

Real-time data kills the ritual. Bookings per day is on the screen right now. The ritual existed to compensate for latency in reporting — remove the latency, remove the ritual.

No 1-year roadmaps
Replaced by

Strategy is 90-day bets, battle-tested and re-set. A roadmap that spans 12 months is a fiction dressed as a plan. Ship, measure, reset. Three times a year beats one plan across twelve months.

The stack

Not Jira. Not Confluence. Not a roadmap deck.

The tools in the stack are the tools of a builder — not a coordinator.

Claude Code / Cursor

Prototyping features directly in the codebase. The PM builds the first version — not a spec for someone else to build.

Bolt / v0

Spinning up UI and marketing demos in hours. Ships a working front-end to test with users before writing a line of backend code.

Braintrust / Arize

Writing evals, catching hallucinations before they reach production. Evals are a product metric — built during development, not bolted on after launch.

LangSmith

Observability on agent traces. Knows where the AI burns budget and fails — reads traces without needing an ML engineer to interpret them.

Linear

Issues — not sprints. No velocity tracking. No burndown. Work exists or it doesn't. The only cycle that matters is prototype → ship → eval.

Loom + a phone

User research artifacts. Talks to three users on Friday, records it, reviews it Monday morning before writing the next round of evals. No UX research team in the loop.

Not in this stack: Jira · Confluence · Roadmunk · PowerPoint · quarterly planning decks

How they work with agents

One day. A full build cycle.

The pricing agent shipped a 3% markup experiment overnight. Here's what happened next.

09:00
Opens the eval suite in Braintrust
Spots a regression: the pricing agent has miscalculated for bookings where the traveller pays in a second currency. 12 failures out of 140 evals. Not in production yet — caught before it got there.
10:30
Writes 4 new evals capturing the failure mode
Three currency combinations the suite wasn't covering. One edge case from a LangSmith trace spotted at 09:45. The failure mode is now defined — not assumed.
12:00
Ships a fix in Cursor. All evals pass.
The PM writes the fix — not a ticket, not a handoff. Opens a PR; it's in review by 12:20 and deployed by 12:45.
14:00
Re-runs traffic on the 3% markup experiment
Experiment was paused during the fix. Now back to 10% of traffic. Throughput numbers in the dashboard — bookings per day ticking up as expected.
16:00
Reviews LangSmith traces. Regression confirmed gone.
All currency combinations passing. Zero anomalies in the trace. Communicates the result clearly in Loom: "This feature is right 100% of the time now. It was 91% this morning." Ships the Loom to the team. No meeting needed.

"This feature is right 90% of the time. Here's what happens the other 10%." — The PM communicates uncertainty clearly. It builds trust. It's also how you write the next round of evals.