Nearly a decade leading product design at Waters Corporation — the world's largest pure-play scientific instruments company — across three intertwined workstreams: B2B e-commerce, a unified design system, and a model-assisted support triage tool now in pilot.
Waters builds the chromatography and mass spectrometry instruments behind much of modern drug discovery and food safety. But the company sells two very different things: a $400 consumable column moves through a self-serve cart, while a $1.2M instrument moves through a sales conversation. For years, those paths lived in completely separate digital worlds.
Customers had to figure out which Waters they were dealing with every time they visited. The catalog, the marketing site, the account portal, and the support tool all looked and behaved differently — built by different teams, shipped on different stacks, governed by no shared visual language. A senior chemist placing a $50K reorder used three logins and four navigation patterns to do it.
The mandate, when I joined the digital team, was deliberately broad: make this feel like one company. What that turned into, over several years, was three workstreams that had to be done in roughly the right order — fix the front door first (e-commerce), build the language we'd use to fix everything else (design system), then start applying that language to the harder problems (support, AI).
The e-commerce side was where the friction was sharpest, so we started there. Decibel session recordings and Qualtrics feedback gave us a continuous picture of where buyers were getting stuck — not just where they were dropping off, but the moments where they paused, scrolled back, or opened a support chat. Three problems showed up over and over.
Customers routinely committed to orders before finding out what was actually in stock. Stock status lived two clicks deep in a separate "availability" tab; lead times only appeared after the order was placed. The pattern in session replays was painful and consistent: a chemist would build a cart of six items, click through to checkout, hit a wall, abandon, and call support to ask "is this actually shippable this week?"
We surfaced real-time stock and ship-window data directly in the cart row, with traffic-light visual treatment so it was scannable at a glance. Out-of-stock items now offer alternates inline rather than blocking the cart.
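To make the cart-row behavior concrete, here is a minimal sketch of how a line item might map to a traffic-light status. The field names (`unitsInStock`, `leadTimeDays`, `alternates`) and the three-way rule are illustrative assumptions, not the production data model.

```typescript
// Hypothetical shapes: field names and rules are illustrative only.
type StockSignal = "green" | "amber" | "red";

interface CartLine {
  sku: string;
  quantity: number;
  unitsInStock: number;
  leadTimeDays: number | null; // null when no restock date is known
  alternates: string[];        // SKUs that could substitute for this one
}

interface CartRowStatus {
  signal: StockSignal;
  message: string;
  alternates: string[]; // shown inline instead of blocking the cart
}

function cartRowStatus(line: CartLine): CartRowStatus {
  // Enough stock to cover the requested quantity: green, nothing else to show.
  if (line.unitsInStock >= line.quantity) {
    return { signal: "green", message: "In stock", alternates: [] };
  }
  // Short on stock but a ship window is known: amber, with alternates offered.
  if (line.leadTimeDays !== null) {
    return {
      signal: "amber",
      message: `Ships in about ${line.leadTimeDays} days`,
      alternates: line.alternates,
    };
  }
  // No stock and no known ship window: red, alternates offered inline.
  return { signal: "red", message: "Out of stock", alternates: line.alternates };
}
```

The point of the sketch is that the row never blocks: every non-green state carries its alternates with it, so the cart can render a next step in place.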
Waters sells tens of thousands of SKUs. A chemist looking for a specific column has to filter on chemistry, particle size, pore size, length, internal diameter, and pH range — and the legacy catalog made them do it by drilling through a tree of category pages, with no way to see how many results each filter would return until they clicked.
We rebuilt the catalog as a single faceted search surface with live result counts on every filter, query persistence in the URL, and an active-filter chip bar so customers could see exactly what they'd narrowed to and remove individual constraints without starting over. The biggest impact wasn't the filter UI itself — it was that procurement teams could now save and share a filtered view with chemists for approval.
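A rough sketch of the two mechanics that mattered most, live counts per filter and filters persisted in the URL, might look like the following. The product shape, facet names, and query format are assumptions for illustration; the real catalog ran against a search backend rather than an in-memory array.

```typescript
// Hypothetical data model: a product is a flat map of facet name -> value.
type Product = Record<string, string>;
// Active filters: facet name -> selected values (empty array means "any").
type Filters = Record<string, string[]>;

// True when the product satisfies every active filter, optionally skipping one facet.
function matchesFilters(product: Product, filters: Filters, skipFacet?: string): boolean {
  return Object.keys(filters).every((facet) => {
    const selected = filters[facet] ?? [];
    if (facet === skipFacet || selected.length === 0) return true;
    const value = product[facet];
    return value !== undefined && selected.indexOf(value) !== -1;
  });
}

// Live counts: for each value of `facet`, how many results it would return
// given the filters already active on the *other* facets.
function facetCounts(products: Product[], filters: Filters, facet: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const product of products) {
    const value = product[facet];
    if (value === undefined || !matchesFilters(product, filters, facet)) continue;
    counts[value] = (counts[value] ?? 0) + 1;
  }
  return counts;
}

// Serialize filters into a stable query string so a filtered view can be saved and shared.
function filtersToQuery(filters: Filters): string {
  return Object.keys(filters)
    .sort()
    .filter((facet) => (filters[facet] ?? []).length > 0)
    .map((facet) => {
      const values = (filters[facet] ?? []).slice().sort().map((v) => encodeURIComponent(v));
      return `${encodeURIComponent(facet)}=${values.join(",")}`;
    })
    .join("&");
}

// Parse the query string back into filters (inverse of filtersToQuery).
function queryToFilters(query: string): Filters {
  const filters: Filters = {};
  for (const pair of query.split("&")) {
    const eq = pair.indexOf("=");
    if (eq < 0) continue;
    const facet = decodeURIComponent(pair.slice(0, eq));
    const values = pair
      .slice(eq + 1)
      .split(",")
      .filter((v) => v !== "")
      .map((v) => decodeURIComponent(v));
    if (values.length > 0) filters[facet] = values;
  }
  return filters;
}
```

Sorting facets and values before serializing is a deliberate choice in this sketch: two people who arrive at the same filtered view in a different click order end up with the same shareable URL.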
The legacy checkout was a four-step flow: shipping address, billing address, payment method, review. Each step had its own page with its own validation pass. Session recordings showed a brutal pattern: customers reaching the review step, finding an error in their shipping address, and getting kicked back to step one with their cart preserved but their payment information cleared. Drop-off at the review step was the single largest leak in the funnel.
We collapsed the four steps into a single-page checkout with progressive disclosure — fields validate inline, sections expand and confirm as completed, and the order summary updates live. The whole experience now fits above the fold for most customers on desktop. Order-related support tickets dropped 28% within the first quarter, mostly from the elimination of "I lost my payment info" calls.
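The structural fix behind that number is that each section validates on its own and an edit to one section never touches another. A minimal sketch, assuming hypothetical section and field names and a simple required-field rule:

```typescript
// Hypothetical section and field names; the real form had more of both.
type SectionId = "shipping" | "billing" | "payment";
type Fields = Record<string, string>;
type FormState = Record<SectionId, Fields>;
type Validator = (fields: Fields) => string[];

// Build a validator that reports every required field left blank.
const required = (names: string[]): Validator => (fields) =>
  names
    .filter((name) => (fields[name] ?? "").trim() === "")
    .map((name) => `${name} is required`);

const validators: Record<SectionId, Validator> = {
  shipping: required(["street", "city", "postalCode"]),
  billing: required(["street", "city", "postalCode"]),
  payment: required(["method"]),
};

// Editing one field returns a new state; every other section is carried over untouched.
function updateField(state: FormState, section: SectionId, name: string, value: string): FormState {
  return { ...state, [section]: { ...state[section], [name]: value } };
}

// Inline validation: errors are computed per section, so one bad field
// only flags its own section instead of resetting the whole flow.
function sectionErrors(state: FormState): Record<SectionId, string[]> {
  return {
    shipping: validators.shipping(state.shipping),
    billing: validators.billing(state.billing),
    payment: validators.payment(state.payment),
  };
}

function canPlaceOrder(state: FormState): boolean {
  const errors = sectionErrors(state);
  return errors.shipping.length + errors.billing.length + errors.payment.length === 0;
}
```

In this shape, fixing a shipping typo is a single `updateField` call, and the payment section comes through unchanged because nothing in the update path has a reason to clear it.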
I underweighted procurement workflows in the first round. The chemist-as-buyer was easier to design for, but the actual high-value persona was the procurement coordinator placing repeat orders for an entire lab. I'd start the next round there — with workflows for purchase orders, multi-shipment splits, and approval chains — instead of bolting them on after launch.
Once the e-commerce work was shipping, the next bottleneck was obvious: every team was rebuilding the same components in slightly different ways, and engineering was burning sprint capacity on visual debt. The catalog had its own button styles. The account portal had its own modal. The support tool had three different table treatments. Nothing rolled up.
I migrated the existing scattered Sketch libraries into Figma and rebuilt from the ground up. Tokenized variables for color, spacing, typography, and elevation. Themable modes for the eventual dark and high-contrast surfaces. Over a hundred components, all built against real product work in tandem with engineering — not in isolation, not aspirationally.
The single most contested decision was insisting on a tokens-first architecture when engineering wanted to ship faster with hardcoded values. Their argument was reasonable on its face: we had a hard launch date, the token layer added a week of upfront work, and "we can refactor later." I'd seen what "later" looks like at three previous companies — it looks like never.
What I pushed for, and eventually got, was this: no hardcoded color, spacing, or type values would ship in any new component. Existing legacy code could stay until its next touch, but every new line would reference a token. I made the case in cycle-time math — the upfront week would pay back the first time we needed to add a dark mode, theme a partner property, or adjust the brand. We added dark mode six months later in two days. Without tokens, that would have been a sprint.
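The reason dark mode was cheap is easy to see in a small sketch. The token names and hex values below are hypothetical, and the real system lived in Figma variables and generated stylesheets, but the shape is the same: components reference names, and a mode is just another column of values.

```typescript
// Hypothetical token names and values, for illustration only.
type Mode = "light" | "dark";

const tokens = {
  "color.surface": { light: "#ffffff", dark: "#121212" },
  "color.text": { light: "#1a1a1a", dark: "#f2f2f2" },
  "space.md": { light: "16px", dark: "16px" }, // same in both modes
} as const;

type TokenName = keyof typeof tokens;

// Components ask for a token by name and never see a raw value.
function resolveToken(name: TokenName, mode: Mode): string {
  return tokens[name][mode];
}

// Emit one CSS custom property per token for the given mode.
function toCssVariables(mode: Mode): string {
  return (Object.keys(tokens) as TokenName[])
    .map((name) => `--${name.replace(/\./g, "-")}: ${resolveToken(name, mode)};`)
    .join("\n");
}
```

Adding a mode in this model means adding a value per token and emitting one more block of variables; no component changes. With hardcoded values, the same change is a search through every stylesheet.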
The library covers the surfaces every property actually needs: form controls, navigation patterns, data tables with sticky headers, modals, toasts, badges, and a dozen specialized scientific patterns (spec tables, part-number layouts, stock indicators). Every component ships with usage docs, accessibility notes, and a list of where it's currently used in production — so when someone proposes a change, we can see what would break.
The design-to-dev cycle time problem wasn't really a Figma problem. It was a handoff problem. Engineers were getting Figma frames, eyeballing the spacing, and writing CSS that drifted within two sprints. I worked with the engineering leads to define a spec format that linked every visual property in a component back to its token name — so handoff stopped being "make it look like the mock" and started being "implement these named tokens."
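As an illustration of what a token-linked spec could look like, here is a sketch in which each visual property of a component maps to a token name, plus a small check that flags anything that looks like a raw value. The spec shape and the naming pattern are assumptions, not the format we actually shipped.

```typescript
// Hypothetical spec shape: CSS property -> token name.
interface ComponentSpec {
  component: string;
  properties: Record<string, string>;
}

// Assumed naming convention: dot-separated lowercase segments, e.g. "color.action.primary".
const TOKEN_PATTERN = /^[a-z]+(\.[a-z0-9-]+)+$/;

// Return the properties whose value is not a token reference (e.g. "#0055aa" or "12px").
function findHardcodedValues(spec: ComponentSpec): string[] {
  return Object.keys(spec.properties).filter((property) => {
    const value = spec.properties[property];
    return value === undefined || !TOKEN_PATTERN.test(value);
  });
}

const buttonSpec: ComponentSpec = {
  component: "Button",
  properties: {
    "background-color": "color.action.primary",
    "padding-inline": "space.md",
    "border-radius": "4px", // a raw value that slipped in
  },
};
```

Running `findHardcodedValues(buttonSpec)` on this example returns `["border-radius"]`, which is the kind of drift that otherwise has to be caught by eye in review.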
Combined with a contribution model that let any product team propose new components (with a lightweight design-review step before they joined the canonical library), the system stopped being a bottleneck and started being a force multiplier. Design-to-dev cycle time on covered components dropped roughly 40%.
I'd invest in visual regression testing in CI from day one. We caught most token drift in design review, but a small number of components quietly drifted in production for months because no automated check was watching them. Storybook + Chromatic would have paid for itself ten times over.
Waters fields hundreds of thousands of support cases a year. A surprising amount of that volume — by our analysis, well over half — concentrates around the same forty or so failure modes: baseline drift, pressure spikes, mobile-phase contamination, lamp aging, fitting leaks. These are problems an experienced lab manager could often diagnose in five minutes, but an early-career chemist will, reasonably, open a case for.
The triage tool gives customers two ways in, because we learned early that one path doesn't fit. Some customers know exactly what's broken and want the fastest path to a known fix. Others can describe symptoms but can't classify them — they need the system to help them name the problem before solving it.
The free-form path is the more interesting design problem. A chemist saying "baseline drift on UPLC after we swapped the mobile phase yesterday" doesn't want a chatbot personality — they want a fast read on what's likely wrong, ranked by probability, with a clear way to either try the fix or get to a human. The interface had to communicate the model's confidence honestly without making the customer do calibration math.
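One way to picture "ranked by probability, without calibration math" is a thin layer between the model's scores and the interface. This is a sketch under stated assumptions: the thresholds, band labels, and the `offerSpecialistFirst` rule are hypothetical, and the model's probabilities are assumed to be reasonably calibrated.

```typescript
// Hypothetical output of the triage model: one score per known failure mode.
interface Candidate {
  failureMode: string;
  probability: number; // 0..1, assumed calibrated
}

type ConfidenceBand = "likely" | "possible" | "unlikely";

// Illustrative thresholds; in practice these would be tuned during the pilot.
function confidenceBand(probability: number): ConfidenceBand {
  if (probability >= 0.6) return "likely";
  if (probability >= 0.3) return "possible";
  return "unlikely";
}

interface TriageView {
  suggestions: { failureMode: string; band: ConfidenceBand }[];
  offerSpecialistFirst: boolean;
}

function triageView(candidates: Candidate[], maxShown = 3): TriageView {
  // Rank by probability (highest first) and keep only the top few.
  const ranked = candidates
    .slice()
    .sort((a, b) => b.probability - a.probability)
    .slice(0, maxShown);
  const top = ranked.length > 0 ? ranked[0] : undefined;
  return {
    // The customer sees a band, not a raw percentage.
    suggestions: ranked.map((c) => ({ failureMode: c.failureMode, band: confidenceBand(c.probability) })),
    // If nothing reaches "likely", lead with the path to a human.
    offerSpecialistFirst: top === undefined || confidenceBand(top.probability) !== "likely",
  };
}
```

Bands rather than percentages are one of the variants under test, not a settled answer; the sketch only shows where that decision sits in the flow.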
The honest answer about an AI feature in pilot is that the design questions and the model questions are intertwined, and we're still learning. A few things I'm specifically watching:
Calibration over precision. A 70%-confident match shown honestly is more useful than a 90%-confident match shown with no context — because the second one trains customers to ignore the system the first time it's wrong. The visual treatment of confidence is doing real work and we're testing several variants.
Escalation timing. The right moment to hand off to a specialist isn't when the model gives up; it's when the customer's frustration signal crosses a threshold. We're instrumenting that signal, but how we'll act on it is still an open question.
Knowledge-base feedback. The triage tool is only as good as the support content it pulls from. We've started using mismatches as a signal back to the documentation team — the questions the model can't answer well are the gaps in the knowledge base.
I'd be lying if I said I had a clean post-launch number for this one. What I have is a pilot, a working hypothesis, and a measurement plan. The senior judgment call here was not pre-announcing a number — the temptation was to commit to a 35% deflection target in the kickoff deck. The tool will ship better because we didn't.
Hardcoding values to hit a launch date isn't a tradeoff between speed and quality — it's a loan with a high interest rate. The dark mode that took two days with tokens would have taken a sprint without them. I'll fight for the foundational layer every time.
Stock data existed before we touched the cart — it just lived two clicks away. Most of the conversion lift came from moving information, not creating it. The same lesson held for confidence in the AI triage UI.
Especially with AI, especially with B2B customers who'll burn an entire experiment if they trust the wrong recommendation. Showing the model's confidence was more important than maximizing it.
The components were the easy part. The hard parts were the contribution model, the deprecation policy, and the conversation with engineering about what "done" meant. The visual library was downstream of those.