A few months ago I was reviewing a pull request that introduced a caching layer in front of a service. The author's justification was: "this endpoint is slow, it takes 3 seconds." I asked when that was last measured. Turns out the benchmark was from 18 months ago, before a database migration that improved query performance by an order of magnitude. The endpoint now responded in 200ms. The caching layer was solving a problem that no longer existed.
The decision to add caching was reasonable at the time the evidence was collected. But the evidence expired, and nobody noticed. The decision survived its own justification.
This happens more often than we admit. We treat architecture decisions as permanent artifacts: "we use microservices because of X," "we chose Postgres because of Y," "we split this module because of Z." These decisions get documented (if we are lucky) in ADRs or RFCs, and then they sit there, untouched, while the world around them changes. The benchmark gets stale. The team composition shifts. The framework we rejected matures. The constraint we designed around gets removed.
The decision stays. The reason leaves.
## Decisions are coupling
I've written before about how coupling is fundamentally about shared knowledge between components. An architecture decision is a similar kind of bond, but between the present and the past. When you decide "we will use event sourcing for the order system," every future change to that system is constrained by that choice. You are coupled to a decision that was made under a specific set of conditions.
This coupling is useful when the conditions still hold. It becomes technical debt when they don't. And unlike code coupling, which you can detect with static analysis, decision coupling is invisible. There's no linter that tells you "the justification for this architectural choice expired 6 months ago."
## Not all evidence rots at the same speed
What I've noticed is that the reasoning behind architecture decisions is never monolithic. It's always a mix of different types of evidence, and each type has a different shelf life.
Empirical evidence — benchmarks, load test results, production metrics, incident post-mortems. This is the most convincing type of evidence, but also the fastest to expire. A benchmark from 18 months ago is almost certainly measuring a system that no longer exists. Infrastructure changes, traffic patterns shift, dependencies get upgraded. I'd say empirical evidence has a useful life of 6 to 18 months, depending on how fast your system evolves.
Theoretical reasoning — established patterns, well-understood trade-offs, things we know from computer science. "CQRS separates read and write concerns" is as true today as it was 10 years ago. This type of evidence decays slowly, on the order of years, and usually only when a paradigm shift makes the underlying model obsolete. But even theoretical reasoning can expire: "relational databases can't handle our write throughput" was true for some teams in 2015, less true after improvements in Postgres and CockroachDB.
Judgment — "in my experience, this approach works." This is the most common form of evidence in practice and the hardest to evaluate. It's highly context-dependent: it depends on who said it, what their experience was, and whether the current team shares that experience. Judgment evidence decays at a moderate rate, but it decays discontinuously: it drops to near zero when the person who made the judgment leaves the team. The reasoning stays in the ADR, but the intuition behind it walks out the door.
Constraints — regulatory requirements, organizational mandates, budget limitations. These are the most stable, but they're also the ones most likely to be wrong in the first place. "We can't use AWS because of GDPR" might have been true before the Data Privacy Framework, and false after. People rarely go back to verify whether constraints still apply.
## A decision is only as strong as its weakest reason
Here's the thing that changed how I think about this: when you have multiple reasons supporting a decision, the decision's real strength is not the average of those reasons. It's the minimum.
Consider a team that chose to split a monolith into microservices. Their reasoning:
- Benchmark: the monolith couldn't handle projected load — high confidence
- Pattern: microservices enable independent deployability — high confidence
- Team readiness: the team has experience with distributed systems — low confidence
If you average these, you get a moderately confident decision. But in practice, the decision will succeed or fail based on the weakest link. If the team doesn't have the skills, it doesn't matter how good the benchmark is or how sound the pattern is. The architecture will rot because the team can't maintain it.
This is the same principle as in safety engineering. A bridge doesn't fail at its strongest cable. A system doesn't fail at its best-understood component. When I evaluate an architecture decision now, I look for the weakest supporting reason, not the strongest. That's where the risk is.
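The rule is simple enough to express in a few lines. A minimal sketch of the microservices example above, with illustrative confidence scores I've assigned for the sake of the demonstration (they are assumptions, not measurements):

```python
# Decision strength is the weakest supporting reason, not the average.
# Confidence values below are illustrative, not measured.

reasons = {
    "benchmark: monolith can't handle projected load": 0.9,
    "pattern: independent deployability": 0.9,
    "team readiness: distributed-systems experience": 0.3,
}

average_strength = sum(reasons.values()) / len(reasons)
weakest_claim, decision_strength = min(reasons.items(), key=lambda kv: kv[1])

print(f"average: {average_strength:.2f}")  # looks moderately confident
print(f"minimum: {decision_strength:.2f} ({weakest_claim})")
```

The average (0.70) suggests a reasonable bet; the minimum (0.30) tells you the bet rests on a team that may not be ready. Same data, very different conclusion.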
## Decisions rot silently
What makes this insidious is that decision decay is gradual. It's not like a dependency that breaks your build. It's more like a slow leak. The benchmark gets a little more stale each month. The team member who understood the trade-off gets a little further from the codebase. The constraint gets quietly lifted in a policy update nobody reads.
And then one day you're adding a caching layer for an endpoint that's already fast, or maintaining a distributed system for a team that would be more productive with a modular monolith, or avoiding a framework that solved all its old problems two major versions ago.
I think a useful mental model is exponential decay. Empirical evidence halves in confidence roughly every 6-12 months. Theoretical reasoning halves every 5-10 years. Judgment halves every time the team changes significantly. Constraints should be re-verified on a fixed schedule, say annually.
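The half-life framing can be made concrete. A sketch of the decay model, using half-lives taken from the rough numbers above (the specific values are assumptions, and will differ for your system):

```python
def decayed_confidence(initial: float, months_elapsed: float,
                       half_life_months: float) -> float:
    """Exponential decay: confidence halves every `half_life_months`."""
    return initial * 0.5 ** (months_elapsed / half_life_months)

# An 18-month-old benchmark with a 9-month half-life has lost
# three quarters of its original weight.
benchmark = decayed_confidence(initial=0.9, months_elapsed=18, half_life_months=9)
print(f"benchmark confidence today: {benchmark:.3f}")

# Theoretical reasoning with a ~7-year half-life barely moves in 18 months.
pattern = decayed_confidence(initial=0.9, months_elapsed=18, half_life_months=84)
print(f"pattern confidence today:   {pattern:.3f}")
```

The point isn't the exact curve; it's that the same 18 months takes the benchmark from 0.9 to roughly 0.22 while leaving the theoretical claim nearly untouched.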
You don't need to formalize this into a system (though you could). What you need is the habit of asking: when was this decision last validated, and has anything changed since?
## What this looks like in practice
Here's a lightweight way to track decision freshness. If you're already using ADRs, add three sections: an evidence table, a review schedule, and a decision strength.
# ADR-007: Use event sourcing for the order system
**Status**: Accepted
**Date**: 2024-03-15
## Evidence
| Type | Claim | Confidence | Last Verified |
|-------------|-----------------------------------------------|------------|---------------|
| Empirical | Order volume will exceed 10K/sec by Q4 2024 | 0.9 | 2024-03-15 |
| Theoretical | Event sourcing provides full audit trail | 0.95 | 2024-03-15 |
| Judgment | Team has experience with Kafka and CQRS | 0.6 | 2024-03-15 |
| Constraint | Financial regulations require transaction logs | 0.99 | 2024-03-15 |
## Review Schedule
- Empirical claims: re-verify every 6 months
- Judgment claims: re-verify when team composition changes
- Constraints: re-verify annually
## Decision Strength
Minimum confidence: **0.6** (team readiness)
The decision strength is 0.6, not 0.86 (the average). The weakest link is team readiness. That tells you where to invest: training, hiring, or pairing, not better infrastructure.
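None of this needs tooling, but the check is easy enough to script. A hypothetical sketch that flags overdue claims and reports the weakest link; the record layout and intervals mirror the ADR-007 table above, not any real tool:

```python
from datetime import date

# Each claim: (type, confidence, last verified, re-verify interval in days).
# Values mirror the ADR-007 evidence table and its review schedule.
evidence = [
    ("empirical",   0.90, date(2024, 3, 15), 180),   # every 6 months
    ("theoretical", 0.95, date(2024, 3, 15), 3650),  # rarely expires
    ("judgment",    0.60, date(2024, 3, 15), 365),   # or on team changes
    ("constraint",  0.99, date(2024, 3, 15), 365),   # annually
]

def review(evidence, today):
    """Return claims past their re-verify window, and the weakest claim."""
    stale = [kind for kind, _, verified, interval in evidence
             if (today - verified).days > interval]
    weakest = min(evidence, key=lambda claim: claim[1])
    return stale, weakest

stale, weakest = review(evidence, today=date(2024, 9, 20))
print("stale claims:", stale)             # empirical is past its 6-month window
print("weakest link:", weakest[0], weakest[1])
```

Run at review time, this surfaces exactly the two questions that matter: which claims are overdue for verification, and which claim currently caps the decision's strength.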
Six months later, you revisit:
## Evidence (Updated 2024-09-15)
| Type | Claim | Confidence | Last Verified |
|-------------|-----------------------------------------------|------------|---------------|
| Empirical | Order volume will exceed 10K/sec by Q4 2024 | 0.5 | 2024-09-15 |
| Theoretical | Event sourcing provides full audit trail | 0.95 | 2024-03-15 |
| Judgment | Team has experience with Kafka and CQRS | 0.8 | 2024-09-15 |
| Constraint | Financial regulations require transaction logs | 0.99 | 2024-03-15 |
## Decision Strength
Minimum confidence: **0.5** (traffic projection uncertain)
The team got more experienced (0.6 → 0.8), but the traffic projection hasn't materialized as expected (0.9 → 0.5). The weakest link shifted. The decision is now questionable for a different reason than you expected 6 months ago. Without tracking, you'd never notice.
## Connect this to your team's workflow
I don't think most teams need a formal framework for this. What they need is a cultural practice: every architecture decision has a review date, just like every subscription has a renewal date. When the date comes, you spend 30 minutes asking:
- Is the empirical evidence still valid? Has the system changed?
- Is the theoretical reasoning still sound? Has the ecosystem shifted?
- Does the team still have the skills and context for this decision?
- Do the constraints still apply?
If the answer to any of these is "I don't know," that's your signal. You have a decision standing on expired reasoning.
I built Lore to capture the intent behind code changes, the reasoning that gets lost between the commit message and the next developer's confusion. But intent alone isn't enough. Intent captures why we decided this. What's missing is: is the why still valid?
The best architecture decisions are the ones you revisit. The worst are the ones nobody questions because they've been there so long that they feel like facts. They're not facts. They're opinions with expiration dates.