December 1, 2025

Artificial Intelligence (AI) has cemented its position as the inescapable conversation in boardrooms, promising a revolution in efficiency, speed, and innovation. However, amidst the excitement—or the hype—a critical concern persists, especially among financially minded executives: How can we be sure that the investment in AI is generating a real, measurable impact that justifies the budget, rather than superficial gains or inflated promises?

The key to mitigating this risk lies in the ability to establish reliable and rigorous baselines that allow for an objective comparison of pre-AI versus post-AI performance. Without this reference point, justifying tech expenditure becomes an exercise in faith rather than a strategic financial calculation—an unacceptable position in sectors where precision and development cycles are critical, such as Fintech.

The Challenge of Measuring AI's Real Value

The central problem is not the lack of metrics, but the abundance of them and the temptation to focus on those that sound impressive but lack a direct connection to business outcomes.

The Illusion of Vanity Metrics

It's easy to fall into vanity metrics that showcase the AI's activity but not its effectiveness. Examples include:

  • Number of Models Deployed: A high number of models doesn't mean a high impact; they could be inefficient or focused on low-value problems.
  • Model Training Time: Reducing training time is a technical gain, but the real value is how it translates into faster product launches or service speed.
  • Isolated Algorithmic Accuracy: A model with 99% accuracy is useless if the marginal improvement in accuracy doesn't significantly reduce fraud losses or increase customer satisfaction.

The goal must be to distinguish between the technical "output" and the business "outcome." The output is the line of code generated by the AI or the model's prediction; the outcome is the reduction in development cycle times or the decrease in fraud losses.

Baselines and Structured Frameworks

To combat the hype, organizations must adopt proven measurement frameworks that force them to look beyond the surface. In software engineering, frameworks such as DORA and SPACE (often used together, since each covers complementary dimensions) offer a useful structure, focusing on essential dimensions that translate to business results:

  • Throughput (Speed and Flow): How quickly is value being delivered?
  • Quality (Stability and Resilience): How reliable and maintainable are the systems?
  • Efficacy/Impact (Business Value): How does engineering contribute to the company's strategic goal?

These frameworks provide the categories for selecting metrics that, when measured before and after AI implementation, reveal the truth about the impact.

The Fintech Imperative: Baselines in Critical Cycles

The Fintech sector operates with frantic development cycles and high sensitivity to security, accuracy, and regulatory compliance. Here, a deployment error can have direct and significant financial consequences. Therefore, establishing pre-AI baselines is not an option; it is a risk and profitability imperative.

Critical Pre-AI Metrics for Fintech

Before introducing an AI tool (e.g., for code generation, testing automation, or fraud detection), it is vital to freeze the current performance metrics—such as Cycle Time, Deployment Frequency, Change Failure Rate, and fraud Error Rate—to have an unmoving reference point.

The Danger of Confirmation Bias

A common risk is confirmation bias, where teams, excited by the new technology, tend to overvalue superficial gains. For example, if an AI reduces the Cycle Time by 15% but at the same time increases the Change Failure Rate by 5 percentage points (because developers over-rely on AI-generated code without proper review), the net impact is negative or, at best, neutral.

The baseline must be an impartial judge, forcing leaders to calculate the real and holistic ROI. A solid baseline requires that throughput (speed) metrics are balanced with stability (quality) metrics, preventing speed from compromising system resilience.
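The trade-off described above can be made concrete with a small calculation. The following is a minimal, illustrative sketch: the function name, the per-day value of faster delivery, and the cost per failed change are all hypothetical assumptions, not figures from the article.

```python
# Hedged sketch: balancing a throughput gain against a stability loss.
# All cost figures below are illustrative assumptions, not real data.

def net_impact(cycle_time_before, cycle_time_after,
               cfr_before, cfr_after,
               value_per_day_saved, cost_per_failed_change, changes_per_month):
    """Rough monthly net impact in currency units: speed gain minus stability cost."""
    days_saved = cycle_time_before - cycle_time_after
    throughput_gain = days_saved * value_per_day_saved
    extra_failures = (cfr_after - cfr_before) * changes_per_month
    stability_cost = extra_failures * cost_per_failed_change
    return throughput_gain - stability_cost

# The scenario from the text: Cycle Time down 15%, Change Failure Rate up 5 points.
impact = net_impact(cycle_time_before=15, cycle_time_after=12.75,  # -15%
                    cfr_before=0.10, cfr_after=0.15,               # +5 pts
                    changes_per_month=40,
                    value_per_day_saved=2_000,      # hypothetical assumption
                    cost_per_failed_change=10_000)  # hypothetical assumption
# impact is negative here: the stability cost outweighs the speed gain.
```

Under these assumed costs, the "15% faster" headline hides a net loss, which is exactly why the baseline must track both metric families.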


How to Establish Reliable Baselines in 5 Steps

Establishing a baseline that dispels the hype and ensures budget justification is a structured process that goes beyond a simple average of historical data.

1. Define Clear Business Objectives (GSM Framework)

Before selecting metrics, define the business goal. Is the AI seeking to increase customer retention, reduce fraud risk, or accelerate software delivery?

  • Goal: Increase the speed of feature delivery by 25%.
  • Signal: Teams are able to move code to production faster.
  • Metric: Cycle Time and Deployment Frequency.
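The goal–signal–metric chain above can be captured as a simple record so it can be reviewed and versioned alongside the baseline. This is only a sketch of one possible shape; the field names are illustrative, not a prescribed schema.

```python
# Illustrative GSM record; field names are an assumption, not a standard schema.
gsm = {
    "goal": "Increase the speed of feature delivery by 25%",
    "signal": "Teams are able to move code to production faster",
    "metrics": ["Cycle Time", "Deployment Frequency"],
}

# Every metric in the baseline report should trace back to a goal like this,
# so no metric is collected without a business question it answers.
assert gsm["metrics"], "a goal without metrics cannot be verified"
```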

2. Historical Data Collection (Minimum 3-6 Months)

A baseline requires historical data that captures the team's or process's normal performance without the AI.

  • Representative Period: Select a period (ideally 3 to 6 months) that is not biased by atypical events (holidays, seasonal traffic peaks, major outages).
  • Normalization: Calculate averages, medians, and standard deviations to establish an expected performance range. The median Cycle Time is often more revealing than the average, as it is far less distorted by outliers.
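The normalization step can be sketched with the standard library alone. The cycle-time values below are invented placeholders standing in for 3-6 months of per-change data; the "median ± 1 standard deviation" range is one possible convention, not the only one.

```python
# Minimal sketch of baseline normalization over a representative period.
# cycle_times is a placeholder for 3-6 months of per-change data (in days).
from statistics import mean, median, stdev

cycle_times = [4, 5, 3, 12, 5, 6, 4, 30, 5, 7, 4, 6]  # illustrative values

baseline = {
    "mean": mean(cycle_times),
    "median": median(cycle_times),  # far less distorted by the outliers (12, 30)
    "stdev": stdev(cycle_times),
}

# One convention for the "expected performance range": median +/- 1 stdev.
low = baseline["median"] - baseline["stdev"]
high = baseline["median"] + baseline["stdev"]
```

Note how the two outlier changes pull the mean well above the median; freezing the median avoids baking a couple of bad weeks into the baseline.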

3. Freeze the Baseline

Once calculated, the baseline must be documented, communicated, and approved by executive and technical stakeholders. This number is now the "standard of truth" against which the AI's success will be measured.

4. Post-Implementation Monitoring and Correlation

Once the AI is in production, monitoring must be continuous and focused on correlation.

  • Holistic Monitoring: Don't just measure the improvement in the target metric (e.g., Cycle Time), but also stability metrics (e.g., Change Failure Rate).
  • Differentiation: Use monitoring tools to ensure that positive changes in the metric (the "gain") are statistically attributable to the new AI tool, and not to external factors or seasonal fluctuations.
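One lightweight way to check attribution, sketched below with invented data, is a permutation test: if shuffling the pre/post labels routinely reproduces the observed improvement, the "gain" may just be noise. The cycle-time samples and the 5% threshold are illustrative assumptions.

```python
# Hedged sketch: a permutation test for whether a post-AI improvement is
# likely attributable to the change rather than chance. Data is illustrative.
import random
from statistics import mean

random.seed(0)  # deterministic for the example
pre = [14, 16, 15, 17, 13, 15, 16, 14]   # baseline cycle times (days)
post = [11, 10, 12, 11, 13, 10, 11, 12]  # post-AI cycle times (days)

observed = mean(pre) - mean(post)  # observed improvement in days

pooled = pre + post
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)  # break any real pre/post association
    perm_diff = mean(pooled[:len(pre)]) - mean(pooled[len(pre):])
    if perm_diff >= observed:
        count += 1

p_value = count / trials  # small p-value: improvement unlikely to be chance
```

A tool-based equivalent (monitoring dashboards, A/B comparisons across teams) serves the same purpose; the point is that attribution must be tested, not assumed.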

5. Calculation of Net Impact and Budget Adjustment

The final result must be a report that answers the executive's question: What was the net impact of the AI on business outcomes?

  • If the fraud Error Rate baseline was 10% and is now 5%, the impact is a direct reduction of 5 percentage points in fraud losses.
  • If the Cycle Time was reduced from 15 days to 10 days, this translates to 5 fewer days to capitalize on the competitive advantage of a new feature.
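The two bullet examples above translate into a net-impact report with straightforward arithmetic. In this sketch, the monthly at-risk transaction volume is a hypothetical assumption added purely to show the shape of the calculation.

```python
# Illustrative translation of metric deltas into business terms.
# monthly_fraud_exposure is a hypothetical assumption, not a real figure.

fraud_rate_before, fraud_rate_after = 0.10, 0.05   # from the baseline report
monthly_fraud_exposure = 1_000_000                 # assumed at-risk volume
loss_reduction = (fraud_rate_before - fraud_rate_after) * monthly_fraud_exposure

cycle_before, cycle_after = 15, 10                 # days, from the baseline report
days_earlier = cycle_before - cycle_after          # days sooner to market
```

The executive report is then a comparison of these recovered amounts against the AI tool's total cost of ownership.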

Only with this tangible evidence can a further investment be justified or, conversely, can the strategic decision be made to pivot or discontinue an AI initiative that is not delivering real value over the baseline.

Conclusion: From Faith to Evidence

AI is a powerful tool for transformation, but its true value is only unlocked when managed with discipline and constructive skepticism. The executives' fear of investing in the hype is valid; it is a reflection of responsible risk management.

By establishing clear, immutable, and business-critical pre-AI baselines—such as Cycle Time and Error Rate in the Fintech environment—organizations can transform AI investment from a technological promise into an evidence-based financial decision. This practice not only secures the budget but also builds an internal culture that values real impact over superficial activity, a fundamental principle for sustained success in the age of AI.
