
A nuanced strategy on build vs buy

The phrase “build vs buy” describes the common business choice between using an off-the-shelf solution and producing one from scratch. K-Optional Software has experimented with various build-vs-buy strategies and has evolved accordingly:

  1. Build everything
  2. Buy whenever possible
  3. Buy with a migration strategy

We’ve cheered on the explosion of SaaS products over the past decade, believing that it has made system development more accessible and more competitive. After all, bespoke software development is mournfully opaque to the detriment of many great ideas, which is antithetical to K-Optional Software’s founding mission.

Here we present a case study on a specific build-vs-buy decision that we oversaw. While it highlights shortcomings of the “buy everything” part of the spectrum, it should surprise no one that a good thing has its limits.

Background

We’ve done work in the automotive appraisal / repair industry since 2018.

Natural disasters cause massive repair surges, and backlogs of insurance claimants swell literally overnight. K-Optional consulted on a particular application that load-balances large influxes of vehicle claims.

We wrote software that, among other things:

  • Ingests tens of thousands of claims
  • Communicates appraisal options via automated text messages
  • Interfaces with a call-center that contacts unresponsive claimants

The application was also meant to schedule appraisals and repairs. The year prior, the client had piloted their own scheduling system on YouCanBook.me, a calendar-scheduling SaaS product; our requirements involved integrating with this solution rather than writing code that reinvented the wheel.

SaaS-integration topology

Here’s a simplified look at one slice of the v1 solution.

[Figure: SaaS-integration topology]

To summarize, this application would communicate with claimants, kick them out to a scheduling link when they made a few selections, and let YouCanBookMe alert the system when an appointment was scheduled.

Early in the release of the first version, we observed the following trade-offs of the YouCanBookMe integration:

YouCanBookMe Integration: Pros

  • Offloading the complexities of time and timezones: there are a lot of them.
  • One fewer user interface to design and build.
  • Appointment reminder emails for free!

YouCanBookMe Integration: Cons

  • Lack of control with regard to branding.
  • Inelegant & brittle iframe wrapper so that users wouldn’t be confused by the URL.
  • Zero input validation: we couldn’t help it if a claimant clicked the browser back button and scheduled twice.
  • Domain-modeling divergence.

I’ll elaborate on that last point, which is the basis for the general limits of SaaS-stringing. All in all, though, the integration was worth it at this point: it protected against scope bloat and helped us get to market quickly.

Rifts in SaaS domain-modeling

Domain-driven design is an essential principle for engineering software in the 21st century. Essentially, it prescribes modeling software after the domain, i.e. the industry or business or system it serves. Surprisingly, this does not happen by default when producing code. Think of a software application as an iceberg: the user-facing interface sits above water, dwarfed by the hidden mass of business logic and models below the surface. And failures in domain modeling tend to occur in the submerged majority.

[Figure: User-facing application above the surface, business logic below the surface.]

As an example, consider the YouCanBookMe system: the platform refers to “booking calendars” as “profiles”. Profiles are segmented from other profiles and each of them may integrate with its own Google Calendar etc. I presume that this software originally served businesses accepting appointments on behalf of various personnel, e.g. lawyers at a law firm, where “profile” made contextual sense. YouCanBookMe also describes appointments as “bookings”, which is again sensible for booking lawyers.

How we used YouCanBookMe in our system diverged from the platform’s internal representations, and therein lies the below-the-surface domain modeling rift.
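One common way to contain a rift like this is a thin translation layer that converts the vendor's vocabulary into your own at the boundary, so that “profiles” and “bookings” never leak into the rest of the code. The sketch below is illustrative only and is not a description of what we shipped: the payload keys (`profile`, `claim_ref`, `start`) and the `Appointment` fields are hypothetical, not YouCanBookMe's real webhook schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Appointment:
    """Our domain's term for what the vendor calls a "booking"."""

    claim_id: str
    calendar_id: str  # our term for what the vendor calls a "profile"
    starts_at: datetime


def appointment_from_booking(payload: dict) -> Appointment:
    """Translate a vendor-shaped payload into our own model.

    The keys read here are assumptions made for illustration.
    """
    return Appointment(
        claim_id=payload["claim_ref"],
        calendar_id=payload["profile"],
        starts_at=datetime.fromisoformat(payload["start"]),
    )


# Example: a hypothetical incoming booking becomes a domain appointment.
booking = {
    "profile": "storm-event-7",
    "claim_ref": "CLM-1001",
    "start": "2021-06-01T09:30:00",
}
appointment = appointment_from_booking(booking)
```

A layer like this does not remove the divergence, but it confines vendor terminology to one function, which also makes a later migration smaller.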

Repercussions

So, how do breaks in domain models manifest themselves in production code? In ways that you couldn’t easily predict.

Since YouCanBookMe models itself after service-industry calendar management, we had to run against the grain to make it work within our application. For example, YouCanBook.me users pay on a per-profile per-month basis, but we had to spin up multiple calendars overnight for short stints; we were constantly having to adjust our billing plan manually to keep up with storm cycles (simply paying for more than necessary surprisingly didn’t work!). More seriously, little details that the service disregarded happened to matter tremendously to us.

Our system would provision a Google Calendar and create a corresponding YouCanBookMe Profile in one job. We learned the hard way that YouCanBookMe synchronizes time-zones up to a few hours after both are created. We would automatically spin up a calendar / booking page for an event, distribute it via SMS, and receive bookings in minutes, only for the time-zone to change on the booking page after people had scheduled. Double bookings would ensue because YouCanBookMe then interpreted all previous appointments in a different time-zone. There was no way for us to set the correct time-zone from the start, forcing us to wait until the two synchronized, which palpably weakened the value-add of the scheduler.
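To make the failure mode concrete, here is a minimal sketch of one plausible mechanism; it is not YouCanBookMe's actual implementation. It assumes the booking page starts in UTC and later synchronizes to US Central Daylight Time, and it uses a fixed UTC offset rather than a real time-zone database to stay dependency-free. The function name `slot_instant` and the dates are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Assumption: a fixed -5h offset stands in for US Central Daylight Time.
CENTRAL_DAYLIGHT = timezone(timedelta(hours=-5))


def slot_instant(wall_clock: str, page_tz: timezone) -> datetime:
    """Return the absolute instant a page associates with a displayed slot."""
    local = datetime.fromisoformat(wall_clock).replace(tzinfo=page_tz)
    return local.astimezone(UTC)


# A claimant books the 09:00 slot while the page still uses its initial zone.
first_booking = slot_instant("2021-06-01T09:00", UTC)
taken = {first_booking}

# After synchronization, the same displayed slot maps to a different instant,
# so it no longer collides with the earlier booking and looks available.
resynced = slot_instant("2021-06-01T09:00", CENTRAL_DAYLIGHT)
looks_available = resynced not in taken
shift = resynced - first_booking
```

Under these assumptions the displayed 09:00 slot moves by five hours, so a second claimant can take what looks like a free slot even though someone was already told to show up at 09:00.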

I think it’s safe to assume that personnel at professional service firms neither distributed calendars immediately nor changed them that often, an example of how a latent domain divergence may painfully rear its head.

Finally, with a SaaS dependency, you are beholden to API changes. A switch from 1-digit months to 2-digit months in YouCanBookMe’s incoming webhook date format necessitated emergency hot-fixes on our end.
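A defensive parser at the webhook boundary can soften this kind of change. The following is a sketch under stated assumptions: the `M/D/YYYY` layout is hypothetical (the real webhook format is not reproduced here), and `parse_webhook_date` is an illustrative name. It accepts both 1- and 2-digit months and days and fails loudly on anything else.

```python
import re
from datetime import date

# Hypothetical layout: month/day/year, e.g. "6/1/2021" or "06/01/2021".
_DATE_RE = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{4})$")


def parse_webhook_date(raw: str) -> date:
    """Parse a date with 1- or 2-digit month and day, or raise ValueError."""
    match = _DATE_RE.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized webhook date: {raw!r}")
    month, day, year = (int(part) for part in match.groups())
    # date() itself raises ValueError for impossible values such as month 13.
    return date(year, month, day)


# Both spellings of the same day parse to the same value.
old_style = parse_webhook_date("6/1/2021")
new_style = parse_webhook_date("06/01/2021")
```

Tolerant parsing only covers changes you anticipated; raising on unrecognized input at least turns a silent misread into an immediate, visible error.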

The breaking point

The thing that made this topology completely untenable was actually a change in Google Calendar’s API. Starting in 2020 or so, an OAuth consent flow became necessary to share Google Calendars on behalf of an account; we had previously used a private service account to fulfill this function.

The system’s dependency on YouCanBook.me meant another dependency on Google Calendar, and consequently we were better off shifting towards the “build” strategy.

The pull request that excised YouCanBook.me and Google Calendar cut a solid 15% of the codebase and immediately improved performance and experience. Vestigial traces of these integrations still exist, though, since migrations tend to swap in-place.

Concluding advice on SaaS-Stringing

In the end, integrating with YouCanBook.me helped us launch a product quickly. The troubles came after launch as the system grew and thrived.

This experience has helped us identify a holistic strategy on “build vs buy” and SaaS-stringing:

Use existing solutions where possible under time and resource constraints AND ship with a service-migration assessment.

Our service-migration assessments are pretty informal but attempt to estimate the tech-debt associated with a dependency on a SaaS product. We answer questions like:

  • Will this product be around for a while?
  • Is this product’s domain model aligned with our own?
  • What would a migration from this SaaS product look like? What sort of resources would it necessitate?

Since implementing this strategy, we’ve struck a nice balance between launching quickly with minimal resources and ensuring long-term robustness.