Why Squash is the Best Way to Merge

Beware: opinions ahead.

Principle 1: All commits from the main branch should be deployable

Need to roll back production? Where should I roll back to? When commits are squashed, it’s reasonably easy to see what to do: go back one commit and re-deploy. When commits are not squashed, it becomes harder to see where the last “good state” was – were some commits from the last PR good, or should we revert everything? When commits are squashed, that search simply isn’t necessary.
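
As a rough sketch (the deploy step is whatever your setup uses), the rollback looks something like this when every commit on main is one PR:

    git log --oneline main   # each line is a whole PR
    git revert HEAD          # undo the most recent PR with a single new commit
    # ...then deploy main again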

Principle 2: Pull Requests should contain a single, cohesive piece of work

If we follow Principle 2, then after the PR is merged there should be no need to deal with different parts of it separately. It should be treated as a unit. The simplest way to enforce this rule is to cram all of the changes into a single commit when the PR hits main.

Principle 3: Development should allow for experimentation

I’d like to be able to try out multiple approaches on a branch, iterate on my solution, and incorporate feedback without force-pushing, rewriting history, or doing anything else complicated to manage my Git history. I’d like to see CI run and fail, and to handle the consequences without worrying about polluting main. When I’m reviewing code, having a force-push divorce my comments from the code they were about is disconcerting. Development branches should be allowed to be the Wild West, but we want to contain that chaos. The best way to keep it contained is to squash the final result before applying it to main.
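
If you aren’t using a hosted “Squash and merge” button, a rough local equivalent looks like this (the branch name is made up):

    git checkout main
    git merge --squash my-messy-feature   # stage the branch's net changes, no commit yet
    git commit                            # one tidy commit on main for the whole PR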

Principle 4: Rebasing from main should be a simple process

When I want to rebase from main, I don’t want to walk through every single commit made by every other developer. I want to apply the PRs as units and deal with the consequences. Rebasing every little commit from every development branch is a waste of time and energy, especially for commits that were eventually walked back.
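
As a sketch (the remote and branch names are assumptions), picking up main’s changes by rebasing looks like:

    git fetch origin
    git checkout my-feature
    git rebase origin/main   # with squash merges, main's history reads as one commit per PR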

Principle 5: Merging from main should not pollute the history

Sometimes rebasing is more of a pain than it’s worth. I want to be able to merge main into my branch without worrying about generating a big ugly merge commit that then becomes visible on main after merge. This is related to development branches needing to be a bit wild west, but it’s important: I’ve seen developers waste days on a rebase when a merge would have been 10 times faster. We should take whatever velocity gains we can. Git is a very smart merge tool – why give up that advantage?
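
A minimal sketch of that workflow, assuming a branch called my-feature:

    git checkout my-feature
    git merge main   # the merge commit lives only on the branch
    # when the PR is eventually squash-merged, that merge commit never appears on main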

Conclusion

Squashing merges helps the whole development team move faster. It’s worth enforcing this merge strategy on repositories across the organization.

10 Reasons not to Log To Your Database

1. Your database is probably not designed for the workload

Most databases are designed for a high-read, low-write use-case, with some exceptions. These exceptions are not traditional databases and are not the main point of this article. If you are using something like MySQL, Postgres, Oracle, or similar, your DB is optimized to support many reads and fewer writes. Logging to this type of database is an unnecessary performance hit.

2. Keeping infrequently accessed data in a database is expensive

Either the database is hosted on owned or rented hardware or it is deployed via a cloud service provider. Any way you slice it, putting a lot of infrequently accessed data into the system will balloon costs for storage, RAM, and CPU/IO to support the writes. Other systems can support storing this data more cheaply – look, for example, at Athena/S3 or Redshift Spectrum.

3. Once other systems rely on the location of the data it is hard to move

Analysts will want to build graphs, charts, and reports, and to make business decisions based on user behavior. If user behavior is primarily recorded in the production database, that is where they will go to find the information. Once a substantial number of reports leverage the data, it becomes increasingly difficult to move or remove the data source, and doing so requires more and more buy-in from various teams. Technically, moving a large amount of semi-structured data to a new warehouse can also be challenging.

4. Security concerns

Most web frameworks support log filtering quite well. Most systems that write logs or events to the database circumvent these utilities. It’s far too easy for passwords, credit card numbers, and other information that shouldn’t be stored in the clear to end up in the database by mistake. Once that information finds its way in, it can become impossible to remove due to backup retention and the simple dynamics of large tables. Speaking of that…

5. It can be hard to delete a large volume of data from a database

Large deletes can lock tables – tables that break production if they aren’t writable.

6. Data retention policies are harder to implement in the context of a transactional DB

Because of Point #5, it can be difficult to enforce a data retention policy. If the schema is really badly designed, it can be impossible. Consider also the difficulty of clearing data from historical database backups – it is far from trivial.

7. Mixing analytical queries with application queries can lead to brownouts or downtime

A large analytical query that joins across a number of tables can starve production workloads of resources, leading to brownouts. Running those on a system ETL’d out of your production DB? Well, then I guess you don’t need that log data in the production DB!

8. Migrations of the table you’re logging to become untenable quickly

Migrations on massive tables are potentially dangerous and generally slow. Add a high write throughput and the situation gets much worse. Solutions that copy data into a new version of the table to avoid locking can also easily run your database cluster out of space.

9. You will have a lot of garbage data to sort through to find what you need

Most of what’s captured in logs is useless. Why store it in your expensive, highly-available production database?

10. There are better tools for the job

See: Kafka, Kinesis, S3, Athena, BigQuery, Apache Druid, Presto, Snowflake, Elasticsearch, DataDog, Amplitude, Mixpanel, Loggly, Hive, Segment.io, and ClickHouse – just to name a few technologies, top of mind right now, that can help with this situation.

When to extract or introduce a service

In my mind, there are a small number of legitimate reasons to split out a new back-end service in an existing architecture. Here they are:

  1. The new functionality would be easier to build using a different tech stack
  2. The scaling requirements of the new functionality are different from those of the current system
  3. The new functionality requires different access controls than the rest of the system
  4. A different deployment method or frequency would be better for the new functionality

Or, stated in pain points:

  1. I can’t build this in the language we’re using
  2. This thing won’t be performant enough if I add it to our current system
  3. No way in heck am I giving our current system unfettered access to X
  4. I have to deploy 137 times a day and our current CI workflow takes an hour

There are a lot of hard things about building distributed systems. These problems include: eventual consistency, serialization overhead, network latency, partition tolerance, distributed consensus, service discovery, schema synchronization, ordered changes, inter-service authentication, service-to-service encryption, request loops, event loops, event ordering, lions, tigers, bears, sharknados, and, most terrifyingly, XML if you do it wrong. New services should not be split out lightly – often the pain of solving all the issues of distributed systems is worse than the pain of working in a crufty old codebase. Introduce network boundaries only with great thought and consideration.

Slow is Smooth and Smooth is Fast – when it’s not Just Plain Slow

This is a saying that makes the rounds every so often, and there’s a kernel of truth in it. It comes out of the military, and it has multiple meanings.

The first is that a group of people can only do something so fast without messing up. If you run as fast as you can, you’re more likely to fall. If you try to do something really quickly and skip steps, you’re more likely to make mistakes. In the military, mistakes can be really big problems. People can be hurt or die. Officers don’t like that. It looks bad on a report.

The second meaning is that you should practice slowly so that your motions become smooth. Then when you speed up your motions, they’ll still be smooth. This makes sense to me as a classical musician. It makes less sense as a software engineer, so we can safely ignore this definition for the rest of the blog post.

Usually when someone says “slow is smooth, smooth is fast”, they’re recommending caution or justifying taking the time to do something properly. A similar saying is “haste makes waste” – the idea is that by ensuring something is done correctly you’ll avoid having to do it twice. In a lot of situations that’s good advice. Rushing code onto production and skipping steps in the normal process saves some amount of time X, but it creates some probability Y of a production bug that will take time Z to resolve. If X < Y * Z, the recommendation is correct, and “slow is smooth, smooth is fast.” However, that test does not always evaluate to true. For some changes, a full QA cycle or whatever the normal process is will take longer than just testing on production. In those cases, slow is just slow, and we should ship the thing without the extra fuss.
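
To put totally made-up numbers on that X < Y * Z test:

    X = 2 hours saved by skipping the normal process
    Y = 0.10 (a 10% chance of shipping a bug)
    Z = 30 hours to diagnose, fix, and redeploy
    Y * Z = 3 hours of expected cost > 2 hours saved, so go slow
    If instead Z were 5 hours: Y * Z = 0.5 hours < 2 hours saved, so just ship it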

So, then, how are we to know the difference? We need to understand the level of risk we’re taking on in making the change, as well as the cost to remedy any bug that might be introduced. The level of risk is the likelihood that we’ll introduce a bug. The better understood and tested a piece of code is, the lower our risk of introducing an issue. The more legacy, crufty, convoluted, or complex the code is, and the fewer automated safeguards we have in place, the higher the level of risk becomes.

The cost to remedy represents the severity of the issue we think we might cause. Some bugs are quite severe: shipping a database migration that drops the wrong column (or table, God forbid!) is potentially disastrous, and bad algorithmic trading code can bring a billion-dollar company to its knees (this example is from real life – Google it). In these cases the cost to remedy the issue can be very high, sometimes more than the organization can pay. Other bug scenarios are not so dangerous: a copy update on a static HTML page has limited capacity to cause harm, and the same goes for a tweak to a well-understood bit of application code that isn’t mission-critical. For the latter two, the resolution only involves fixing the code and pushing it to production, and that’s probably the end of it – a few minutes or an hour’s work at most.

As we try to make the right call about the level of risk and the cost to remedy, we also need to have a feel for our own level of certainty in our assessment of both. Are we missing something? Is that application code really as simple as it seems? Maybe it’s reused somewhere else and we’re going to create an unintended side-effect. Maybe that copy tweak is part of a contract. Maybe there’s some unknown unknown that will bite us. If we know the code like the back of our hand, we can be more certain of our assessments, but even then we can’t be totally sure. So, when in doubt (which is most of the time), falling back to “slow is smooth, smooth is fast” is the right call. But, if the risks are low, go ahead and skip that 3rd round of QA. You really don’t need it.

Hello World

I have decided to start a blog (again) because I am apparently a glutton for punishment. I’ve started blogging before, and it has gone nowhere fast. This time I’m doing some things differently:

  1. I am not writing code in order to blog. I grabbed WordPress and installed a template.
  2. I will not be obsessing over the look of this site. I will be writing content.
  3. This blog is tied to me personally, not a business I am trying to start.
  4. I will be writing about what I’ve seen and learned during my time in software, not commenting on the state of the industry in general.
  5. There is no posting schedule. I will write, but I might write 6 posts in one week and none for the next month.
  6. I want this to be fun, so it won’t be heavily edited or revised. A few tweaks then post it, no review process or heavy reworking.

I hope you read this someday because you found some value here and wanted to see what the first post was. If that’s the case, drop me a message. I’ll know I’m not just shouting into the void that way 🙂

Will