Trying to understand… ‘Linux Server Operating Systems: Red Hat Enterprise Linux and Beyond’
Welcome to “Trying to understand…”, a series devoted to me trying to understand a piece of media (usually a blog post or technical article) until I feel confident that I generally understand it.
Introduction
In a previous post, I wrote
If we want to be someone who knows a lot and doesn’t make mistakes, we have to start making more mistakes. Ask all the dumb questions we’ve accumulated in our head. We have to pay our debt before we can move forward. And honestly, it’s not always a nice debt to pay, but we must suck it up and pay it anyway.
Time to pay the piper. I’ll admit a truth that I’ve been hiding for a while now: I have no idea what the hell Red Hat sells or how they make money. It’s not the gravest of sins, but I’ve worked with Red Hat technologies before, so I feel like I should have some idea of what the company does.
Anyway, I stumbled on this article on The New Stack discussing the different Linux server OSes available. Seeing Red Hat Enterprise Linux featured in the title, I decided to figure out why Red Hat has an OS and how they make money. As with many things I look into, I jumped down a bigger rabbit hole than I expected. But why ruin the surprise? Let’s go!
What is an enterprise Linux distribution?
Many companies nowadays run large computers (cue fanfare for very obvious fact). Banks have to process financial transactions, companies have to do payroll, internal communication systems have to stay up, etc. These computers have to run an Operating System.
Let’s say you’re a company trying to figure out which OS to install on your computers. There are three main operating system families: macOS, Windows, and the Linux family. If you want an operating system for a server, that leaves only Windows and Linux (macOS only runs on Apple hardware, which is designed for consumers, not enterprises). Windows, the operating system developed by Microsoft, charges a license fee for every copy of the OS. Your other option is Linux, which is free and open source. Sounds like a perfect fit!
Except there’s one problem: Linux isn’t actually an operating system. Linux is a kernel, which is part of an operating system, but not the whole thing. Here’s a great answer to why you can’t really just use Linux by itself. GNU is a set of OS tools that, in conjunction with a kernel (usually Linux), makes a full operating system. Technically, GNU includes its own kernel as well, but it’s pretty bad, so everyone just uses Linux. This fact is why pedantic individuals will say their operating system is GNU+Linux. Great! Now you can just use Linux+GNU, except there’s another problem: Linux+GNU by itself is kinda…meh.
The more technical term for “meh” is “unopinionated.” All the functionality is there to make a computer work, but many of the nice user programs, such as a text editor or web browser, aren’t installed. You could install them from a package manager, except bare Linux+GNU doesn’t come with a package manager either! What you’d really like is Linux+GNU with some useful programs pre-installed, a package manager that guarantees the programs you install will work with your OS version, and other configuration to make your life easier. Good news! These OSes exist. They are called “distributions”, or “distros” for short, because they’re meant to be distributed to the end user.
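To make the package-manager idea a little more concrete, here’s a minimal sketch of what “installing a program through the distro” looks like on a RHEL-family system. It assumes `dnf` (the package manager used by Fedora, RHEL, and CentOS) is available and the script runs with root privileges; the package name is just an illustrative choice.

```python
# Minimal sketch: install a program via the distro's package manager.
# Assumes a RHEL/Fedora-family system with `dnf` available and root privileges.
import subprocess

def install_package(name: str) -> None:
    """Ask dnf to install `name`, resolving a version compatible with this OS release."""
    subprocess.run(["dnf", "install", "-y", name], check=True)

if __name__ == "__main__":
    install_package("nano")  # illustrative package choice, not a recommendation
```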
Distros exist for all types of use cases, from casual users to large mainframes. That’s part of the benefit of separating distros from Linux+GNU: improvements can be made to the Linux+GNU core to benefit all distros, but a choice one distro makes (for example, a casual-user distro) doesn’t impact other distributions.
Popular Linux distros for enterprise include:
- RHEL
- openSUSE
- Amazon Linux 2
Understanding RHEL vs CentOS vs CentOS Stream
Now that we have that background, we can understand the premise of the article: if you’re a company that needs to decide which free enterprise Linux distro to use, you need to know the differences between them. Simple enough, right? Well, it is until you get a couple sentences into the article, where you’ll encounter:
A few years back, things were relatively simple regarding Red Hat Enterprise Linux (RHEL). If you needed support, you’d get a contract with Red Hat. If you didn’t, you’d run the community RHEL distro, Community Enterprise Operating system (CentOS).
Things have changed.
In 2014, Red Hat “adopted” CentOS, hoping to convert its users into RHEL customers. That didn’t work. So in 2020, Red Hat changed CentOS from being a stable RHEL clone to being a rolling Linux release distro, CentOS Stream. That’s not the same thing at all. As one user said, “The use case for CentOS is completely different than CentOS Stream. Many, many people use CentOS for production enterprise workloads, not for dev. CentOS Stream might be OK for dev/test, but it is unlikely people are going to adopt CentOS Stream for prod.”
You’d be forgiven for being confused. What’s CentOS Stream? Why is it okay for dev/test but not for prod? And what does “being a rolling Linux release distro” mean?
To clarify the relationships between the parts, I threw together a little diagram.
You can consider these to be “streams” that flow from the source, in the top left, down. The source for Linux distros is always Linux, GNU, and some other utilities. As improvements are made to these technologies, the improvements flow down the stream to the distributions. So a “downstream” distribution will receive the updates from “upstream”. Sometimes, if a downstream distro makes a code improvement that would be useful for other distros as well, the maintainers will rewrite it for the upstream project so other distros can receive the improvement too.
In 2014, Red Hat had three main distros:
Fedora - Fedora is the most upstream distribution for Red Hat. It’s known for rapid innovation and having the latest and greatest features. It receives a lot of updates, but can contain bugs as well. As such, it’s largely used by individual users who want the latest features and don’t mind testing them in return.
Red Hat Enterprise Linux - Once a set of changes has been in Fedora for a while, it gets bundled up and incorporated into RHEL. Therefore, RHEL will be behind the upstream Fedora in features, but is much more stable.
CentOS - CentOS was an existing, free downstream distro of RHEL. In 2014, Red Hat took ownership of CentOS, promising to continue working on it.
In 2020, Red Hat introduced a fourth distro:
CentOS Stream - A distribution downstream of Fedora but upstream of RHEL. It’s a rolling release: instead of snapshotting a finished RHEL version, it continuously receives the updates that are headed into the next RHEL release. Despite having the CentOS name, CentOS Stream doesn’t carry the stability guarantees that CentOS did.
Red Hat also announced that development would stop on CentOS and shift to CentOS Stream, giving CentOS an End Of Life in 2024.
I’ll include a full timeline at the bottom, which maps the trajectory a bit better.
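As a side note, if you ever need to check which of these distributions a machine is actually running, most modern distros (including Fedora, RHEL, CentOS, and CentOS Stream) ship a standard `/etc/os-release` file. Here’s a minimal sketch that reads it; the exact NAME and VERSION_ID strings vary by distro and release, so treat the values in the comments as illustrative.

```python
# Minimal sketch: identify the running distro by parsing /etc/os-release.
# On RHEL, NAME is something like "Red Hat Enterprise Linux"; on CentOS Stream,
# "CentOS Stream"; on Fedora, "Fedora Linux" (exact strings vary by release).
from pathlib import Path

def read_os_release(path: str = "/etc/os-release") -> dict:
    info = {}
    for line in Path(path).read_text().splitlines():
        if "=" in line and not line.startswith("#"):
            key, _, value = line.partition("=")
            info[key.strip()] = value.strip().strip('"')
    return info

if __name__ == "__main__":
    release = read_os_release()
    print(release.get("NAME"), release.get("VERSION_ID"))
```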
How Red Hat makes money off RHEL and why they adopted CentOS
First, it’s important to note that both Linux and GNU are licensed under the GPL, or GNU General Public License. It’s a legal license that basically says “hey, if you use this open source code, then any of your code that uses our code has to be open source as well”. Therefore, all Linux distros, by virtue of shipping Linux+GNU, have to distribute that code under the GPL as well. So how do these companies make money then?
The GPL does not prevent you from selling your product. However, it does require that if anyone you distribute your product to asks for its source code, you have to give it to them. Practically, this requirement makes it hard to sell your product if people can get the source code from someone else for free. Many companies therefore provide the operating system for free and sell services on top of it. For example, they might
- sell consulting services where they help you configure and troubleshoot problems with the OS
- sell software tools that are specifically designed to work well with the OS, or they might charge other software makers a fee to be “certified” as reliable on the OS.
While this might not matter for an individual user, who’s happy to install whatever on their personal laptop, large companies require guarantees that certain software will be reliable and safe.
openSUSE, a competitor to RHEL, adopts a similar approach:
openSUSE at it’s core is completely FOSS, but they do make it easy to access nonfree packages in a separate repo. Their approach is the same as Debian, both of which are not endorsed by FSF because of the nonfree ease of access.
from here.
Because RHEL uses Linux+GNU and is therefore licensed under the GPL, it was always easy to create downstream versions of it. The most popular one was CentOS, which at one point held roughly 30% of the webserver market. Red Hat decided that by adopting CentOS under their umbrella, they could nurture it and grow the community. Additionally, by controlling CentOS development, they could minimize the differences between the two OSes (for example, keeping them binary compatible). That way, customers would be attracted by the free CentOS, and once they wanted more support they would upgrade to the paid RHEL. Except they found that companies enjoyed free stuff too much and weren’t actually upgrading to the paid RHEL, so they decided to kill it off. Or rather, they weren’t making enough money off CentOS to justify continuing to develop it. There were also some complaints about CentOS:
The most frequent and most vocal complaints about CentOS have always been that work on the distribution happens behind closed doors. Users have no visibility into its status. “It’s ready when it’s ready.” CentOS lags behind RHEL, sometimes for a very long time, and we don’t know how long that will continue during any given lag. CentOS Stream fixes the biggest problem that CentOS had always had. Development will happen in the open, and I believe that we (Fedora maintainers) will be able to participate in that work.
from here.
Still, it was not a popular move in the open source community.
Where Red Hat starts to get into a legal grey area
As we’ve already covered, Linux is licensed under the GPL. RHEL uses Linux code and therefore must comply with the GPL. So where’s the grey area?
In 2023, a few years after switching CentOS over to CentOS Stream, Red Hat updated their terms of service. I’ll just quote this good Reddit comment on the topic:
RHEL’s source code is now no longer publicly available to anyone who wants it - it’s only available to Red Hat’s customers. This is perfectly legal because the GPL only requires you to give your source code to the people you directly give binaries to, and I personally have no problem with this.
The real controversial element here, and the change I significantly disagree with, is the addition of a clause that allows Red Hat to terminate your contract if you share RHEL’s source code. While this doesn’t violate the GPL because the GPL doesn’t guarantee access to future versions of a program’s source code, only the version you were given, this move does violate the spirit of free software - this entire community is built on being able to freely share!
It’s open for debate whether this move is 100% legal. The Software Freedom Conservancy released an analysis of the situation. At present, companies running CentOS in 2024 will need to either upgrade to the paid RHEL, move to the less stable CentOS Stream, or switch Linux distributions entirely, with all the migration pains that entails.
There are a few projects trying to fill CentOS’s old role as a free RHEL rebuild. The article mentions AlmaLinux and Rocky Linux. However, given that most of the original CentOS developers are now Red Hat employees, it’s unclear if there will be a large enough volunteer community to maintain these distributions.
Conclusion
And that’s pretty much the end of the article! A simple question of “How does Red Hat make money?” combined with an article on “How to pick the right Linux distro for your company” landed us in one of the biggest ongoing open source controversies. Who would have thought? I think the article didn’t cover openSUSE as much as it should have, considering it’s another market leader in enterprise Linux distributions, but I suspect the author was focused on binary compatibility with existing RHEL-esque distros.
Other questions
Can’t Linux or GNU just change their license to prevent this behavior?
They can, but that would cause a big split across license versions. The GPL is a widely used standard license. Updating it would require a lot of thought and planning to make the right changes, and then projects would have to deal with the additional complexity of “which version are we using?” In practice, relicensing an existing project like Linux would also require agreement from its thousands of copyright holders.
Doesn’t this set the precedent for a major loophole in the GPL?
Yes. However, there are legal rights, and then there’s market and social pressure. Even if other companies can adopt a similar strategy to Red Hat’s, customers can switch to other products. So it’s unclear to what extent other companies will adopt this strategy.
Will Fedora be affected?
I’ll quote this good Reddit comment
Not directly. Fedora is not built from RHEL’s source code at all. Red Hat has no incentive to restrict access to Fedora’s sources because 1) no one pays for Fedora - it’s not a commercial product like RHEL is, and 2) Fedora is the origin point for all of the changes that end up in RHEL. Red Hat needs Fedora because Fedora provides a testing ground for new technologies to be tried out before they end up in the paid product. Fedora also provides an entry point into the ecosystem of tools that RHEL uses - if you use Fedora for your own projects, you’ll probably want to use RHEL when you need a more stable distro with commercial backing because you already know all the tools it uses.
But this change does affect Fedora in other ways. Because Fedora is Red Hat’s project - it is not a separate entity in any way, Red Hat owns all the trademarks and all the infrastructure - many people are leaving Fedora because of this change. It doesn’t help that there’s been quite a bit of angst within the community from other things like Red Hat’s recent layoffs and the whole CentOS Stream fiasco - the anger from that never really went away. For many people, this move to restrict access to RHEL’s source code was the breaking point.
Timeline of RHEL and CentOS
2002
Red Hat creates RHEL (Red Hat Enterprise Linux), a downstream distro of Linux, GNU, and a few other utilities.
2003
Red Hat creates Fedora, a distribution used to test new features before they get incorporated into RHEL. Fedora becomes downstream of Linux+GNU and upstream of RHEL. The idea is that users wanting the latest and greatest features can run Fedora, and in exchange Red Hat can test new features and changes on a real user base. Updates can happen frequently, and once a set of updates has been tested and approved, it gets merged downstream into the more stable RHEL.
2004
CentOS is created as a free downstream rebuild of RHEL, which is possible because RHEL’s source is available under the GPL.
2010
CentOS becomes the most popular webserver distro, with ~30% adoption (Debian retakes the lead in 2012, but CentOS maintains a healthy market share)
2014
Red Hat starts sponsoring CentOS and gains ownership of the CentOS trademark. Red Hat sets up a governing body for CentOS.
2020
Red Hat shuts down all work on CentOS in favor of CentOS Stream.
2024
CentOS reaches EOL
A brief history of Unix, Linux, and kinda everything else
I started writing this as background on why OSes are necessary for computers, then realized it largely wasn’t needed for the article. In the future I might cut this out and repurpose it for another piece, but in the meantime I’ll just leave it below.
Note: Like all of history, this history is not strictly necessary for understanding the technical content in the article, but it is important for understanding the forces that shaped, and continue to shape, these business and social decisions.
Back when dinosaurs roamed the earth and everything was so new we didn’t have a name for it, also known as the 1950s, computers were large calculators. They were given a math program on stacks of punch cards, one at a time, and solved it. This led to people standing in lines, waiting for their turn to run their calculations on the computer. Two of the most intrinsic properties we take for granted in computers, parallelism and interactivity, are actually not intrinsic at all. They had to be created. But the software needed to juggle multiple programs and users was too demanding for the hardware of the time.
Over time, computer hardware became more powerful. Not good enough for people to have their own computers, let’s not go crazy here, but good enough that multiple people could load their programs into the same computer and the computer would switch between the calculations. The computer could still only run one set of calculations at a time, but if your program had a typo in line 3, you’d find out after 10 seconds rather than waiting in line for 4 hours. This concept was called “time sharing”, and it used a program called an “operating system” to switch between the calculations. Even today, a single CPU core can only run one program at a time; modern computers just switch between programs fast enough that it seems like multiple things are happening in parallel. When coupled with progress in interactive technologies like keyboards and teletypes (the precursor to computer screens), which made computers near-real-time interactive, these operating systems multiplied the productivity of those using the computer.*
In a parallel story thread, after Alexander Graham Bell invented the telephone, he started AT&T (the American Telephone and Telegraph company). The United States, founded on the idea of a free economy, largely doesn’t regulate companies. The exception is a set of “anti-trust” laws, which exist to prevent monopolies in the economy. The reasoning is that competition drives innovation and reduces prices, and monopolies are inherently anti-competitive. However, exceptions can be granted for what the government considers to be “natural monopolies”, or companies that inherently make sense as monopolies. One example is a telephone company, which, due to the technical complexity of connecting the country together electronically (this was before standard telephone protocols were invented), needed to be handled by a single company. The US government gave AT&T an exception to the monopoly rule. As a concession, AT&T was not allowed to go into the computing business, and any inventions patented by AT&T had to be licensed to other companies. If any of these rules were violated, the US government would break up AT&T. It turns out a monopoly on communications is good business, and AT&T made a lot of money. AT&T thus had a combination of money, a guaranteed stream of future income, and a business need to constantly improve its technology. So it founded one of the greatest research labs ever known: Bell Labs.
This point in the story is where, dear reader, our two narrative threads intersect. As previously mentioned, time sharing systems made computers easier to use and improved productivity. One of the creations of Bell Labs was the Unix operating system, which went on to become one of the most widely adopted operating systems ever. As mentioned above, AT&T was barred from the computing business and had to license its patents. It could still use Unix internally to boost productivity, but it also licensed the OS to one of the other major users of computers: universities. Students loved it. Eventually, the US government decided AT&T was stifling fair competition, revoked its anti-trust exemption, and broke the company up. AT&T, in turn, revoked the Unix license from universities and started charging money for it (because it was no longer barred from doing so). Students, and former students who were now working in the computing industry, were not happy. It’d be like revoking the right for people to use the computer mouse.
Two major changes came from this:
- BSD/GNU. Students at UC Berkeley, a storied university in the history of computer science and early internet development, decided to remove all the AT&T-copyrighted parts of Unix, rewrite them from scratch, and publish the result for free. They called it the Berkeley Software Distribution, or BSD. Around the same time, Richard Stallman at MIT was not happy that Unix had become proprietary software. So he also set about rewriting Unix utilities and other OS parts and published them for free on the internet. He called the project GNU, short for “GNU’s Not Unix”, a dig at Unix. He also created the GNU General Public License, or GPL, which is still widely used by open source projects.
- Linux. A little while later, a student in Finland named Linus Torvalds started a project to build a new Unix-like kernel. The project used the GPL and eventually became Linux.
It’s safe to say that operating systems and open source software, and the legal and monetary forces that surround them, are one of the common threads throughout computing history.
* One recurring thread throughout this time is the connection between computers and computer development. We take for granted the software ecosystem we have today, but early computers needed to bootstrap their software ecosystem to…help bootstrap their software ecosystem. This led programmers to greatly influence the design of modern computers, such as the semi-colon key being a primary key on the keyboard.