Changing a Cluster

There are several reasons why you might want to change the configuration of a cluster:

  • To increase or decrease cluster capacity by changing the amount of reserved memory and storage.
  • To enable high availability by adjusting the number of data centers that your cluster runs in.
  • To upgrade to new versions of Elasticsearch. You can upgrade from one major version to another, such as from 1.7.5 to 2.3.5, or from one minor version to another, such as from 2.3.5 to 2.4.0. You can’t downgrade versions.
  • To change what plugins are available on your cluster.

You can change the configuration of a running cluster from the Configuration pane in the Elastic Cloud Console.

With the exception of major version upgrades, we can perform all of these changes without interrupting your cluster: you can continue searching and indexing. The changes can also be combined. In one action, you can add more memory, upgrade, change plugins and adjust the number of availability zones.
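Because several settings can change in one action, it can help to think of a change as a single new configuration compared against the current one. The following is a minimal sketch under that assumption; `ClusterConfig` and `changed_fields` are hypothetical names for illustration, not part of the Elastic Cloud Console or any Elastic API.

```python
from dataclasses import dataclass, fields, replace
from typing import FrozenSet, List


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical model of the settings listed above."""
    memory_gb: int           # reserved memory
    zones: int               # number of availability zones (1, 2 or 3)
    version: str             # Elasticsearch version, e.g. "2.3.5"
    plugins: FrozenSet[str]  # enabled plugins


def changed_fields(old: ClusterConfig, new: ClusterConfig) -> List[str]:
    """Return the names of the settings that differ between two configurations."""
    return [f.name for f in fields(ClusterConfig)
            if getattr(old, f.name) != getattr(new, f.name)]


current = ClusterConfig(memory_gb=4, zones=1, version="2.3.5", plugins=frozenset())
# One combined change: more memory, a second zone, an upgrade and a new plugin.
target = replace(current, memory_gb=8, zones=2, version="2.4.0",
                 plugins=frozenset({"analysis-icu"}))
print(changed_fields(current, target))
```

Treating the whole change as one target configuration is what makes it natural to apply everything in a single step rather than as four separate operations.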

We perform all of these changes by starting a complete set of new nodes with the new configuration and having them join the existing cluster. After joining, the new nodes recover the indexes, and once they are done, they start receiving requests. When all the new nodes are ready, we bring down the old ones.
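The sequence above (join, recover, serve, then retire the old nodes) can be modelled in a few lines. This is a simplified sketch, not Elastic Cloud’s actual implementation: `Node`, `replace_nodes` and the `recover` callback are hypothetical, and index recovery is reduced to a function that reports success or failure for each new node.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Node:
    name: str
    config: str  # label for the configuration this node runs


def replace_nodes(old_nodes: List[Node], new_config: str, count: int,
                  recover: Callable[[Node], bool]) -> Tuple[List[Node], List[str]]:
    """Hypothetical grow-then-shrink replacement; returns (final nodes, step log)."""
    log = []
    # Step 1: a full set of new nodes joins the cluster; the old nodes keep serving.
    new_nodes = [Node(f"{new_config}-{i}", new_config) for i in range(count)]
    cluster = old_nodes + new_nodes
    log.append(f"{count} new nodes joined; cluster size {len(cluster)}")
    # Step 2: each new node must recover the indexes before it receives requests.
    ready = [node for node in new_nodes if recover(node)]
    if len(ready) < count:
        # A problem on the new side: drop the new nodes. The old ones never stopped.
        log.append("recovery failed; removing new nodes, old nodes keep serving")
        return list(old_nodes), log
    log.append("all new nodes recovered and are receiving requests")
    # Step 3: only now are the old nodes brought down.
    log.append(f"removed {len(old_nodes)} old nodes")
    return new_nodes, log


old = [Node("v1-0", "v1"), Node("v1-1", "v1")]
final, steps = replace_nodes(old, "v2", 3, recover=lambda node: True)
print([n.name for n in final])
```

The key design point is that the old nodes are only removed after every new node has reported ready, so a failure at any earlier step leaves the original cluster untouched.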

Doing it this way reduces the risk of making changes: if the new nodes have any problems, the old ones are still there, processing requests.

Note: If you use a Platform-as-a-Service provider like Heroku, the administration console is slightly different and does not allow you to make changes that affect the price. Such changes must be made through the platform provider’s add-on system. You can still make changes such as switching the Elasticsearch version or plugins.

Version Upgrades

When you change the version of an existing cluster, either a minor or a major upgrade is performed. A minor upgrade, for example from 2.2 to 2.3, requires no downtime because it is performed as a rolling upgrade. A major upgrade, for example from 2.3 to 5.0, requires a full cluster restart as part of the upgrade process.

Upgrading to a bug fix release, for example from 2.3.4 to 2.3.5, also requires no downtime.
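The rules above (major upgrades need a full restart, minor and bug fix upgrades are rolling, downgrades are not allowed) can be expressed as a small classification function. This is a minimal sketch assuming versions are plain dotted numbers without suffixes such as `-beta1`; `parse_version` and `classify_upgrade` are hypothetical helpers, not part of Elastic Cloud.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a dotted version such as '2.3.5' into (major, minor, patch)."""
    parts = [int(p) for p in version.split(".")]
    # Pad short versions so that '2.3' compares as '2.3.0'.
    while len(parts) < 3:
        parts.append(0)
    return parts[0], parts[1], parts[2]


def classify_upgrade(current: str, target: str) -> Tuple[str, bool]:
    """Return (kind of change, whether a full cluster restart is needed)."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt < cur:
        raise ValueError(f"cannot downgrade from {current} to {target}")
    if tgt == cur:
        return "none", False
    if tgt[0] != cur[0]:
        return "major", True    # full cluster restart
    if tgt[1] != cur[1]:
        return "minor", False   # rolling upgrade, no downtime
    return "bugfix", False      # rolling upgrade, no downtime


print(classify_upgrade("2.3", "5.0"))
```

Tuple comparison makes the downgrade check work across version jumps such as 2.x to 5.0, since only the ordering of the numbers matters.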

Best Practices for Major Version Upgrades

For major version upgrades, we have to bring the cluster to a full stop before upgrading, because nodes running different major versions cannot communicate with each other. We first flush all changes so that we can be sure they will be recovered, and then start the cluster with the new version.

While Elasticsearch is working on making upgrades across major versions possible, a new major version often includes so many changes that upgrading can be risky. This is true of most software. Our recommended approach for major version upgrades is to create a new cluster with the latest major version, reindex everything, and make sure that index requests are temporarily sent to both clusters. When the new cluster is ready, you can do a hot swap and send requests to the new cluster instead.

Because you are only billed for the hours a cluster is running, the few extra dollars spent on running a second cluster for a while are money well spent. And since the cluster running the version known to work well is still there, you can quickly roll back if the new version causes errors.
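The dual-write and hot-swap approach described above can be sketched as a small router that sits in front of both clusters. This is an illustration under simplifying assumptions: `FakeCluster` stands in for a real client and stores documents in a dictionary, and `DualWriteRouter` and `reindex` are hypothetical names, not an Elasticsearch or Elastic Cloud API.

```python
from typing import Dict


class FakeCluster:
    """Stand-in for a real cluster client; stores documents in a dict."""

    def __init__(self, name: str):
        self.name = name
        self.docs: Dict[str, dict] = {}

    def index(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = doc

    def get(self, doc_id: str) -> dict:
        return self.docs[doc_id]


class DualWriteRouter:
    """Send writes to both clusters; serve reads from the active one."""

    def __init__(self, old: FakeCluster, new: FakeCluster):
        self.old, self.new = old, new
        self.active = old

    def index(self, doc_id: str, doc: dict) -> None:
        # Writing to both keeps either cluster able to take over at any time.
        self.old.index(doc_id, doc)
        self.new.index(doc_id, doc)

    def get(self, doc_id: str) -> dict:
        return self.active.get(doc_id)

    def swap(self) -> None:
        # Hot swap: start serving from the new cluster.
        self.active = self.new

    def rollback(self) -> None:
        # The old cluster kept receiving writes, so it is still up to date.
        self.active = self.old


def reindex(src: FakeCluster, dst: FakeCluster) -> None:
    """Copy documents that predate dual writing; newer dual-written ones win."""
    for doc_id, doc in src.docs.items():
        dst.docs.setdefault(doc_id, doc)


old = FakeCluster("2.x")
old.index("a", {"v": 1})
new = FakeCluster("5.x")
router = DualWriteRouter(old, new)
router.index("b", {"v": 2})  # goes to both clusters
reindex(old, new)            # backfill "a" into the new cluster
router.swap()
print(router.get("a"))
```

Starting dual writes before reindexing, and letting the backfill skip documents that already exist in the new cluster, avoids overwriting a newer document with an older copy.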

We make it easy to manage multiple clusters with different versions. We do not force customers to upgrade their clusters. If we need to end-of-life a very old version, you can expect to be notified in due time.

On High Availability

High availability (HA) is achieved by running a cluster with replicas in multiple availability zones, to protect against downtime when infrastructure problems inevitably occur. Our article on Elasticsearch in Production covers this more extensively.

You can run a cluster in one, two, or three availability zones (AZs). Running in two AZs is our default high availability configuration. It provides reasonably high protection against infrastructure failures and intermittent network problems. You might want three zones if you need even higher protection, or just one zone if the cluster is mainly used for testing or development.
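Why more zones help can be seen by placing each shard’s copies in different zones and checking whether every shard survives the loss of any single zone. This is a simplified model assuming one primary and one replica per shard and a simple round-robin placement; it is not how Elasticsearch or Elastic Cloud actually allocates shards, and `place_copies` and `survives_zone_loss` are hypothetical helpers.

```python
from typing import Dict, List


def place_copies(shards: int, copies: int, zones: List[str]) -> Dict[int, List[str]]:
    """Assign each shard's copies to zones round-robin, distinct where possible."""
    placement = {}
    for shard in range(shards):
        # Rotate the starting zone so that primaries are spread across zones too.
        placement[shard] = [zones[(shard + c) % len(zones)] for c in range(copies)]
    return placement


def survives_zone_loss(placement: Dict[int, List[str]], zones: List[str]) -> bool:
    """True if every shard keeps at least one copy when any single zone fails."""
    for lost in zones:
        for shard_zones in placement.values():
            if all(zone == lost for zone in shard_zones):
                return False
    return True


for zones in (["a"], ["a", "b"], ["a", "b", "c"]):
    layout = place_copies(shards=3, copies=2, zones=zones)
    print(len(zones), "zone(s):", survives_zone_loss(layout, zones))
```

With a single zone, both copies of each shard share the same zone, so losing that zone loses the data; with two or three zones, every shard has a copy outside any one failed zone.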

As mentioned above, this is something that you can change while the cluster is running. For example, when you prepare a new cluster for production use, you can first run it in a single zone, then add another zone right before deploying to production.

While running in multiple zones increases a cluster’s reliability, it does not protect against, for example, problematic searches that cause nodes to run out of memory. For a cluster to be highly reliable and available, it is also important to have enough memory.

Tearing It Down

You can delete a cluster from its Configuration pane.

Figure: Deleting a running cluster

Deleting a cluster is final and cannot be undone. Billing stops as soon as the cluster is deleted, with the time used rounded up to the nearest hour. This means you can easily start a cluster, run some tests, and tear it down again when you are done.
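Rounding up to the nearest hour means any partial hour counts as a full one. A minimal sketch of that arithmetic, assuming billing is measured from the moment the cluster starts until it is deleted; `billed_hours` is a hypothetical helper, not Elastic Cloud’s billing code.

```python
import math
from datetime import datetime, timedelta


def billed_hours(started: datetime, deleted: datetime) -> int:
    """Hours billed for a cluster's lifetime, rounding any partial hour up."""
    seconds = (deleted - started).total_seconds()
    if seconds < 0:
        raise ValueError("a cluster cannot be deleted before it was started")
    return math.ceil(seconds / 3600)


start = datetime(2016, 1, 1, 12, 0)
# A short test run of 25 minutes is billed as one hour.
print(billed_hours(start, start + timedelta(minutes=25)))
```

Under this model, a test cluster kept for 61 minutes is billed for two hours, so deleting it promptly after the tests finish keeps the cost down.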

We plan to make our cluster management APIs available soon. They will let you automate tasks such as running final integration tests on a temporary cluster before deploying to production, for a few cents. We are also looking into making it possible to pause a small running cluster, which can be useful for a staging cluster that you only use occasionally.