We cannot always share details about our work with customers, but it is still nice to show our achievements and share some solutions.
Yesterday I was informed that one of our EC2 instances in eu-central-1 (Frankfurt) was scheduled for retirement.
My first thought was: What the f? Did someone schedule a shutdown or termination of this instance? Is this even possible? At least that's what I understood when I read "retirement". But when I logged into the AWS console, the scheduled changes in the "Event log" gave me a bit more information:
EC2 has detected degradation of the underlying hardware hosting your Amazon EC2 instance associated with this event in the eu-central-1 region. Due to this degradation your instance could already be unreachable. We will stop your instance after 2019-07-02.
Sounds like a hardware failure to me. But why doesn't AWS hot-migrate the instance to another physical server then? Turns out, this is not possible and the instance needs to be stopped. AWS says as much in the same notice:
We recommend that you stop and start the instance which will migrate the instance to a new host. Please note that any data on your local instance-store volumes will not be preserved when you stop and start your instance.
The second sentence caught my eye. I will lose my data? Panic time! Unfortunately this notice is not very well written or explained. What exactly is a "local instance-store volume"? Is that the default? It takes further reading to find out that data on "EBS" volumes survives a stop and start, so EC2 instances backed by EBS can be stopped at any time without losing data. Only data on instance-store volumes, which live on the physical host itself, is lost. EBS is what I use for all my EC2 instances, as the de facto default. To verify what kind of root volume the affected instance is using, click on the "Root device" of this instance in the AWS console. A popup will show the device type.
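The same check can be scripted instead of clicking through the console. What follows is only a minimal sketch, assuming Python with boto3 installed and AWS credentials configured; the function names are made up for this example. It relies on the "RootDeviceType" field that "describe_instances" returns for each instance, which is either "ebs" or "instance-store".

```python
def root_device_types(response):
    """Map instance ID -> root device type from a describe_instances response.

    The value is "ebs" or "instance-store" as reported by the EC2 API;
    "unknown" is a placeholder of this sketch for a missing field.
    """
    result = {}
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            result[instance["InstanceId"]] = instance.get("RootDeviceType", "unknown")
    return result


def fetch_root_device_types(instance_ids, region="eu-central-1"):
    """Query EC2 for the given instance IDs and return their root device types."""
    # Imported here so the parsing helper above also works without boto3.
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    return root_device_types(ec2.describe_instances(InstanceIds=instance_ids))
```

Calling fetch_root_device_types with a list holding the affected instance ID should return a small dictionary with "ebs" for an EBS-backed instance. Keep in mind that this only covers the root device: some instance types come with additional instance-store volumes even when the root volume is on EBS.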
Once I shut down the instance and started it again a few minutes later, the retirement warning disappeared from the AWS console. It took another couple of hours (at least that's when I re-checked) until the alert showed as cleared in the event log.
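For anyone who prefers a script over the console, the stop and start cycle and the follow-up check could look roughly like this. It is a sketch under assumptions, not what I actually ran: it expects a boto3 EC2 client to be passed in, the function names are invented for this example, and it treats an event as done when its description starts with "[Completed]", which is how the EC2 API marks finished scheduled events.

```python
def stop_and_start(ec2, instance_id):
    """Stop an instance, wait until it is stopped, then start it again.

    `ec2` is a boto3 EC2 client (or any object offering the same methods).
    A stop followed by a start is what the AWS notice quoted above
    recommends to move the instance to a new host.
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


def pending_events(status_response):
    """List (instance_id, code, description) for events not yet completed.

    `status_response` is the dict returned by describe_instance_status.
    """
    pending = []
    for status in status_response.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            description = event.get("Description", "")
            # Assumption of this sketch: completed events carry this prefix.
            if not description.startswith("[Completed]"):
                pending.append((status["InstanceId"], event.get("Code"), description))
    return pending


# Possible usage with real credentials (not executed here):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-central-1")
#   stop_and_start(ec2, instance_id)
#   response = ec2.describe_instance_status(
#       InstanceIds=[instance_id], IncludeAllInstances=True)
#   print(pending_events(response))
```

An empty list from pending_events means no open scheduled event is left for the instance. Since a completed event can stay listed for a while, filtering on the description is more useful than just checking whether any event exists.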