Date |
Details |
Resolution |
2024-04-14 |
ceph storage cluster experienced a severe degradation affecting all VMs backed by ceph-storage.
|
ceph OSDs were restarted and cluster operation normalized. operations remain stable after OSD restart.
|
2024-02-27 |
compute node went offline due to a power supply failure.
no impact due to redundant service mesh; services automatically migrated to other available compute nodes.
|
faulty power supply was replaced and compute node was restored into service.
|
2024-02-27 |
planned outage, services down for extended period while hardware was moved to a new rack.
|
rack move completed successfully and services were restored after move was completed.
|
2024-01-22 |
service instability due to faulty link between core switches.
|
optic and fiber patch on core switch were swapped.
|
2023-11-25 |
outage on a compute node caused by faulty NIC. affected services were migrated to non-affected compute nodes.
|
faulty NIC was replaced and compute node was restored into service.
|
2023-09-11 |
total outage caused by power failure due to storms in midwest US.
|
all services restored as of 4:00pm central.
|