Uninterruptible Power Supplies (UPS) are widely used in the power protection systems of data centers of every grade. Because UPS systems run continuously, faults are inevitable. Proper operation and maintenance is therefore crucial to lowering UPS failure rates and to preventing abnormal shutdowns caused by the UPS’s internal control system (wiring, software, etc.).
Here we share an incident in which a customer’s UPS shut down because of a control system abnormality, in the hope of drawing the attention of fellow operations and maintenance colleagues.
1. Fault Symptoms:
The power monitoring system suddenly raised multiple alarms, including: UPS fault; loss of power on the UPS output panel (voltage and current both dropped to 0); loss of power in N row distribution cabinets in the data center; and a tripped UPS battery circuit breaker (BCB).
On-Site Situation:
On-duty staff immediately inspected the alarming UPS on site and found the following:
- The main and bypass input switches in the UPS input cabinet were in their normal positions;
- The switches in the output cabinet were closed, but the output cabinet had lost power;
- The row distribution cabinets fed by those output-cabinet switches had lost power;
- The battery switch (BCB) of the alarming UPS had tripped and could not be reclosed;
- The status indicators on the UPS unit were off, and the alarm light was lit steadily with an audible alarm.
2. Emergency Measures:
- Immediately check the UPS switch positions and electrical readings. The three-phase supply voltage in the power cabinet was normal, the main and bypass switches were closed, and the current was 0. Open the main incoming switch on the UPS output panel;
- Press the fault-clear button on the UPS unit;
- Open the UPS main and bypass input breakers, wait 30 seconds, then close them again; the UPS will then start up automatically in bypass mode;
- Transfer the UPS from bypass to normal rectifier/inverter mode and confirm that all output parameters are normal; then reset the tripped battery switch and close it;
- Close the switches on the UPS output panel again to restore power to the data center distribution cabinets;
- Ask the manufacturer to send personnel to inspect the faulty equipment on site and eliminate any latent defects.
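The recovery sequence above is order-sensitive: the battery switch is closed only after the UPS is back on the inverter with normal output, and the load is restored last. As a rough illustration, the Python sketch below models that order as a chain of guarded steps. The `UpsState` class, its fields and the step functions are hypothetical; the guards simply encode the order used in this incident and are not the manufacturer’s interlocks or official procedure.

```python
from dataclasses import dataclass


# Hypothetical, simplified model of the UPS during recovery (not a vendor API).
@dataclass
class UpsState:
    mode: str = "shutdown"            # "shutdown", "bypass" or "inverter"
    fault_latched: bool = True        # cleared by the fault-clear button
    output_main_closed: bool = True   # main incoming switch on the output panel
    output_checked: bool = False      # output parameters confirmed normal
    battery_closed: bool = False      # BCB tripped during the incident


def open_output_main(s: UpsState) -> None:
    # Open the main incoming switch on the UPS output panel.
    s.output_main_closed = False


def clear_fault(s: UpsState) -> None:
    # Press the fault-clear button.
    s.fault_latched = False


def power_cycle_inputs(s: UpsState) -> None:
    # Open main and bypass input breakers, wait about 30 s, then reclose them.
    # The wait is not simulated; per the incident, the unit restarts in bypass.
    if s.fault_latched:
        raise RuntimeError("press the fault-clear button first")
    s.mode = "bypass"


def transfer_to_inverter(s: UpsState) -> None:
    if s.mode != "bypass":
        raise RuntimeError("UPS must be in bypass before starting the inverter")
    s.mode = "inverter"


def check_output(s: UpsState) -> None:
    # Stand-in for reading output voltage, current and frequency on the panel.
    if s.mode != "inverter":
        raise RuntimeError("check output only after transferring to inverter")
    s.output_checked = True


def close_battery(s: UpsState) -> None:
    if not s.output_checked:
        raise RuntimeError("close the BCB only after output checks pass")
    s.battery_closed = True


def restore_output(s: UpsState) -> None:
    if not s.battery_closed:
        raise RuntimeError("restore the load only after the BCB is closed")
    s.output_main_closed = True


RECOVERY_STEPS = [open_output_main, clear_fault, power_cycle_inputs,
                  transfer_to_inverter, check_output, close_battery,
                  restore_output]


def run_recovery(s: UpsState) -> UpsState:
    for step in RECOVERY_STEPS:
        step(s)  # each step raises RuntimeError if run out of order
    return s


state = run_recovery(UpsState())
print(state.mode, state.battery_closed, state.output_main_closed)  # inverter True True
```

Running every step leaves the model in inverter mode with the battery and output switches closed, while calling a step out of order, such as `close_battery` on a fresh `UpsState()`, raises `RuntimeError`. On site each step is a manual operation; the guards stand in for the operator’s check before moving on.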
3. Fault Investigation:
When our engineers arrived, they exported the UPS logs, which showed that the unit had internally issued an emergency shutdown command, causing the shutdown. Further inspection of the UPS unit found that the EPO (Emergency Power Off) connector was not fully seated; nothing else abnormal was found.
The on-site observations were as follows:
- The main and bypass switches in the UPS power cabinet were in their normal positions, but the output was interrupted, and the BCB battery switch had tripped and could not be reset manually;
- The UPS unit’s main and bypass input circuits were locked out;
- After the main and bypass switches were opened and reclosed (power-cycling the UPS unit), the UPS automatically came up in bypass mode;
- The inverter was started manually, transferring the UPS to normal inverter supply;
- The BCB battery switch could then be closed successfully.
These are the characteristic signs of an emergency shutdown following an EPO action, so the UPS shutdown was diagnosed as the result of an erroneous internal EPO command, consistent with the poorly seated EPO connector.
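The diagnosis above is essentially a checklist: the shutdown is attributed to an EPO action because all of the listed symptoms appeared together. The Python sketch below expresses that reasoning as a simple rule that also reports which indicators are missing. The `Observations` fields are hypothetical names for the findings described in this section; the rule illustrates the reasoning and is not a vendor diagnostic tool.

```python
from dataclasses import dataclass, fields


@dataclass
class Observations:
    # Each flag mirrors one finding from the investigation above.
    log_shows_internal_emergency_shutdown: bool
    input_switches_normal: bool
    output_interrupted: bool
    bcb_tripped_not_resettable: bool
    input_circuits_locked: bool
    bypass_after_power_cycle: bool
    bcb_closes_after_inverter_start: bool


def missing_epo_indicators(obs: Observations) -> list[str]:
    """Return the names of EPO-signature indicators that were not observed."""
    return [f.name for f in fields(obs) if not getattr(obs, f.name)]


def looks_like_epo_shutdown(obs: Observations) -> bool:
    # Flag the shutdown as EPO-like only if every indicator is present.
    return not missing_epo_indicators(obs)


incident = Observations(
    log_shows_internal_emergency_shutdown=True,
    input_switches_normal=True,
    output_interrupted=True,
    bcb_tripped_not_resettable=True,
    input_circuits_locked=True,
    bypass_after_power_cycle=True,
    bcb_closes_after_inverter_start=True,
)
print(looks_like_epo_shutdown(incident))  # True
```

Returning the missing indicators, rather than only a yes/no answer, shows which checks still need to be made on site before a shutdown is attributed to an EPO action.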
4. Follow-up Improvement Measures:
UPS equipment is the last line of defense for power reliability. To keep data center power safe and reliable, the maintenance team should learn from this incident and take preventive measures so that similar failures do not recur.
Immediately inspect all UPS units for operational hazards: thoroughly check the operating environment and confirm that the fans are working for adequate heat dissipation; check battery condition to ensure backup capacity; and inspect the internal EPO wiring and connectors of every UPS unit to prevent similar failures in the future.
By learning from such incidents and addressing them promptly, data center operations teams can keep critical power systems highly reliable and available.