How to implement the four-eyes principle for emergency fixes?



Consider this scenario (any resemblance to real-world situations is purely coincidental):

  • 3:07 am: incoming support call "Something in production went down, I need your help!".
  • 3:12 am: connected to the system (logon accepted) ... and no time for coffee.
  • 3:15 am: lucky you, right away you could spot the issue via some error message somewhere.
  • 3:17 am: use your SCM toolbox to grab the code, fix the issue, test it, great ... my fix works!
  • 3:20 am: get in touch with the DevOps team to ship the fix and get production running again.
  • 3:21 am: red flag ... "To respect the four-eyes principle, we need 2 more eyes to approve this fix".
  • 3:22 am: ggggrrrreat, now what? Who else can we call (= wake up some manager)?

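If you implemented something similar to my answer to "What is a possible implementation (or example) of the four-eyes principle?", then you're out of luck ... here are your options: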

  • Your fix gets stuck (read: production stays down) until 2 more people get involved.
  • You find a way to work around the missing eyes.

So how do you implement the four-eyes principle for emergency fixes? ... So that you get production up and running ASAP, i.e. around 3:25 am ... and so that you can also close the call (and go back to where you came from)?

There are 2 answers:

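In the SCM world I'm most familiar with, the above scenario is typically solved through a so-called abbreviated approval list process.

Here is a blueprint for it: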

  • Define your business hours, say from 8 am to 6 pm.
  • Define a complete approval list of (say) 3 levels of approval (for roles X, Y and Z).
  • Define an abbreviated approval list of (say) only 1 level of approval (only for roles X).
  • Planned changes always require all approvals from the complete approval list.
  • For unplanned changes, the complete approval list is also used to gather the required approvals, provided the approvals are issued during the defined business hours.
  • For any approvals of unplanned changes that are to be issued outside the defined business hours:
    • Only the approvals from the abbreviated approval list (such as role X above) are required to authorize the change. And after the authorization by the abbreviated approval list is given, the deployment of the change (in the target environment) will actually be performed.
    • But additional post-approvals will be needed afterwards (within a reasonable number of hours/days), i.e. from all roles in the complete approval list (such as roles Y and Z above) that are not also in the abbreviated approval list (such as role X above). If not all post-approvals have been issued within the (upfront) agreed number of hours/days (e.g. because the fix worked "this" time, but was only a temporary fix), then the change may be subject to a rollback. While at least 1 post-approval is outstanding, the change is marked as "waiting post approvals".
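The blueprint above can be sketched as a small Python model. This is a minimal, hypothetical illustration, not the API of any particular SCM tool: the `Change` class, the function names, the 8 am to 6 pm window, the role names X/Y/Z and the "waiting approvals"/"fully approved" labels are assumptions (only "waiting post approvals" comes from the text). It also assumes the time the change is requested decides which approval list applies, and it ignores weekends and holidays.

```python
from dataclasses import dataclass, field
from datetime import datetime, time

# Hypothetical configuration mirroring the blueprint.
BUSINESS_START = time(8, 0)   # 8 am
BUSINESS_END = time(18, 0)    # 6 pm
COMPLETE_LIST = ("X", "Y", "Z")
ABBREVIATED_LIST = ("X",)


def within_business_hours(moment: datetime) -> bool:
    """True if `moment` falls inside the defined business hours (weekends ignored)."""
    return BUSINESS_START <= moment.time() < BUSINESS_END


@dataclass
class Change:
    planned: bool
    requested_at: datetime
    approvals: set[str] = field(default_factory=set)

    def required_pre_approvals(self) -> tuple[str, ...]:
        # Planned changes, and unplanned changes during business hours,
        # need every role on the complete list before deployment.
        if self.planned or within_business_hours(self.requested_at):
            return COMPLETE_LIST
        # Unplanned changes outside business hours: abbreviated list only.
        return ABBREVIATED_LIST

    def required_post_approvals(self) -> tuple[str, ...]:
        # Roles on the complete list that were skipped up front.
        pre = self.required_pre_approvals()
        return tuple(role for role in COMPLETE_LIST if role not in pre)

    def can_deploy(self) -> bool:
        return all(role in self.approvals for role in self.required_pre_approvals())

    def status(self) -> str:
        if not self.can_deploy():
            return "waiting approvals"
        missing = [r for r in self.required_post_approvals() if r not in self.approvals]
        return "waiting post approvals" if missing else "fully approved"


if __name__ == "__main__":
    fix = Change(planned=False, requested_at=datetime(2024, 1, 10, 3, 20))
    fix.approvals.add("X")         # role X approves at 3:20 am
    print(fix.can_deploy())        # True
    print(fix.status())            # waiting post approvals
    fix.approvals.update({"Y", "Z"})
    print(fix.status())            # fully approved
```

Keeping the approval lists as plain data makes it easy to change the roles or the business hours without touching the routing logic.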

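With a solution like this, the call can be closed by 3:23 am ... because there won't be a red flag at 3:21 am anymore ... ggggrrreat, time for a beer (instead of coffee) to celebrate my fix getting production running again ... fingers crossed that the outstanding post-approvals will come in ...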

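For an after-hours emergency fix, the change actually requires fewer sign-offs than the normal procedure. Typically, you can deploy the fix and then obtain post-approval on the next business day. If the fix is not approved, it can be reverted and replaced with a permanent solution.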
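The deploy-now, approve-later flow described in this answer can be sketched as a deadline check. This is a hypothetical policy, not something the answer specifies: post-approvals are assumed to be due at the first business-day close (6 pm, Monday to Friday) after deployment, holidays are ignored, and the function names are made up for illustration.

```python
from datetime import datetime, time, timedelta

# Hypothetical policy: post-approvals are due at the first business-day close
# (6 pm, Monday to Friday) after the fix was deployed. Holidays are ignored.
BUSINESS_END = time(18, 0)


def post_approval_deadline(deployed_at: datetime) -> datetime:
    """Return the first weekday 6 pm that lies strictly after the deployment."""
    day = deployed_at.date()
    while True:
        close = datetime.combine(day, BUSINESS_END)
        # weekday(): Monday = 0 ... Friday = 4, Saturday = 5, Sunday = 6
        if day.weekday() < 5 and close > deployed_at:
            return close
        day += timedelta(days=1)


def post_approval_outcome(deployed_at: datetime, now: datetime, post_approved: bool) -> str:
    """Keep an approved fix; flag an unapproved one for revert once its deadline has passed."""
    if post_approved:
        return "keep"
    if now > post_approval_deadline(deployed_at):
        # Revert and replace with a permanent solution.
        return "revert"
    return "pending"


if __name__ == "__main__":
    deployed = datetime(2024, 1, 12, 22, 0)  # a Friday, 10 pm
    print(post_approval_deadline(deployed))  # 2024-01-15 18:00:00 (Monday)
```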

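In the event of an outage, the first priority should be restoring service. If your organization does not recognize this relaxed process during outages, then yes, your only option is to start waking up more people to sign off.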