Nova Resource:Tools/SAL
=== 2024-11-29 ===
* 03:43 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 03:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 03:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 03:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 03:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 03:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
=== 2024-11-27 ===
* 18:26 taavi: kubectl sudo rollout restart -n kube-system deployment coredns # update resolv.conf in coredns containers
=== 2024-11-26 ===
* 10:42 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-7
* 10:41 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7
* 10:36 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-control-7
* 10:35 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7
* 10:34 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-control-7
* 10:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7
* 10:32 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-control-7
* 10:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7
* 10:31 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-control-7
* 10:30 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7
* 10:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-9
* 10:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-9
* 10:22 dcaro: rebooting k8s-control-9
* 10:18 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-8
* 10:17 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-8
* 10:17 dcaro: rebooting k8s-control-8
* 09:15 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72
* 09:14 dcaro: restarting tools-k8s-worker-nfs-72
* 09:14 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72
* 09:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-70
* 09:12 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-70
* 09:12 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-50
* 09:12 dcaro: restarting tools-k8s-worker-nfs-70
* 09:11 dcaro: restarting tools-k8s-worker-nfs-50
* 09:11 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-50
* 09:08 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17
* 09:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17
* 08:34 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-61
* 08:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-61
* 07:30 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers ([[phab:T380827|T380827]])
* 06:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers ([[phab:T380827|T380827]])
=== 2024-11-25 ===
* 13:05 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli
* 12:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli
=== 2024-11-23 ===
* 07:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder ([[phab:T358225|T358225]])
* 07:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder ([[phab:T358225|T358225]])
=== 2024-11-20 ===
* 15:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 15:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 14:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 14:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 12:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 12:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 00:22 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission ([[phab:T362867|T362867]])
* 00:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission ([[phab:T362867|T362867]])
=== 2024-11-19 ===
* 21:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 21:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 21:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 21:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 21:11 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api
* 21:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 21:05 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 20:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 20:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 20:53 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-emailer
* 20:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 20:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics
* 20:38 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api
* 20:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 20:31 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component envvars-api ([[phab:T362867|T362867]])
* 20:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api ([[phab:T362867|T362867]])
* 20:30 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api ([[phab:T362867|T362867]])
* 20:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api ([[phab:T362867|T362867]])
* 20:17 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico ([[phab:T362867|T362867]])
* 20:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T362867|T362867]])
* 20:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics ([[phab:T362867|T362867]])
* 20:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics ([[phab:T362867|T362867]])
* 19:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission ([[phab:T362867|T362867]])
* 19:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission ([[phab:T362867|T362867]])
* 19:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission ([[phab:T362867|T362867]])
* 19:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission ([[phab:T362867|T362867]])
* 15:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 15:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
=== 2024-11-18 ===
* 14:45 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice
* 14:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 14:35 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
* 14:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 11:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 11:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
=== 2024-11-15 ===
* 14:05 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-db-5.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 14:04 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-5.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 14:03 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T352206|T352206]])
* 13:57 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T352206|T352206]])
* 13:57 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) ([[phab:T352206|T352206]])
* 13:57 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T352206|T352206]])
* 13:50 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-db' ([[phab:T352206|T352206]])
* 13:49 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T352206|T352206]])
=== 2024-11-14 ===
* 13:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice
* 13:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 13:04 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
* 13:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 13:02 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
* 12:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
=== 2024-11-12 ===
* 15:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 15:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 10:27 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice
* 10:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 10:11 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
* 10:08 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
=== 2024-11-11 ===
* 16:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-db-4.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 15:58 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-4.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 14:44 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-db-4.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 14:42 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-4.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 14:41 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T352206|T352206]])
* 14:37 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T352206|T352206]])
* 14:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 13:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
=== 2024-11-10 ===
* 02:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362867|T362867]])
* 02:47 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.11.0 ([[phab:T362867|T362867]])
* 02:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362867|T362867]])
=== 2024-11-06 ===
* 16:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 16:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 15:48 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 15:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 10:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24 ([[phab:T379139|T379139]])
* 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24 ([[phab:T379139|T379139]])
* 07:57 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
* 07:52 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 07:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 07:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
=== 2024-11-05 ===
* 17:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 17:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 09:40 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 09:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 08:38 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 08:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics
* 08:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 08:17 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 07:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico
* 07:44 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico
=== 2024-11-04 ===
* 16:39 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 16:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 16:30 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 16:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 16:22 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 16:21 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 15:05 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 14:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 14:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-76 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-76 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-75 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-75 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-74 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-74 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-73 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-73 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-72 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-72 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-71 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-71 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-70 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-70 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-69 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-68 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-68 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-67 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-67 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-66 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-66 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:26 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-65 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-65 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:25 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:24 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:20 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:14 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-58 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-58 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:08 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-58 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-58 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-57 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-57 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:56 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:55 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:54 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:53 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:52 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:51 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:50 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:44 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:43 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:43 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:42 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:31 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:26 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:20 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node
tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:14 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:13 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:11 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:10 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:10 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 13:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.27.16 to 
1.28.14 ([[phab:T362867|T362867]]) * 13:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:04 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:04 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 13:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 12:55 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 
([[phab:T362867|T362867]]) * 12:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-108 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-108 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:39 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-107 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-107 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-106 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-106 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-105 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-105 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.27.16 to 1.28.14 
([[phab:T362867|T362867]]) * 12:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:22 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:22 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 12:16 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 12:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:11 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api * 12:06 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 12:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 11:59 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api * 11:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 11:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 11:49 
raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 11:42 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 11:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 11:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 11:19 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 10:56 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 10:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 10:42 dcaro: added api.svc.toolforge.org dns record entry * 10:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 10:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 10:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 10:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 09:56 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 09:55 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for 
component registry-admission
* 09:51 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission
* 09:48 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 09:28 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 09:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
=== 2024-10-22 ===
* 13:05 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-23
* 13:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23
* 12:58 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-33, tools-k8s-woker-nfs-23
* 12:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-33, tools-k8s-woker-nfs-23
* 09:05 arturo: restart puppetserver service for [[phab:T377803|T377803]]
=== 2024-10-16 ===
* 09:41 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 09:37 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 09:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 09:07 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 09:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2024-10-15 ===
* 17:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 17:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 16:16 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway
* 16:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
=== 2024-10-14 ===
* 09:14 dcaro: migrating pipelineruns stored versions to v1 ([[phab:T376710|T376710]])
* 07:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9
* 07:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9
* 07:24 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-9
* 07:23 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9
=== 2024-10-09 ===
* 09:27 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 09:20 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 09:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 09:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
=== 2024-10-08 ===
* 13:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld ([[phab:T376710|T376710]])
* 13:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld ([[phab:T376710|T376710]])
* 12:38 dcaro: tests are passing correctly, upgrade finished, will investigate the increased slowness as a followup
* 12:27 dcaro: upgrade finished, build actions have become slower than usual ([[phab:T376710|T376710]]), running tests and investigating
* 12:02 dcaro: starting toolforge builds-builder upgrade, no downtime expected though some builds might fail to start/list/log/show while the upgrade is in progress [[phab:T374908|T374908]]
* 08:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 08:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 08:24 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers
* 08:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
=== 2024-10-04 ===
* 11:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 11:51 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 11:44 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 11:38 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2024-10-02 ===
* 09:11 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers
* 09:07 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
=== 2024-10-01 ===
* 10:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 10:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 10:32 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 10:28 dcaro: updated ci image with latest precommit versions
* 10:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 09:52 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission
* 09:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
=== 2024-09-30 ===
* 18:25 taavi: run striker migrations [[phab:T359428|T359428]]
=== 2024-09-28 ===
* 00:14 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 00:07 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
=== 2024-09-27 ===
* 23:58 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
* 23:52 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
=== 2024-09-26 ===
* 16:45 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 16:40 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 16:24 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 16:18 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 16:18 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission
* 16:08 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 16:05 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 15:58 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 10:26 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 10:20 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 10:12 wmbot~raymondndibe@wmf3402: END (PASS) -
Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli * 10:05 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli * 07:53 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 07:46 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld === 2024-09-25 === * 08:00 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-7 * 07:59 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-7 === 2024-09-24 === * 22:11 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers ([[phab:T375157|T375157]]) * 22:03 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T375157|T375157]]) * 21:48 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component kyverno ([[phab:T359641|T359641]]) * 21:41 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component kyverno ([[phab:T359641|T359641]]) === 2024-09-20 === * 20:12 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico ([[phab:T341066|T341066]]) * 20:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T341066|T341066]]) * 20:08 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico ([[phab:T341066|T341066]]) * 20:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T341066|T341066]]) * 19:36 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=99) for component calico ([[phab:T341066|T341066]]) * 19:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T341066|T341066]]) * 17:06 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 17:06 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/pod2daemon-flexvol:v3.28.2 ([[phab:T359641|T359641]]) * 17:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 17:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 17:04 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/typha:v3.28.2 ([[phab:T359641|T359641]]) * 17:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 17:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 17:03 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/node:v3.28.2 ([[phab:T359641|T359641]]) * 17:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 17:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 17:02 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/kube-controllers:v3.28.2 ([[phab:T359641|T359641]]) * 17:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 16:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) 
([[phab:T359641|T359641]]) * 16:59 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/ctl:v3.28.2 ([[phab:T359641|T359641]]) * 16:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 16:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 16:56 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/cni:v3.28.2 ([[phab:T359641|T359641]]) * 16:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 16:54 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/calico/cni:v3.28.2 ([[phab:T359641|T359641]]) * 16:54 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 06:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=1) * 00:39 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics ([[phab:T359641|T359641]]) * 00:32 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics ([[phab:T359641|T359641]]) === 2024-09-19 === * 23:17 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=97) ([[phab:T359641|T359641]]) * 23:17 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.10 ([[phab:T359641|T359641]]) * 23:17 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 23:12 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 23:11 wmbot~raymondndibe@wmf3402: Updating 
container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.10.1 ([[phab:T359641|T359641]]) * 23:11 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 22:38 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 22:37 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.1 ([[phab:T359641|T359641]]) * 22:37 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 22:36 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) ([[phab:T359641|T359641]]) * 22:36 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.1 ([[phab:T359641|T359641]]) * 22:36 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 22:35 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=97) ([[phab:T359641|T359641]]) * 22:35 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/docker-registry.tools.wmflabs.org/metrics-server:v0.7.1 ([[phab:T359641|T359641]]) * 22:35 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 17:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli ([[phab:T341066|T341066]]) * 17:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli ([[phab:T341066|T341066]]) * 17:13 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api ([[phab:T341066|T341066]]) * 17:06 wmbot~raymondndibe@wmf3402: START - Cookbook 
wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) * 16:48 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]]) * 16:46 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) * 16:45 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api * 16:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:38 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) * 16:26 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) * 16:10 dcaro: rebooting tools-k8s-worker-nfs-24 it's stuck without network * 16:08 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 16:08 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 16:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 16:07 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 16:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 15:28 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]]) * 15:27 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) * 15:19 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]]) * 15:18 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api 
([[phab:T341066|T341066]]) * 15:08 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]]) * 15:07 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) * 15:01 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api ([[phab:T341066|T341066]]) * 14:57 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) * 14:56 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api ([[phab:T341066|T341066]]) * 14:50 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) === 2024-09-17 === * 08:46 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-70 ([[phab:T359641|T359641]]) * 08:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-70 ([[phab:T359641|T359641]]) * 08:43 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-70.tools.eqiad1.wikimedia.cloud ([[phab:T359641|T359641]]) * 08:41 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 ([[phab:T359641|T359641]]) * 08:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-70.tools.eqiad1.wikimedia.cloud ([[phab:T359641|T359641]]) * 08:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 ([[phab:T359641|T359641]]) * 08:35 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-75.tools.eqiad1.wikimedia.cloud 
([[phab:T359641|T359641]]) * 08:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-75.tools.eqiad1.wikimedia.cloud ([[phab:T359641|T359641]]) * 03:24 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 03:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 03:20 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 03:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 03:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 03:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 03:13 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-64 * 03:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-63 * 03:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-64 ([[phab:T359641|T359641]]) * 03:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 03:07 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-76.tools.eqiad1.wikimedia.cloud to the cluster * 03:04 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]]) * 03:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 03:00 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-75.tools.eqiad1.wikimedia.cloud to the cluster * 02:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 02:46 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-74.tools.eqiad1.wikimedia.cloud to the cluster * 02:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-62 * 02:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-60 * 02:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-62 ([[phab:T359641|T359641]]) * 02:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-60 ([[phab:T359641|T359641]]) * 02:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 02:38 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-73.tools.eqiad1.wikimedia.cloud to the cluster * 02:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster 
([[phab:T359641|T359641]]) * 02:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 02:32 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-72.tools.eqiad1.wikimedia.cloud to the cluster * 02:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:24 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 02:24 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-71.tools.eqiad1.wikimedia.cloud to the cluster * 02:22 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:12 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=97) for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-6 * 02:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-56 * 02:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 02:08 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-70.tools.eqiad1.wikimedia.cloud to the cluster * 02:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]]) * 02:04 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-56 ([[phab:T359641|T359641]]) * 02:02 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-49 * 02:02 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-31 * 01:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:58 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 01:57 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-69.tools.eqiad1.wikimedia.cloud to the cluster * 01:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-49 ([[phab:T359641|T359641]]) * 01:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-31 ([[phab:T359641|T359641]]) * 01:56 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-30 * 01:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-64 ([[phab:T359641|T359641]]) * 01:53 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-29 * 01:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-30 ([[phab:T359641|T359641]]) * 01:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-64 ([[phab:T359641|T359641]]) * 01:48 raymond-ndibe@cloudcumin1001: 
START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-29 ([[phab:T359641|T359641]]) * 01:46 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-64 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]]) * 01:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-28 * 01:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 01:42 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-68.tools.eqiad1.wikimedia.cloud to the cluster * 01:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]]) * 01:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-64 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:40 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-63 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-28 ([[phab:T359641|T359641]]) * 01:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-62 ([[phab:T359641|T359641]]) * 01:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-63 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:34 
raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-62 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-60 ([[phab:T359641|T359641]]) * 01:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 01:32 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-67.tools.eqiad1.wikimedia.cloud to the cluster * 01:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-62 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-61 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-60 ([[phab:T359641|T359641]]) * 01:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]]) * 01:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:28 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-60 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:23 raymond-ndibe@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 01:23 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-66.tools.eqiad1.wikimedia.cloud to the cluster * 01:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]]) * 01:22 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-60 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:22 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-6 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-56 ([[phab:T359641|T359641]]) * 01:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:16 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-57 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-56 ([[phab:T359641|T359641]]) * 01:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-57 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:15 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-56 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:14 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-49 ([[phab:T359641|T359641]]) * 01:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a 
worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-49 ([[phab:T359641|T359641]]) * 01:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:08 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-49 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-31 ([[phab:T359641|T359641]]) * 01:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:00 
raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-31 ([[phab:T359641|T359641]]) * 00:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:59 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-31 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:58 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-30 ([[phab:T359641|T359641]]) * 00:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:53 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-30 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-29 ([[phab:T359641|T359641]]) * 00:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-29 ([[phab:T359641|T359641]]) * 00:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.26.15 
to 1.27.16 ([[phab:T359641|T359641]]) * 00:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-29 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-28 ([[phab:T359641|T359641]]) * 00:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-28 ([[phab:T359641|T359641]]) * 00:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:41 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-28 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:32 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-60, tools-k8s-worker-nfs-61, tools-k8s-worker-nfs-62, tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]])
* 00:26 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-56, tools-k8s-worker-nfs-57, tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]])
* 00:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-56, tools-k8s-worker-nfs-57, tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]])
* 00:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-49, tools-k8s-worker-nfs-50 ([[phab:T359641|T359641]])
* 00:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-60, tools-k8s-worker-nfs-61, tools-k8s-worker-nfs-62, tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]])
* 00:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-31, tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-33, tools-k8s-worker-nfs-36 ([[phab:T359641|T359641]])
=== 2024-09-16 ===
* 17:56 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45
* 17:51 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45
* 17:46 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-6
* 17:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-6
=== 2024-09-13 ===
* 11:18 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54 ([[phab:T374692|T374692]])
* 11:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54 ([[phab:T374692|T374692]])
* 09:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
* 09:20 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
* 09:12 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
* 09:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
=== 2024-09-12 ===
* 12:06 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:54 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-23, tools-k8s-worker-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23, tools-k8s-worker-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-28 ([[phab:T374612|T374612]])
* 11:37 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-28 ([[phab:T374612|T374612]])
=== 2024-09-11 ===
* 10:27 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 10:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
=== 2024-09-09 ===
* 16:23 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component cert-manager
* 16:16 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component cert-manager
=== 2024-09-06 ===
* 08:47 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 08:42 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 08:38 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 08:36 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 07:14 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0)
* 07:14 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/pause:3.6
* 07:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry
=== 2024-09-05 ===
* 13:50 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 13:50 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/stakater-reloader:v1.1.0 ([[phab:T359641|T359641]]) * 13:50 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 13:46 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 13:45 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/startupapicheck:v1.15.3 ([[phab:T359641|T359641]]) * 13:45 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 13:41 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) ([[phab:T359641|T359641]]) * 13:41 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/startupapicheck:v1.15.3 ([[phab:T359641|T359641]]) * 13:41 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 13:40 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) ([[phab:T359641|T359641]]) * 13:40 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/startupapicheck:v1.15.3 ([[phab:T359641|T359641]]) * 13:40 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 13:28 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 13:27 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/cainjector:v1.15.3 ([[phab:T359641|T359641]]) * 13:27 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 13:26 
wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 13:26 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/webhook:v1.15.3 ([[phab:T359641|T359641]])
* 13:26 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 13:24 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 13:23 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/controller:v1.15.3 ([[phab:T359641|T359641]])
* 13:23 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
=== 2024-09-04 ===
* 14:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 14:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 14:03 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 14:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 14:02 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 13:56 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 13:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 13:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 13:36 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 13:35 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 13:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 13:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 13:02 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 13:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
=== 2024-09-03 ===
* 20:19 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 19:53 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 19:48 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 19:36 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 19:29 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 15:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component kyverno
* 15:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component kyverno
* 15:29 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component kyverno
* 15:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component kyverno
* 14:41 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99)
* 14:41 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 14:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 14:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component
envvars-admission * 14:06 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.28.5 ([[phab:T359641|T359641]]) * 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.12.5 ([[phab:T359641|T359641]]) * 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.12.5 ([[phab:T359641|T359641]]) * 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.12.5 ([[phab:T359641|T359641]]) * 14:04 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.12.5 ([[phab:T359641|T359641]]) * 14:04 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.12.5 ([[phab:T359641|T359641]]) * 14:04 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.12.5 ([[phab:T359641|T359641]]) * 14:04 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry ([[phab:T359641|T359641]]) * 13:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 13:56 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 13:55 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.28.5 ([[phab:T359641|T359641]]) * 13:54 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.12.5 ([[phab:T359641|T359641]]) * 13:54 
wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.12.5 ([[phab:T359641|T359641]])
* 13:53 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.12.5 ([[phab:T359641|T359641]])
* 13:53 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.12.5 ([[phab:T359641|T359641]])
* 13:53 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.12.5 ([[phab:T359641|T359641]])
* 13:53 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry ([[phab:T359641|T359641]])
* 13:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 13:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 13:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 13:04 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 12:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 11:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api
* 11:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 10:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 10:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 09:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 09:51 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component
builds-builder
* 05:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-31 from 1.25.16 to 1.26.15
* 05:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.25.16 to 1.26.15
* 05:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-30 from 1.25.16 to 1.26.15
* 05:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.25.16 to 1.26.15
* 05:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-29 from 1.25.16 to 1.26.15
* 05:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.25.16 to 1.26.15
=== 2024-09-02 ===
* 14:31 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-108 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:30 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-108 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-107 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-107 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-106 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-106 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:28
dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-105 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-105 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-64 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:19 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-64 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-28 from 1.25.16 to 1.26.15
* 13:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28
from 1.25.16 to 1.26.15
* 13:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.25.16 to 1.26.15
* 13:30 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.25.16 to 1.26.15
* 13:30 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.25.16 to 1.26.15
* 13:30 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-62 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:29 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.25.16 to 1.26.15
* 13:28 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.25.16 to 1.26.15
* 13:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-62 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-61 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:27 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.25.16 to 1.26.15
* 13:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:27
dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-60 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-60 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-58 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:25 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.25.16 to 1.26.15
* 13:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-58 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-57 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.25.16 to 1.26.15
* 13:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-57 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-56 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.25.16 to 1.26.15
* 13:22 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.25.16 to 1.26.15
* 13:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
*
13:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.25.16 to 1.26.15
* 13:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.25.16 to 1.26.15
* 13:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:17 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-51 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-51 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:16 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node
tools-k8s-worker-nfs-20 from 1.25.16 to 1.26.15
* 13:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-49 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:15 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-20 from 1.25.16 to 1.26.15
* 13:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:14 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.25.16 to 1.26.15
* 13:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.25.16 to 1.26.15
* 13:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.25.16 to 1.26.15
* 13:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node
tools-k8s-worker-nfs-46 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.25.16 to 1.26.15
* 13:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:11 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:10 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.25.16 to 1.26.15
* 13:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.25.16 to 1.26.15
* 13:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.25.16 to 1.26.15
* 13:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0)
for node tools-k8s-worker-nfs-42 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.25.16 to 1.26.15
* 13:07 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.25.16 to 1.26.15
* 13:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:06 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.25.16 to 1.26.15
* 13:05 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.25.16 to 1.26.15
* 13:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:04 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.25.16 to 1.26.15
* 13:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:04 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:03 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node
tools-k8s-worker-nfs-11 from 1.25.16 to 1.26.15
* 13:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:02 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:01 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.25.16 to 1.26.15
* 13:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:01 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.25.16 to 1.26.15
* 13:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.25.16 to 1.26.15
* 12:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.25.16 to 1.26.15
* 12:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node
tools-k8s-worker-nfs-35 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.25.16 to 1.26.15
* 12:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.25.16 to 1.26.15
* 12:56 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:56 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.25.16 to 1.26.15
* 12:55 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.25.16 to 1.26.15
* 12:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:54 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.25.16 to 1.26.15
* 12:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.25.16 to
1.26.15 ([[phab:T370249|T370249]])
* 12:47 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.25.16 to 1.26.15
* 12:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.25.16 to 1.26.15
* 12:45 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.25.16 to 1.26.15
* 12:43 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.25.16 to 1.26.15
* 12:41 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.25.16 to 1.26.15
* 12:40 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.25.16 to 1.26.15
* 12:35 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-6 from 1.25.16 to 1.26.15
* 12:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.25.16 to 1.26.15
* 12:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.25.16 to 1.26.15
* 12:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.25.16 to 1.26.15
* 12:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.25.16 to 1.26.15
* 12:31 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.25.16 to 1.26.15
* 12:29 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.25.16 to 1.26.15
*
12:27 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.25.16 to 1.26.15
* 12:26 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.25.16 to 1.26.15
* 12:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.25.16 to 1.26.15
* 12:24 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.25.16 to 1.26.15
* 12:12 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.25.16 to 1.26.15
* 12:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.25.16 to 1.26.15
* 12:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.25.16 to 1.26.15
* 11:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.25.16 to 1.26.15
* 11:48 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.25.16 to 1.26.15
* 11:45 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.25.16 to 1.26.15
* 11:43 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.25.16 to 1.26.15
* 10:05 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 09:58 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 09:49 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
*
09:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 09:21 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 09:16 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 09:06 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 09:00 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 08:48 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component components-api
* 08:48 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component components-api
=== 2024-08-29 ===
* 16:32 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 16:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 08:00 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx
* 07:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
=== 2024-08-27 ===
* 12:06 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0)
* 12:06 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/nginx-ingress-controller:v1.11.2
* 12:06 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry
* 09:46 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
* 09:46 wmbot~dcaro@urcuchillay: Added a new k8s worker tools-k8s-worker-108.tools.eqiad1.wikimedia.cloud to the cluster
* 09:36 wmbot~dcaro@urcuchillay: START - Cookbook
wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 09:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico
* 08:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico
* 08:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component calico
* 08:56 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component calico
* 08:55 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-104 ([[phab:T373243|T373243]])
* 08:53 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-104 ([[phab:T373243|T373243]])
* 08:38 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-52 ([[phab:T373243|T373243]])
* 08:37 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-52 ([[phab:T373243|T373243]])
* 08:35 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-51 ([[phab:T373243|T373243]])
* 08:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-51 ([[phab:T373243|T373243]])
* 08:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-25 ([[phab:T373243|T373243]])
* 08:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-25 ([[phab:T373243|T373243]])
* 08:31 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-18 ([[phab:T373243|T373243]])
* 08:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-18
([[phab:T373243|T373243]])
* 08:29 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-15 ([[phab:T373243|T373243]])
* 08:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-15 ([[phab:T373243|T373243]])
* 08:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-4 ([[phab:T373243|T373243]])
* 08:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-4 ([[phab:T373243|T373243]])
* 08:19 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
* 08:19 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
=== 2024-08-26 ===
* 21:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 21:13 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-64.tools.eqiad1.wikimedia.cloud to the cluster
* 21:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 21:03 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=97) for a worker-nfs role in the tools cluster
* 21:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 20:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 20:23 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-63.tools.eqiad1.wikimedia.cloud to the cluster
* 20:13 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 20:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook
wmcs.openstack.quota_increase (exit_code=0)
* 20:13 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.quota_increase
* 18:35 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
* 18:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 17:49 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 17:49 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-62.tools.eqiad1.wikimedia.cloud to the cluster
* 17:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 17:38 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
* 17:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.quota_increase
* 17:33 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
* 17:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 17:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
* 17:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.quota_increase
* 17:30 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
* 17:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 17:04 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 17:04 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-61.tools.eqiad1.wikimedia.cloud to the cluster
* 16:54 wmbot~dcaro@urcuchillay: START - Cookbook
wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:54 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 16:54 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-60.tools.eqiad1.wikimedia.cloud to the cluster * 16:42 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:30 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 16:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:14 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 16:14 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-58.tools.eqiad1.wikimedia.cloud to the cluster * 16:02 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:02 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 16:02 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-57.tools.eqiad1.wikimedia.cloud to the cluster * 15:50 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:49 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:48 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:44 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:39 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs 
role in the tools cluster * 15:38 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=97) for a worker-nfs role in the tools cluster * 15:35 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:33 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:15 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:10 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 14:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-4 ([[phab:T373243|T373243]]) * 13:12 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-4, tools-k8s-worker-nfs-15, tools-k8s-worker-nfs-18, tools-k8s-worker-nfs-25, tools-k8s-worker-nfs-51, tools-k8s-worker-nfs-52, tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 13:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-4, tools-k8s-worker-nfs-15, tools-k8s-worker-nfs-18, tools-k8s-worker-nfs-25, tools-k8s-worker-nfs-51, tools-k8s-worker-nfs-52, tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:44 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:42 wmbot~dcaro@urcuchillay: 
START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 11:06 dcaro: manually deleted the coredns pods that had been around for 4d * 09:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 09:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:02 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 09:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:00 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 08:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 08:18 dcaro: scale up coredns deployment to 4 replicas === 2024-08-21 === * 05:44 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 05:38 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 05:27 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 05:20 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 05:01 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 04:55 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 04:43 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 04:36 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 04:28 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component 
volume-admission * 04:25 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 04:22 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 04:21 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 04:20 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 04:20 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 04:10 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 04:03 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 03:49 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:42 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 03:33 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 03:28 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 03:19 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 03:17 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 03:13 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2024-08-19 === * 22:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24 * 21:56 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24 * 21:52 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17 * 
21:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17 * 21:46 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-17,tools-k8s-worker-nfs-24 * 21:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17,tools-k8s-worker-nfs-24 === 2024-08-15 === * 06:30 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-20 * 06:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-20 === 2024-08-13 === * 09:54 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 07:39 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-6 * 07:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-6 === 2024-08-12 === * 15:33 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 15:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 12:31 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 11:51 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice * 11:46 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice * 10:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:24 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component jobs-api * 09:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 09:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2024-08-08 === * 16:57 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 16:51 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 16:36 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 16:30 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 16:11 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 16:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2024-08-06 === * 09:50 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=1) * 09:50 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:50 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:28 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:20 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:20 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 09:19 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:19 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 09:19 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console === 
2024-08-05 === * 13:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component components-api * 13:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component components-api * 11:42 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 11:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 09:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 09:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 08:38 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 08:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway === 2024-08-01 === * 20:42 bd808: Uncordoned tools-k8s-worker-nfs-55 following reboot * 20:40 bd808: Hard reboot of tools-k8s-worker-nfs-55 following drain cookbook run. Stuck pod remained stuck as expected. 
* 20:37 bd808@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-55 * 20:32 bd808: Draining and rebooting tools-k8s-worker-nfs-55 after reports of stuck pods via irc * 20:32 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-55 * 15:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component components-api * 15:31 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component components-api === 2024-07-31 === * 20:37 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli * 20:36 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 20:26 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component jobs-cli * 20:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 16:17 andrewbogott: changing login.tools.wmlabs.org to point to a newer bastion, tools-bastion-12, in response to [[phab:T371505|T371505]] * 11:38 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 11:38 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 11:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component components-api * 11:33 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component components-api * 10:07 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-43 * 09:49 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for 
tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-43 === 2024-07-30 === * 18:08 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli * 18:06 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 18:06 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli * 18:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 18:02 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli * 18:02 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 18:02 wmbot~raymond@ubuntu: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component jobs-cli * 18:01 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 17:59 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli * 17:59 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 17:49 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli * 17:49 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 17:40 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli * 17:39 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 17:37 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 17:36 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 16:34 wmbot~dcaro@urcuchillay: 
END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-23 * 16:28 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23 === 2024-07-29 === * 18:24 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 18:23 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 18:06 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 18:05 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 16:24 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 16:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 14:05 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 14:03 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 13:19 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli * 13:18 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 12:08 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-cli * 12:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-cli * 12:01 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-cli * 12:00 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-cli === 2024-07-25 === * 15:19 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 15:19 
dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 08:37 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 08:37 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics === 2024-07-24 === * 09:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx * 09:21 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx * 08:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission * 08:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission * 07:07 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component ingress-admission * 06:57 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission === 2024-07-23 === * 15:04 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 15:04 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission * 13:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 13:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 12:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 12:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 12:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 12:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 12:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 08:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 08:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2024-07-22 === * 17:42 dcaro: moved the apt repo to service endpoint deb.svc.toolforge.org * 17:39 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-3 * 17:38 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-3 * 17:03 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 17:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 17:00 dcaro: moving the toolforge apt repo to tools-services-06 * 16:55 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-services-06.tools.eqiad1.wikimedia.cloud * 16:53 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-services-06.tools.eqiad1.wikimedia.cloud * 09:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 09:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 09:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 09:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 
2024-07-19 === * 12:46 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) * 12:46 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.9.2 * 12:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 10:03 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) * 10:02 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/nginx-ingress-controller:v1.9.6 * 10:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry === 2024-07-18 === * 14:39 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 14:39 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 08:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 08:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2024-07-17 === * 14:50 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 11:12 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-builder * 11:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 10:44 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 10:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 10:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 10:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 10:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 09:07 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 09:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 08:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx * 08:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx === 2024-07-16 === * 15:03 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 15:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.24.17 to 1.25.16 * 14:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.24.17 to 1.25.16 * 14:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.24.17 to 1.25.16 * 14:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.24.17 to 1.25.16 * 14:09 
sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.24.17 to 1.25.16 * 14:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.24.17 to 1.25.16 * 11:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 11:35 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 11:31 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-28 from 1.24.17 to 1.25.16 * 11:30 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.24.17 to 1.25.16 * 11:30 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.24.17 to 1.25.16 * 11:28 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.24.17 to 1.25.16 * 11:28 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.24.17 to 1.25.16 * 11:27 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.24.17 to 1.25.16 * 11:26 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-25 from 1.24.17 to 1.25.16 * 11:25 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 11:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-25 from 1.24.17 to 1.25.16 * 11:24 sstefanova@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.24.17 to 1.25.16 * 11:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 11:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.24.17 to 1.25.16 * 11:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.24.17 to 1.25.16 * 11:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 11:22 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.24.17 to 1.25.16 * 11:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.24.17 to 1.25.16 * 11:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.24.17 to 1.25.16 * 11:16 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-107 from 1.24.17 to 1.25.16 * 11:15 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-107 from 1.24.17 to 1.25.16 * 11:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-106 from 1.24.17 to 1.25.16 * 11:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-106 from 1.24.17 to 1.25.16 * 11:13 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-105 from 1.24.17 to 1.25.16 * 11:12 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-105 from 1.24.17 to 1.25.16 * 11:12 sstefanova@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16 * 11:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16 * 11:10 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-nfs-worker-21 from 1.24.17 to 1.25.16 * 11:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-nfs-worker-21 from 1.24.17 to 1.25.16 * 11:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21 * 11:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 * 10:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-104 from 1.24.17 to 1.25.16 * 10:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-104 from 1.24.17 to 1.25.16 * 10:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.24.17 to 1.25.16 * 10:57 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16 * 10:56 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.24.17 to 1.25.16 * 10:55 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.24.17 to 1.25.16 * 10:54 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.24.17 to 1.25.16 * 10:53 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-56 from 
1.24.17 to 1.25.16
* 10:52 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.24.17 to 1.25.16
* 10:51 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.24.17 to 1.25.16
* 10:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16
* 10:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-20 from 1.24.17 to 1.25.16
* 10:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.24.17 to 1.25.16
* 10:50 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.24.17 to 1.25.16
* 10:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-20 from 1.24.17 to 1.25.16
* 10:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.24.17 to 1.25.16
* 10:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.24.17 to 1.25.16
* 10:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.24.17 to 1.25.16
* 10:49 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-18 from 1.24.17 to 1.25.16
* 10:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-18 from 1.24.17 to 1.25.16
* 10:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.24.17 to 1.25.16
* 10:47
sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.24.17 to 1.25.16
* 10:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.24.17 to 1.25.16
* 10:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.24.17 to 1.25.16
* 10:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.24.17 to 1.25.16
* 10:45 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.24.17 to 1.25.16
* 10:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-15 from 1.24.17 to 1.25.16
* 10:45 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-52 from 1.24.17 to 1.25.16
* 10:44 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-15 from 1.24.17 to 1.25.16
* 10:44 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.24.17 to 1.25.16
* 10:44 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-52 from 1.24.17 to 1.25.16
* 10:43 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.24.17 to 1.25.16
* 10:43 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.24.17 to 1.25.16
* 10:43 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-51 from 1.24.17 to 1.25.16
* 10:42
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.24.17 to 1.25.16
* 10:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.24.17 to 1.25.16
* 10:42 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-51 from 1.24.17 to 1.25.16
* 10:41 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.24.17 to 1.25.16
* 10:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.24.17 to 1.25.16
* 10:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.24.17 to 1.25.16
* 10:40 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.24.17 to 1.25.16
* 10:40 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-49 from 1.24.17 to 1.25.16
* 10:40 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.24.17 to 1.25.16
* 10:40 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.24.17 to 1.25.16
* 10:39 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.24.17 to 1.25.16
* 10:39 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.24.17 to 1.25.16
* 10:39 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.24.17 to 1.25.16
* 10:39 sstefanova@cloudcumin1001: END (PASS) -
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.24.17 to 1.25.16
* 10:38 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.24.17 to 1.25.16
* 10:38 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.24.17 to 1.25.16
* 10:38 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.24.17 to 1.25.16
* 10:37 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.24.17 to 1.25.16
* 10:37 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.24.17 to 1.25.16
* 10:37 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.24.17 to 1.25.16
* 10:36 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.24.17 to 1.25.16
* 10:35 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.24.17 to 1.25.16
* 10:35 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16
* 10:35 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.24.17 to 1.25.16
* 10:34 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16
* 10:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.24.17 to 1.25.16
* 10:34 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade
(exit_code=0) for node tools-k8s-worker-nfs-45 from 1.24.17 to 1.25.16
* 10:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.24.17 to 1.25.16
* 10:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.24.17 to 1.25.16
* 10:31 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.24.17 to 1.25.16
* 10:31 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.24.17 to 1.25.16
* 10:29 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.24.17 to 1.25.16
* 10:29 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.24.17 to 1.25.16
* 10:28 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.24.17 to 1.25.16
* 10:27 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.24.17 to 1.25.16
* 10:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.24.17 to 1.25.16
* 10:26 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.24.17 to 1.25.16
* 10:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.24.17 to 1.25.16
* 10:24 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.24.17 to 1.25.16
* 10:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node
tools-k8s-worker-nfs-39 from 1.24.17 to 1.25.16
* 10:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.24.17 to 1.25.16
* 10:22 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.24.17 to 1.25.16
* 10:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.24.17 to 1.25.16
* 10:20 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16
* 10:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.24.17 to 1.25.16
* 10:19 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.24.17 to 1.25.16
* 10:18 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.24.17 to 1.25.16
* 10:17 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.24.17 to 1.25.16
* 10:16 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.24.17 to 1.25.16
* 10:16 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.24.17 to 1.25.16
* 10:15 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.24.17 to 1.25.16
* 10:14 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16
* 10:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node
tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16
* 10:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission
* 10:14 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.24.17 to 1.25.16
* 10:14 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission
* 10:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16
* 10:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.24.17 to 1.25.16
* 10:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.24.17 to 1.25.16
* 10:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.24.17 to 1.25.16
* 10:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.24.17 to 1.25.16
* 10:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.24.17 to 1.25.16
* 10:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.24.17 to 1.25.16
* 10:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.24.17 to 1.25.16
* 10:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-31 from 1.24.17 to 1.25.16
* 10:11 aborrero@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16
* 10:11
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16
* 10:10 aborrero@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16
* 10:10 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16
* 10:10 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-4 from 1.24.17 to 1.25.16
* 10:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.24.17 to 1.25.16
* 10:10 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-30 from 1.24.17 to 1.25.16
* 10:09 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-4 from 1.24.17 to 1.25.16
* 10:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.24.17 to 1.25.16
* 10:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-29 from 1.24.17 to 1.25.16
* 10:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.24.17 to 1.25.16
* 09:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.24.17 to 1.25.16
* 09:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.24.17 to 1.25.16
* 09:50 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-1 from 1.24.17 to 1.25.16
* 09:50 aborrero@cloudcumin1001: START - Cookbook
wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-1 from 1.24.17 to 1.25.16
* 09:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.24.17 to 1.25.16
* 09:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.24.17 to 1.25.16
* 09:39 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.24.17 to 1.25.16
* 09:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.24.17 to 1.25.16
* 09:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.24.17 to 1.25.16
* 09:10 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.24.17 to 1.25.16
* 09:07 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.24.17 to 1.25.16
* 09:06 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.24.17 to 1.25.16
* 08:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission
* 08:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission
=== 2024-07-15 ===
* 14:42 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 14:42 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 11:40 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 11:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component
builds-builder
* 08:02 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission
* 08:02 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission
=== 2024-07-11 ===
* 17:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 17:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 13:49 dcaro: deploy toolforge-jobs-framework 16.0.13 ([[phab:T369573|T369573]])
* 11:55 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission
* 11:55 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission
=== 2024-07-10 ===
* 17:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 17:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 16:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 16:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 16:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
* 16:01 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
* 15:16 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 15:16 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 12:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component
maintain-kubeusers
* 12:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 10:10 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 10:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
=== 2024-07-09 ===
* 14:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
* 14:21 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
* 14:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 14:18 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-07-08 ===
* 20:22 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-37
* 20:16 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-37
* 14:09 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 14:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
* 13:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-elastic-3
* 13:57 andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-elastic-3
* 13:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-elastic-2
* 13:56 andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-elastic-2
* 13:56 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-elastic-1
* 13:56
andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-elastic-1
* 13:36 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
* 13:36 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
* 13:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission
* 13:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission
* 12:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission
* 12:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission
* 12:00 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 11:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 08:46 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 08:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
=== 2024-07-05 ===
* 12:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
* 12:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
* 12:34 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
* 12:34 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
* 12:29 wmbot~arturo@nostromo: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
* 12:29
wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4
* 12:29 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.10.7
* 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.10.7
* 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.10.7
* 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.10.7
* 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7
* 12:27 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 12:27 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99)
* 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.10.7
* 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.10.7
* 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.10.7
* 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.10.7
* 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7
* 12:26 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 12:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0)
* 12:23 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.7.0
* 12:23
sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry
* 11:29 wmbot~arturo@nostromo: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) copy image from bitnami/kubectl:1.26.4 to docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4
* 11:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4
* 11:28 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry copy image from bitnami/kubectl:1.26.4 to docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4
* 01:47 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 01:46 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-07-04 ===
* 17:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 17:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 12:57 arturo: updating kubelet flags [[phab:T355881|T355881]]
* 12:00 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 12:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 11:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 11:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 09:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 09:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 09:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
*
09:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 07:54 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 07:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
=== 2024-07-03 ===
* 12:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 12:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 10:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 10:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 09:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission
* 09:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission
=== 2024-07-02 ===
* 17:16 andrewbogott: draining (I hope) tools-elastic-3 and tools-elastic-1 for [[phab:T311905|T311905]]
* 17:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 17:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 16:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 16:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 15:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 15:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 11:53 arturo: cleanup kubeadm configmap from TTLAfterFinished settings
([[phab:T349197|T349197]])
* 11:51 arturo: remove --feature-gates=TTLAfterFinished=true from kube-controller-manager static pod definition ([[phab:T349197|T349197]])
* 10:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 10:54 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 09:56 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 09:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
* 09:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component cert-manager
* 09:22 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component cert-manager
* 09:10 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 09:10 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
=== 2024-07-01 ===
* 15:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 15:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 14:59 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission
* 14:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission
* 14:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 14:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 13:21 dcaro@cloudcumin1001: END (PASS) - Cookbook
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission
* 13:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission
* 13:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
* 13:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
=== 2024-06-28 ===
* 11:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 11:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
* 09:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 09:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
* 09:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
* 09:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
* 09:38 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 09:37 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 09:28 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 09:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
=== 2024-06-27 ===
* 16:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-etcd-23
* 16:44 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-etcd-23
* 16:22 andrew@cloudcumin1001: END
(PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-db-1 * 16:21 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-db-1 * 15:49 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-db-1 * 15:49 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-db-1 * 15:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-db-3 * 15:46 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-db-3 * 15:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-etcd-24 * 15:37 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-etcd-24 * 15:36 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-etcd-22 * 15:33 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-etcd-22 * 15:03 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component cert-manager * 15:03 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component cert-manager * 14:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx * 14:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx * 11:02 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 11:02 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:02 arturo: drop all PSP definitions for all accounts ([[phab:T368142|T368142]]) * 
10:02 arturo: disabled PodSecurityPolicy admission plugin from kubeadm configmap ([[phab:T368142|T368142]]) * 09:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 08:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 08:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-06-26 === * 11:40 taavi: update pywikibot image to 9.2 [[phab:T363631|T363631]] * 10:43 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:43 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:18 arturo: deploying toolforge-webservice 0.103.9 ([[phab:T368463|T368463]]) * 09:18 arturo: setting kyverno policies to Enforce ([[phab:T368141|T368141]]) * 09:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:17 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 08:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-29 * 08:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-29 === 2024-06-25 === * 21:50 bd808: Live hacked /usr/lib/python3/dist-packages/toolsws/backends/kubernetes.py on login-buster.toolforge.org to remove the `-> dict[str, Any]` type annotations causing [[phab:T368463|T368463]] * 12:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-104 * 12:30 taavi@cloudcumin1001: START 
- Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-104 * 12:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-103 * 12:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-104 * 12:28 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-104 * 12:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-103 * 12:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-102 * 12:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-103 * 12:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-103 * 12:26 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-102 * 12:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-56 * 12:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-102 * 12:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-102 * 12:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-56 * 12:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-55 * 12:23 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-55 * 12:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-54 * 12:22 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-56 * 12:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-56 * 12:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-54 * 12:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-53 * 12:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-55 * 12:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-55 * 12:20 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-53 * 12:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-54 * 12:16 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-k8s-worker-nfs-52 * 12:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-54 * 12:16 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-52 * 12:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 12:14 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 12:13 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-51 * 12:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-53 * 12:11 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-51 * 12:11 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-53 * 11:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-50 * 11:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-52 * 11:56 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-50 * 11:56 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-k8s-worker-50 * 11:56 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-50 * 11:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-52 * 11:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-51 * 11:51 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-k8s-worker-50 * 11:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-51 * 11:51 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-50 * 11:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-50 * 11:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-50 * 11:11 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-proxy-7 * 11:10 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-proxy-7 * 11:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.migrate_floating_ip (exit_code=0) for address 185.15.56.11 to server 'tools-proxy-8' * 11:09 taavi@cloudcumin1001: START - Cookbook 
wmcs.vps.migrate_floating_ip for address 185.15.56.11 to server 'tools-proxy-8' * 09:44 arturo: deploy toolforge-webservice 0.103.8 ([[phab:T362050|T362050]]) * 09:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-haproxy-6 * 09:30 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-haproxy-6 * 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-control-9 * 09:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-control-9 * 09:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 09:23 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-ingress-9 * 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-ingress-9 * 08:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-49 * 08:48 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-49 * 08:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-48 * 08:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-49 * 08:47 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-48 * 08:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-49 * 08:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs 
(exit_code=0) for server tools-k8s-worker-nfs-47 * 08:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-48 * 08:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-48 * 08:45 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-47 * 08:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-46 * 08:44 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-46 * 08:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-45 * 08:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-47 * 08:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-47 * 08:42 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-45 * 08:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-44 * 08:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-46 * 08:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-46 * 08:40 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-44 * 08:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-45 * 08:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-45 * 08:40 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs 
(exit_code=99) for server tools-k8s-worker-nfs-43 * 08:39 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-43 * 08:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-42 * 08:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-44 * 08:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-44 * 08:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-43 * 08:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-43 * 08:36 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-42 * 08:13 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-42 * 08:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-42 * 08:07 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-42 * 08:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-41 * 08:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-42 * 08:02 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-41 * 08:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-40 * 07:59 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-40 * 07:59 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-39 * 07:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-41 * 07:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-41 * 07:58 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-39 * 07:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-38 * 07:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-40 * 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-40 * 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-38 * 07:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-37 * 07:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-39 * 07:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-39 * 07:55 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-37 * 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-36 * 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-38 * 07:53 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-38 * 07:53 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-36 * 07:41 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-35 * 07:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-37 * 07:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-37 * 07:40 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-35 * 07:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-34 * 07:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-36 * 07:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-36 * 07:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-34 * 07:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-35 * 07:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-33 * 07:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-35 * 07:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-34 * 07:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-34 * 07:31 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-33 * 07:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-33 * 07:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-33 === 2024-06-24 === * 20:56 andrewbogott: rebooting 
tools-k8s-worker-nfs-36; it has lots of stuck processes which somehow didn't get unstuck when we did the post-nfs-migration reboots. * 15:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-32 * 15:53 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-32 * 15:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-31 * 15:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-32 * 15:51 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-31 * 15:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-32 * 15:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-30 * 15:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-31 * 15:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-31 * 15:48 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-30 * 15:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-29 * 15:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-30 * 15:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-30 * 15:46 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-29 * 15:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for 
server tools-k8s-worker-nfs-28 * 15:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-29 * 15:45 arturo: deploy toolforge-webservice 0.103.7 ([[phab:T362050|T362050]]) * 15:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-29 * 15:44 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-28 * 15:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-27 * 15:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-28 * 15:42 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-27 * 15:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-28 * 15:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-27 * 15:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-27 * 15:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers * 14:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-sgebastion-10 * 14:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-sgebastion-10 * 14:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-bastion-13 * 14:34 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-bastion-13 * 14:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-bastion-12 * 14:30 taavi@cloudcumin1001: START - Cookbook 
wmcs.openstack.migrate_server_to_ovs for server tools-bastion-12 * 14:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers * 14:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-nfs-2 * 14:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-nfs-2 * 13:57 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-nfs-2 * 13:57 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-nfs-2 * 13:50 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_dbinstance_to_ovs (exit_code=0) for server tbd * 13:43 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_dbinstance_to_ovs for server tbd * 13:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-26 * 13:41 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-26 * 13:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-25 * 13:39 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-25 * 13:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-26 * 13:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-24 * 13:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-26 * 13:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-25 * 13:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-24 * 
13:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-25 * 13:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-23 * 13:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-24 * 13:34 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-23 * 13:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-24 * 13:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-22 * 13:29 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-22 * 13:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-21 * 13:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-23 * 13:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-23 * 13:26 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-21 * 13:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-20 * 13:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-22 * 13:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-22 * 13:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-20 * 13:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-21 * 
13:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-19 * 13:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-21 * 13:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-19 * 13:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-18 * 13:19 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-18 * 13:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-20 * 13:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-17 * 13:18 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-20 * 13:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-19 * 13:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-19 * 13:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-18 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-18 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-17 * 13:16 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-17 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-17 * 13:15 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-17 * 13:15 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-17 * 13:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-16 * 13:09 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-16 * 12:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-15 * 12:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-16 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-16 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-15 * 12:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-14 * 12:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-15 * 12:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-15 * 12:51 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-14 * 12:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-13 * 12:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-14 * 12:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-14 * 12:45 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-13 * 12:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-12 
* 12:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-13 * 12:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-13 * 12:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-12 * 12:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-11 * 12:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-12 * 12:35 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-11 * 12:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-12 * 12:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-prometheus-7 * 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-11 * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-11 * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-prometheus-7 * 12:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-control-8 * 12:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-control-8 * 12:15 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-ingress-8 * 12:13 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-ingress-8 * 12:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 12:12 
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 12:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-static-15 * 12:05 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-static-15 * 12:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-acme-chief-4 * 12:02 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-acme-chief-4 * 12:00 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-10 * 11:58 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=97) for node tools-k8s-worker-nfs-10 * 11:58 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-10 * 11:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-10 * 11:56 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-10 * 11:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-10 * 11:49 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 11:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 11:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-9 * 11:42 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-9 * 11:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-8 * 11:41 taavi@cloudcumin1001: END 
(PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-9 * 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-8 * 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-9 * 11:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-8 * 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-8 * 11:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-7 * 11:37 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-8 * 11:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-7 * 11:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-8 * 11:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-7 * 11:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-7 * 11:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-6 * 11:33 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-6 * 11:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-5 * 11:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-5 * 11:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-6 * 11:31 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-4 * 11:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-6 * 11:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-5 * 11:30 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-4 * 11:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-5 * 11:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-4 * 11:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-4 * 11:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-3 * 11:25 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-3 * 11:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-2 * 11:23 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-2 * 11:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-1 * 11:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-1 * 11:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-3 * 11:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-3 * 11:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-2 * 11:20 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-2 * 11:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-1 * 11:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-1 * 11:17 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-1 * 11:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-1 * 10:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-redis-5 * 10:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-redis-5 * 10:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-docker-registry-7 * 10:19 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-docker-registry-7 * 10:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:17 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:13 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-ingress-7 * 10:11 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-43 * 10:11 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-ingress-7 * 10:09 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-43 * 10:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-control-7 * 10:06 taavi@cloudcumin1001: START - Cookbook 
wmcs.openstack.migrate_server_to_ovs for server tools-k8s-control-7 * 10:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-redis-7 * 10:03 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-43 * 10:02 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-redis-7 * 10:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-redis-6 * 09:59 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-redis-6 * 09:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-43 * 09:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-cumin-1 * 09:52 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-cumin-1 * 09:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-haproxy-5 * 09:50 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-haproxy-5 * 09:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-harbor-1 * 09:47 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-harbor-1 * 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 09:46 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-107.tools.eqiad1.wikimedia.cloud to the cluster * 09:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-prometheus-6 * 09:39 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server 
tools-prometheus-6 * 09:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-puppetserver-01 * 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-puppetserver-01 * 09:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-puppetdb-2 * 09:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-puppetdb-2 * 09:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-mail-4 * 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 09:30 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-106.tools.eqiad1.wikimedia.cloud to the cluster * 09:30 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-mail-4 * 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-legacy-redirector-2 * 09:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-legacy-redirector-2 * 09:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-imagebuilder-2 * 09:26 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-imagebuilder-2 * 09:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-proxy-8 * 09:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-proxy-8 * 09:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-services-05 * 
09:23 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-services-05 * 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-package-builder-04 * 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-package-builder-04 * 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 09:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-docker-registry-8 * 09:20 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 09:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 09:19 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-docker-registry-8 * 09:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-checker-5 * 09:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 09:18 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-105.tools.eqiad1.wikimedia.cloud to the cluster * 09:18 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-checker-5 * 09:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 09:08 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 09:07 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster === 2024-06-20 === * 13:09 arturo: re-deploy kyverno [[phab:T368044|T368044]] * 12:56 aborrero@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 12:55 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 09:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:19 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 09:08 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:08 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-06-19 === * 10:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 10:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 10:11 arturo: merging k8s HAproxy change https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047113 * 04:18 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 04:17 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 04:16 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 04:15 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2024-06-14 === * 14:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 14:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component 
jobs-api * 08:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 08:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 07:35 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 07:35 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-06-12 === * 19:41 bd808: Rebuilding all shared Docker containers. This will among other things apply the fix for [[phab:T367345|T367345]]. * 17:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 17:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 17:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 17:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 16:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 16:28 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 15:24 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 15:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 15:03 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:03 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 
13:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 13:52 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 13:45 taavi: hard reboot tools-k8s-control-7 * 12:00 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 12:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-06-11 === * 17:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers * 16:42 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 16:41 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 16:41 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 16:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 16:38 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 16:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 15:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers * 15:50 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all NFS workers * 15:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers * 11:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 11:35 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 11:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 11:12 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:57 dcaro: cleaning old maintain-kubeusers configmaps * 10:45 dcaro: cleaning up old resourcequotas === 2024-06-10 === * 09:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 09:45 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno === 2024-06-07 === * 10:10 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 10:09 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 09:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 09:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2024-06-06 === * 14:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:21 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 14:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 12:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:06 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:05 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-06-05 
=== * 16:05 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:05 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 13:27 dcaro: deploying toolforge-webservice 0.103.6 * 12:58 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 12:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 08:44 dcaro: deploying toolforge-jobs-framework-cli 16.0.10 on tools-bastion-13 * 08:41 dcaro: deploying toolforge-jobs-framework-cli 16.0.10 on tools-bastion-12 === 2024-06-04 === * 16:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:47 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 12:47 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 12:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 12:19 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:09 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 12:08 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:32 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component 
maintain-kubeusers * 09:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 08:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 08:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-06-03 === * 16:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 16:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 16:05 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:04 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 16:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:01 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 16:00 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 15:58 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:57 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 14:11 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:11 aborrero@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 12:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:16 wmbot~arturo@nostromo: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 10:15 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.10.7 * 10:15 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.10.7 * 10:14 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.10.7 * 10:14 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.10.7 * 10:14 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7 * 10:14 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 10:13 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99) * 10:13 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7 * 10:13 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 09:37 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99) * 09:37 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7 * 09:37 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 09:29 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry 
(exit_code=99) * 09:29 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 09:29 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 09:29 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 09:29 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99) * 09:28 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 09:13 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 09:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 08:43 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 08:43 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission === 2024-05-29 === * 16:14 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 16:13 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 02:59 wmbot~raymond@ubuntu: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component envvars-api * 02:59 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2024-05-28 === * 10:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 10:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-05-27 === * 15:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:50 
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 09:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9
* 09:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9
=== 2024-05-25 ===
* 21:33 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 21:32 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 20:38 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 20:37 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-05-23 ===
* 13:22 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 13:21 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
=== 2024-05-22 ===
* 16:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9
* 16:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9
=== 2024-05-15 ===
* 14:17 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]])
* 14:16 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]])
* 14:11 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]])
* 14:10 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]])
* 10:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 10:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-05-14 ===
* 13:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 13:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 07:48 dcaro: draining tools-k8s-worker-nfs-9 as it's stuck on IO
* 07:48 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-9
* 07:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-9
=== 2024-05-07 ===
* 16:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 16:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 12:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 12:21 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-05-06 ===
* 12:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 12:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 08:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 08:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 07:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 07:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
=== 2024-05-05 ===
* 07:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx
* 07:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx
=== 2024-05-03 ===
* 15:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 15:40 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 12:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 12:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 10:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 10:16 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-04-30 ===
* 10:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 10:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
=== 2024-04-26 ===
* 08:59 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 08:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 08:57 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 08:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-04-25 ===
* 12:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 12:57 dcaro@cloudcumin1001: START - Cookbook
wmcs.toolforge.k8s.component.deploy for component jobs-api
* 09:48 taavi: update pywikibot script image to v9.1.0 [[phab:T363132|T363132]]
=== 2024-04-24 ===
* 15:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 15:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
=== 2024-04-18 ===
* 09:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 09:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-04-17 ===
* 20:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-50
* 20:48 andrewbogott: In response to stuck processes (NFS?), running sudo cookbook wmcs.toolforge.k8s.reboot --hostname-list tools-k8s-worker-nfs-50 --cluster-name tools
* 20:48 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-50
* 15:21 dcaro: swapped login.toolforge.org to point to tools-bastion-13
* 10:48 dcaro: rebooting tools-k8s-worker-nfs-1
=== 2024-04-16 ===
* 11:08 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-1
* 11:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-1
* 08:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'python3-toolforge-weld' version '1.5.0'
* 08:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'python3-toolforge-weld' version '1.5.0'
=== 2024-04-15 ===
* 20:34 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 20:33 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 18:28 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 18:27 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 14:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 14:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 13:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 13:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 13:38 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 13:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 11:03 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 11:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 10:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 10:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 09:03 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 09:02 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
=== 2024-04-12 ===
* 10:14 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission
* 10:14 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission
* 09:35 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission
* 09:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission
* 09:27 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 09:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 01:19 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 01:18 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 01:18 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component calico
* 01:17 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission
* 01:17 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component calico
* 01:17 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 01:16 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission
* 01:16 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 01:15 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
* 01:14 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
* 01:13 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 01:12 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 01:11 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for
component maintain-kubeusers
=== 2024-04-11 ===
* 08:42 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 08:41 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
=== 2024-04-09 ===
* 17:21 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
* 17:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 17:11 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 17:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 16:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
* 16:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 14:23 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
* 14:23 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 14:23 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
* 14:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 14:22 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99)
* 14:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 14:11 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 14:11 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 13:43 dcaro: deployed builds-builder 0.0.94 and removed builds-admission
* 13:39 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 13:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 12:21 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 12:21 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 12:19 dcaro: deploying toolforge-jobs-cli 16.0.6
=== 2024-04-08 ===
* 16:35 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
* 16:24 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 16:21 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 16:11 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 16:09 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 16:09 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 15:07 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
* 14:49 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 14:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
* 14:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 14:32 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
* 14:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 14:14 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21
* 14:13 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21
* 13:56 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:54 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:53 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-56
* 13:53 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 13:52 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-56
* 13:51 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:49 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:49 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:45 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:40 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:37 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 13:32 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 13:31 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 13:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
* 13:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 13:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console
(exit_code=255)
* 13:29 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 13:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
* 13:28 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 13:24 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:19 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:12 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 10:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 10:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 08:55 dcaro_: deploy toolforge-jobs-framework-cli 16.0.5
=== 2024-04-05 ===
* 12:15 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 12:15 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-04-03 ===
* 15:01 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 15:00 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 14:59 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
* 14:59 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 14:58 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
* 14:58 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 14:57 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
* 14:57 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 14:49 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
* 14:49 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 14:37 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
* 14:37 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 11:24 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-06
* 11:24 wmbot~taavi@runko: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-06
* 11:23 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-06
* 11:23 wmbot~taavi@runko: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-06
* 11:21 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-06
* 11:21 wmbot~taavi@runko: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-06
* 09:45 taavi: rebuilding prebuild images for [[phab:T361457|T361457]]
=== 2024-04-02 ===
* 12:39 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-2 ([[phab:T344717|T344717]])
* 12:38 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-2 ([[phab:T344717|T344717]])
* 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-docker-registry-05
* 07:54 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-registry-05
=== 2024-03-28 ===
* 14:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-05
* 14:26 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-05
* 13:45 taavi: migrating toolforge.org floating IP from tools-proxy-06 to tools-proxy-7 [[phab:T361223|T361223]]
* 13:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-proxy'
* 13:30 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-proxy'
* 13:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-proxy'
* 13:19 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-proxy'
* 12:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-docker-registry-06
* 12:12 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-registry-06
* 11:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-docker-registry'
* 11:02 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-docker-registry'
=== 2024-03-27 ===
* 12:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance toolserver-proxy-01
* 12:19 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance toolserver-proxy-01
=== 2024-03-26 ===
* 16:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud
* 16:47 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud
* 16:41 taavi@cloudcumin1001: END (FAIL) - Cookbook
wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud
* 16:39 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud
* 16:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-docker-registry'
* 16:33 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-docker-registry'
* 12:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-bastion-13.tools.eqiad1.wikimedia.cloud
* 12:54 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-bastion-13.tools.eqiad1.wikimedia.cloud
* 12:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-bastion'
* 12:45 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-bastion'
* 12:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-sgebastion-11
* 12:43 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-sgebastion-11
* 10:24 taavi: point toolserver.org DNS to tools-legacy-redirector-2 [[phab:T311909|T311909]]
=== 2024-03-25 ===
* 18:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-legacy-redirector
* 18:23 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-legacy-redirector
* 14:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
* 14:27 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
* 14:20 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
* 14:19 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
* 14:18 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
* 14:18 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
=== 2024-03-22 ===
* 11:43 dcaro: restarted sssd on tools-prometheus-6 as it was stopped (error)
=== 2024-03-21 ===
* 15:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-4
* 15:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-4
* 15:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-3
* 15:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-3
* 15:42 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=99) for node toolsbeta-k8s-haproxy-3
* 15:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node toolsbeta-k8s-haproxy-3
* 15:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0)
* 15:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node
* 12:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0)
* 12:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node
=== 2024-03-20 ===
* 13:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-checker-04
* 13:34 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-checker-04
* 12:30 taavi: move checker service address to tools-checker-5
* 11:24 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 11:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 10:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-checker-5.tools.eqiad1.wikimedia.cloud
* 10:45 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-checker-5.tools.eqiad1.wikimedia.cloud
* 10:40 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-checker-5.tools.eqiad1.wikimedia.cloud
* 10:39 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-checker-5.tools.eqiad1.wikimedia.cloud
* 10:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-checker'
* 10:34 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-checker'
* 10:33 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-checker'
* 10:33 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-checker'
* 10:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
* 10:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase
* 10:22 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-checker'
* 10:21 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-checker'
=== 2024-03-19 ===
* 21:28 taavi: kick off full container image rebuild for https://gerrit.wikimedia.org/r/1012753 (python3 backwards compat in lighttpd images) and https://gerrit.wikimedia.org/r/1010690 (add procps to base images)
* 11:22 taavi@cloudcumin1001: END (PASS) - Cookbook
wmcs.vps.remove_instance (exit_code=0) for instance tools-static-14
* 11:21 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-static-14
* 11:19 taavi: point dev.toolforge.org to tools-bastion-12 [[phab:T314665|T314665]]
* 10:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 10:25 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 09:38 dcaro: pushed docker-registry.tools.wmflabs.org/cloud-cicd-py311bookworm-tox:latest and docker-registry.tools.wmflabs.org/cloud-cicd-debian-builder-bookworm:2024-03-24.1 ([[phab:T360405|T360405]])
=== 2024-03-18 ===
* 13:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 13:31 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 13:31 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 13:30 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 13:30 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 13:29 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 13:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-104 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:13 taavi: restart harbor services after docker service restart
* 13:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-104 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:03 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-56 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:02 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:02 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:01 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:00 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:59 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-52 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:58 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-52 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:58 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-51 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:57 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-51 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:57 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:56 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-49 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:55 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade
for node tools-k8s-worker-nfs-48 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:53 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:49 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:47 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:47 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 12:44 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 12:36 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:35 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:35 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:34 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:34 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-filesystemtest-1
* 12:33 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:33 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-filesystemtest-1
* 12:32 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:31 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:31 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:30 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:29 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:28 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:27 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.23.17 to 1.24.17
([[phab:T307651|T307651]]) * 12:27 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-31 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:25 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:25 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-30 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:24 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-29 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:23 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-28 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:22 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:22 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:21 
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:20 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:20 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-25 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:19 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-25 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:18 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:18 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-acme-chief-4.tools.eqiad1.wikimedia.cloud * 12:15 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:15 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:14 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on 
tools-acme-chief-4.tools.eqiad1.wikimedia.cloud * 12:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:11 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:05 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud * 12:04 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 12:03 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 12:01 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 12:01 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud * 12:00 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud * 12:00 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud * 11:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:55 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-20 from 1.23.17 to 1.24.17 
([[phab:T307651|T307651]]) * 11:54 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-20 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:53 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-18 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-18 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-15 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-15 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:49 
aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:47 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:47 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:45 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:43 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:43 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:42 aborrero@cloudcumin1001: START - 
Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:40 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:40 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-6 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:39 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:39 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:33 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:33 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-4 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:32 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-4 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for 
node tools-k8s-worker-nfs-3 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:31 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:31 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:30 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:30 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:29 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:23 taavi: point tools-static proxy to tools-static-15 (bookworm) [[phab:T311913|T311913]] * 11:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:13 aborrero@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:08 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 11:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 11:00 aborrero@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component jobs-api * 11:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:00 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:53 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:47 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 10:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 10:04 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-bastion-12.tools.eqiad1.wikimedia.cloud * 10:03 taavi@cloudcumin1001: START - Cookbook 
wmcs.vps.refresh_puppet_certs on tools-bastion-12.tools.eqiad1.wikimedia.cloud
* 09:27 taavi: deleted shutdown grid engine VMs [[phab:T314664|T314664]]
=== 2024-03-15 ===
* 10:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 10:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-03-14 ===
* 17:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'misctools' version '1.48'
* 17:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'misctools' version '1.48'
* 15:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-docker-imagebuilder-01
* 15:16 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-imagebuilder-01
* 15:11 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-docker-imagebuilder-01
* 15:11 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-imagebuilder-01
* 15:10 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-docker-imagebuilder-01
* 15:09 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-imagebuilder-01
* 11:02 taavi: stop grid related VMs [[phab:T314664|T314664]]
* 11:01 taavi: disable grid access for remaining tools still running on the grid [[phab:T314664|T314664]]
=== 2024-03-13 ===
* 19:21 andrewbogott: shutting down old puppet infra: tools-puppetmaster-02 and tools-puppetdb-1. These can be deleted in a week or two presuming everything remains stable.
=== 2024-03-12 ===
* 12:38 taavi: hard reboot tools-prometheus-6
* 11:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 11:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-03-11 ===
* 16:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 16:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
* 13:20 arturo: cached registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0 as docker-registry.tools.wmflabs.org/kube-state-metrics:v2.6.0 in the docker registry for [[phab:T359798|T359798]]
=== 2024-03-09 ===
* 12:48 taavi: hard reboot tools-sgebastion-10 due to stuck NFS procs
=== 2024-03-08 ===
* 12:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 12:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-03-07 ===
* 14:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 14:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 13:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 13:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-03-06 ===
* 10:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-32
* 10:47 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_grid_node (exit_code=1) for tools-sgeweblight-10-17, tools-sgeweblight-10-32
* 10:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-17, tools-sgeweblight-10-32
* 10:34 taavi: rebuilding all docker images for https://gerrit.wikimedia.org/r/c/operations/docker-images/toollabs-images/+/1005952 ([[phab:T293552|T293552]]) + normal package updates
* 09:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0)
* 09:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors
* 09:42 taavi: reboot tools-sgeexec-10-20, -21, -23, sgeweblight-10-32 due to stuck nfs procs
=== 2024-03-05 ===
* 16:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud
* 16:11 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud
* 16:09 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 16:09 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 16:07 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
* 16:07 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase
* 16:06 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.openstack.quota_increase (exit_code=97) ([[phab:T357901|T357901]])
* 16:06 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T357901|T357901]])
* 16:05 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud
* 16:04 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud
=== 2024-03-04 ===
* 17:56 bd808@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 17:56 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 16:57 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 16:57 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 12:43 taavi: reboot tools-sgegrid-shadow due to high number of procs in D state
=== 2024-03-03 ===
* 10:38 dcaro: reboot tools-k8s-worker-nfs-55 got nfs lockup (logrotate in D state)
=== 2024-03-01 ===
* 21:14 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 21:14 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
=== 2024-02-29 ===
* 14:36 dcaro: deploy webservice 0.103.3
=== 2024-02-28 ===
* 11:57 dcaro: deploy tools-webservice 0.103.2 with probes ([[phab:T341919|T341919]])
* 00:46 bd808@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 00:46 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-02-26 ===
* 09:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) ([[phab:T284656|T284656]])
* 09:54 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node ([[phab:T284656|T284656]])
* 09:35 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the tools cluster
* 09:35 aborrero@cloudcumin1001: Added a new k8s control tools-k8s-control-9.tools.eqiad1.wikimedia.cloud to the cluster
* 09:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster ([[phab:T284656|T284656]])
=== 2024-02-23 ===
* 14:19 taavi: remove isc-dhcp-server (server, not client) from tools-db-2
* 13:32 taavi: remove toolschecker alerts for grid engine jobs [[phab:T358333|T358333]]
=== 2024-02-22 ===
* 14:26 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 14:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 14:24 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
* 14:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 14:17 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
* 14:17 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 14:07 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component envvars-api
* 14:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 14:03 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component envvars-api
* 14:03 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 11:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) ([[phab:T284656|T284656]])
* 11:23 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node ([[phab:T284656|T284656]])
* 11:15 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
* 11:15 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-104.tools.eqiad1.wikimedia.cloud to the cluster
* 11:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 10:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 10:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 09:39 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the tools cluster
* 09:39 aborrero@cloudcumin1001: Added a new k8s control tools-k8s-control-8.tools.eqiad1.wikimedia.cloud to the cluster
* 09:29 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster ([[phab:T284656|T284656]])
* 08:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-51
* 08:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-51
* 08:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-38
* 08:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-38
* 08:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-25
* 08:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-25
=== 2024-02-21 ===
* 17:07 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 17:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 15:48 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 15:48 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 14:41 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 14:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 14:34 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 14:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 14:21 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 14:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 09:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-control-4
* 09:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-control-4
* 09:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the tools cluster
* 09:20 taavi@cloudcumin1001: Added a new k8s control tools-k8s-control-7.tools.eqiad1.wikimedia.cloud to the cluster
* 09:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster
=== 2024-02-20 ===
* 16:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
* 16:12 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-103.tools.eqiad1.wikimedia.cloud to the cluster
* 16:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-102
* 16:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-102
* 16:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 15:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-101
* 15:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-101
* 15:49 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 15:48 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-102.tools.eqiad1.wikimedia.cloud to the cluster * 15:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 15:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-102 * 15:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-102 * 15:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 15:38 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-102.tools.eqiad1.wikimedia.cloud to the cluster * 15:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 15:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-51.tools.eqiad1.wikimedia.cloud * 15:21 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-51.tools.eqiad1.wikimedia.cloud * 12:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:57 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-56.tools.eqiad1.wikimedia.cloud to the cluster * 12:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-100 * 12:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-100 * 12:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:40 taavi@cloudcumin1001: Added a new k8s 
worker-nfs tools-k8s-worker-nfs-55.tools.eqiad1.wikimedia.cloud to the cluster * 12:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-99 * 12:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-99 * 12:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:29 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-54.tools.eqiad1.wikimedia.cloud to the cluster * 12:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-98 * 12:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-98 * 12:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:18 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-53.tools.eqiad1.wikimedia.cloud to the cluster * 12:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-97 * 12:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-97 * 11:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-52.tools.eqiad1.wikimedia.cloud to the cluster * 11:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a 
worker-nfs role in the tools cluster * 11:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-96 * 11:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-96 * 11:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:36 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-51.tools.eqiad1.wikimedia.cloud to the cluster * 11:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:26 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-50.tools.eqiad1.wikimedia.cloud to the cluster * 11:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:16 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-49.tools.eqiad1.wikimedia.cloud to the cluster * 11:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-95 * 11:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-95 * 10:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-94 * 10:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-94 * 10:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host 
tools-k8s-worker-93 * 10:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-93 * 10:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 10:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-48.tools.eqiad1.wikimedia.cloud to the cluster * 10:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 10:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-92 * 10:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-92 * 09:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-ingress-6 * 09:52 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-ingress-6 * 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the tools cluster * 09:46 taavi@cloudcumin1001: Added a new k8s ingress tools-k8s-ingress-9.tools.eqiad1.wikimedia.cloud to the cluster * 09:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:41 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-47.tools.eqiad1.wikimedia.cloud to the cluster * 09:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster * 09:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-91 * 09:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-91 * 09:15 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:15 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-46.tools.eqiad1.wikimedia.cloud to the cluster * 09:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:02 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 09:00 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-90 * 08:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-90 * 08:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:57 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-45.tools.eqiad1.wikimedia.cloud to the cluster * 08:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-89 * 08:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-89 * 08:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:47 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-44.tools.eqiad1.wikimedia.cloud to the cluster * 08:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-88 * 08:36 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-88 === 2024-02-19 === * 19:04 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 19:03 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 13:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-ingress-5 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-ingress-5 * 13:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:09 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-43.tools.eqiad1.wikimedia.cloud to the cluster * 12:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-87 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-87 * 12:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-42.tools.eqiad1.wikimedia.cloud to the cluster * 12:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-86 * 12:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-86 * 12:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:44 taavi@cloudcumin1001: Added a new k8s worker-nfs 
tools-k8s-worker-nfs-41.tools.eqiad1.wikimedia.cloud to the cluster * 12:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) ([[phab:T357901|T357901]]) * 12:33 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T357901|T357901]]) * 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud * 12:24 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-85 * 12:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-85 * 12:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:18 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-40.tools.eqiad1.wikimedia.cloud to the cluster * 12:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-84 * 12:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-84 * 12:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:04 taavi@cloudcumin1001: Added a new k8s worker-nfs 
tools-k8s-worker-nfs-39.tools.eqiad1.wikimedia.cloud to the cluster * 11:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-83 * 11:53 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-83 * 11:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:50 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud to the cluster * 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-82 * 11:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-82 * 11:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:39 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-37.tools.eqiad1.wikimedia.cloud to the cluster * 11:28 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-81 * 11:27 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-81 * 09:03 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 08:57 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy 
(exit_code=0) for component maintain-kubeusers * 08:57 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-02-16 === * 15:28 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the tools cluster * 12:21 taavi@cloudcumin1001: Added a new k8s ingress tools-k8s-ingress-8.tools.eqiad1.wikimedia.cloud to the cluster * 12:14 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster * 10:37 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 10:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 10:32 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 10:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 10:31 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 10:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:59 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-36.tools.eqiad1.wikimedia.cloud to the cluster * 09:49 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-80 * 09:49 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host 
tools-k8s-worker-80 * 09:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:45 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-35.tools.eqiad1.wikimedia.cloud to the cluster * 09:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-79 * 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-79 * 09:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:24 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-34.tools.eqiad1.wikimedia.cloud to the cluster * 09:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-78 * 09:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-78 * 09:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:05 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-33.tools.eqiad1.wikimedia.cloud to the cluster * 08:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-77 * 08:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-77 === 2024-02-15 === * 13:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host 
tools-k8s-ingress-4 * 13:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-ingress-4 * 13:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:02 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-32.tools.eqiad1.wikimedia.cloud to the cluster * 12:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-76 * 12:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-76 * 12:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:44 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-31.tools.eqiad1.wikimedia.cloud to the cluster * 12:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-75 * 12:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-75 * 11:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the tools cluster * 11:37 taavi@cloudcumin1001: Added a new k8s ingress tools-k8s-ingress-7.tools.eqiad1.wikimedia.cloud to the cluster * 11:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster * 11:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-ingress-7 * 11:29 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-ingress-7 * 11:29 taavi@cloudcumin1001: END (FAIL) - 
Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a ingress role in the tools cluster * 11:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster === 2024-02-14 === * 19:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-17, tools-sgeweblight-10-30 * 16:35 taavi: kill jobs user 'wikishizhao' is running directly on the grid per https://wikitech.wikimedia.org/wiki/Help:Toolforge/Rules #3 * 16:30 taavi: reboot tools-sgeexec-10-23 due to high load * 09:14 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-25.tools.eqiad1.wikimedia.cloud * 09:13 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-25.tools.eqiad1.wikimedia.cloud * 09:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:07 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:07 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-30.tools.eqiad1.wikimedia.cloud to the cluster * 08:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-74 * 08:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-74 * 08:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:54 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-29.tools.eqiad1.wikimedia.cloud to the cluster * 08:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs 
role in the tools cluster * 08:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-73 * 08:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-73 * 08:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:43 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-28.tools.eqiad1.wikimedia.cloud to the cluster * 08:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-72 * 08:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-72 * 08:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:32 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-27.tools.eqiad1.wikimedia.cloud to the cluster * 08:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-71 * 08:22 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-71 * 08:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:21 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-26.tools.eqiad1.wikimedia.cloud to the cluster * 08:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-70 * 08:07 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-70 * 08:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:05 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-25.tools.eqiad1.wikimedia.cloud to the cluster * 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-69 * 07:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-69 * 07:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 07:53 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-24.tools.eqiad1.wikimedia.cloud to the cluster * 07:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 07:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-68 * 07:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-68 === 2024-02-13 === * 15:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-67 * 15:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-67 * 15:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 15:41 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-23.tools.eqiad1.wikimedia.cloud to the cluster * 15:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:31 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-66 * 15:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-66 * 15:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 15:30 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-22.tools.eqiad1.wikimedia.cloud to the cluster * 15:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-65 * 15:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-65 * 09:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:36 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-21.tools.eqiad1.wikimedia.cloud to the cluster * 09:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-64 * 09:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-64 === 2024-02-12 === * 14:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 14:58 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-20.tools.eqiad1.wikimedia.cloud to the cluster * 14:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 14:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-62 * 14:47 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-62 * 14:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 14:47 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-19.tools.eqiad1.wikimedia.cloud to the cluster * 14:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 14:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-61 * 14:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-61 * 13:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-60 * 13:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-60 * 13:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:43 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-18.tools.eqiad1.wikimedia.cloud to the cluster * 13:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-59 * 13:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-59 * 13:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-58 * 13:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-58 * 13:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:22 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-17.tools.eqiad1.wikimedia.cloud 
to the cluster * 13:12 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-57 * 13:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-57 * 13:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-56 * 13:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-56 * 13:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:09 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-16.tools.eqiad1.wikimedia.cloud to the cluster * 12:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-55 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-55 * 12:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-54 * 12:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-54 * 12:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-15.tools.eqiad1.wikimedia.cloud to the cluster * 12:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-15 * 12:45 taavi@cloudcumin1001: START - Cookbook 
wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-15 * 12:44 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-53 * 12:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-53 * 12:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-52 * 12:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-52 * 10:51 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 10:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 10:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-02-11 === * 11:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 11:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors === 2024-02-09 === * 18:03 andrewbogott: updated the default security group, removing the 0.0.0.0/0 rule allowing port 22 access everywhere, replaced it with a 172.16.0.0/21 rule * 13:06 taavi: reboot tools-sgecron-2 due to high load * 10:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component image-config * 10:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component image-config * 09:56 taavi@cloudcumin1001: 
END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-14.tools.eqiad1.wikimedia.cloud to the cluster * 09:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-51 * 09:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-51 * 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-50 * 09:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-50 * 08:56 dcaro: restart tools-k8s-worker-50 due to D some stuck processes === 2024-02-08 === * 13:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 13:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors * 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:46 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-13.tools.eqiad1.wikimedia.cloud to the cluster * 09:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-49 * 09:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-49 * 09:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-48 * 09:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-48 * 09:32 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:32 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-12.tools.eqiad1.wikimedia.cloud to the cluster * 09:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-47 * 09:22 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-47 * 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-46 * 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-46 * 09:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:21 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-11.tools.eqiad1.wikimedia.cloud to the cluster * 09:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:11 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-45 * 09:11 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-45 * 09:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-44 * 09:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-44 * 09:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:10 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-10.tools.eqiad1.wikimedia.cloud to the cluster * 09:00 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a 
worker-nfs role in the tools cluster * 08:59 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 08:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-43 * 08:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-43 * 08:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-42 * 08:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-42 === 2024-02-07 === * 21:33 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all workers * 18:00 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 * 17:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 * 17:24 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all workers * 17:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:05 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all workers * 17:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:03 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for all workers * 17:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:01 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all workers * 16:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers === 
2024-02-06 === * 13:09 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all nodes ([[phab:T356507|T356507]]) * 11:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 11:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:16 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all nodes ([[phab:T356507|T356507]]) === 2024-01-31 === * 14:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:12 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-01-30 === * 19:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 19:24 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-9.tools.eqiad1.wikimedia.cloud to the cluster * 19:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 19:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-9 * 19:16 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-9 * 19:16 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 19:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 19:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 19:12 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-8.tools.eqiad1.wikimedia.cloud to the cluster * 19:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs 
role in the tools cluster * 19:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-8 * 19:03 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-8 * 18:51 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 18:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 18:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-8 * 18:47 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-8 * 18:46 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 18:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 18:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 18:41 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-7.tools.eqiad1.wikimedia.cloud to the cluster * 18:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 18:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-41 * 18:29 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-41 * 18:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-40 * 18:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-40 * 18:22 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-39 * 18:22 
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-39 * 18:18 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-38 * 18:17 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-38 * 18:09 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-37 * 18:08 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-37 * 15:16 dcaro: restart harbor now that the db is clean ([[phab:T356037|T356037]]) * 15:14 dcaro: restart harbor now that the db is clean ([[phab:T3543|T3543]]) * 13:08 taavi: create no-op DMARC record [[phab:T354112|T354112]] * 12:39 dcaro: rebuilding all the toolforge images ([[phab:T354320|T354320]]) * 10:16 dcaro: restarting harbor and flushing redis to regenerate cache data ([[phab:T356037|T356037]]) * 09:33 dcaro: cleaning up old schedules on harbor ([[phab:T356037|T356037]]) === 2024-01-29 === * 19:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-36 * 19:46 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-36 * 19:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-36 * 14:36 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-mail-4.tools.eqiad1.wikimedia.cloud * 14:34 wmbot~taavi@runko: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-mail-4.tools.eqiad1.wikimedia.cloud * 12:06 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:06 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-6.tools.eqiad1.wikimedia.cloud to the cluster * 11:55 wmbot~taavi@runko: START - Cookbook 
wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:51 wmbot~taavi@runko: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 11:51 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:37 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:37 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-5.tools.eqiad1.wikimedia.cloud to the cluster * 11:26 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:23 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:22 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-4.tools.eqiad1.wikimedia.cloud to the cluster * 11:12 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:12 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-35 * 11:10 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-35 * 11:10 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-34 * 11:09 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-34 * 11:09 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-33 * 11:07 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-33 * 11:06 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-32 * 11:04 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-32 * 11:01 wmbot~taavi@runko: 
START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-31 * 10:59 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-30 * 10:57 wmbot~taavi@runko: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 10:56 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 10:51 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 10:51 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-3.tools.eqiad1.wikimedia.cloud to the cluster * 10:46 blancadesal: increased harbor quota for wd-shex-infer to 2GiB * 10:44 blancadesal: increased harbor quota for lucaswerkmeister-test to 2GiB * 10:31 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 10:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 10:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors === 2024-01-26 === * 10:56 taavi: copy helmfile_0.144.0-1_all to bookworm-tools, bookworm-toolsbeta === 2024-01-25 === * 13:17 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:04 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:12 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-01-24 === * 09:54 dcaro: deploy toolforge-jobs-framework-cli 16.0.1 === 2024-01-23 === * 19:11 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 19:11 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 14:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 14:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 14:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 14:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 13:31 taavi: rebooting tools-sgeexec-10-21, tools-sgeexec-10-22 * 12:58 dcaro: deployed toolforge-envvars-cli 0.0.4 * 10:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 10:23 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-01-19 === * 15:40 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 15:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 12:11 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:10 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-01-18 === * 12:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 12:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeexec-10-17 === 2024-01-17 === * 18:16 dhinus: increase volume quotas for toolsdb [[phab:T344717|T344717]] * 18:14 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99) ([[phab:T344717|T344717]]) * 18:14 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase 
([[phab:T344717|T344717]]) * 14:34 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 14:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 08:56 taavi: update all pre-built docker images [[phab:T352886|T352886]] === 2024-01-15 === * 09:18 taavi: reboot stuck tools-k8s-worker-84 === 2024-01-12 === * 09:07 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'toolforge-builds-cli' version '0.0.12' * 09:07 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'toolforge-builds-cli' version '0.0.12' === 2024-01-11 === * 17:30 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 17:12 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 17:12 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 15:14 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 15:13 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-01-10 === * 22:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 22:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 09:17 taavi: reboot tools-k8s-worker-98 === 2024-01-09 === * 23:37 andrewbogott: restarting harbor-db in an attempt to reform harbor -- [[phab:T354714|T354714]] * 23:30 andrewbogott: rebooting tools-harbor-1 in a feeble attempt to get it to work (docker-compose can't restart it) * 23:12 andrew@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-builder * 23:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 23:11 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds.builder * 23:11 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds.builder * 17:31 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 17:30 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 10:13 taavi: reboot tools-sgeexec-10-17 due to high load === 2024-01-08 === * 12:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-27, tools-sgeweblight-10-28 * 10:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 10:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 10:17 taavi: reboot tools-sgeexec-10-21 === 2024-01-05 === * 14:55 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 14:55 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 11:56 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:55 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 10:29 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 10:29 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors === 2024-01-04 === * 10:11 dcaro: deploy toolforge-envvars-cli 0.0.3 === 2024-01-03 === * 21:22 
andrewbogott: truncating 200 logfiles to 5M on tools nfs * 21:17 andrewbogott: deleting many stray core dumps throughout nfs storage === 2024-01-02 === * 11:06 dcaro: restart toolsdb database to flush connections ([[phab:T354176|T354176]]) * 10:42 dcaro: flushed the redis db on tools-harbor-1 ([[phab:T354176|T354176]]) * 10:37 dcaro: hard reboot tools-harbor-1 * 10:13 dhinus: hard reboot tools-harbor-1 === 2024-01-01 === * 15:55 andrewbogott: rebooting tools-harbor-1, [[phab:T354151|T354151]] ==Archives== * [[Nova Resource:Tools/SAL/Archive 1|Archive 1]] (2013-2014) * [[Nova Resource:Tools/SAL/Archive 2|Archive 2]] (2015-2017) * [[Nova Resource:Tools/SAL/Archive 3|Archive 3]] (2018-2019) * [[Nova Resource:Tools/SAL/Archive 4|Archive 4]] (2020-2021) * [[Nova Resource:Tools/SAL/Archive 5|Archive 5]] (2022-2023) </noinclude> {{SAL|Project Name=tools}} <noinclude>[[Category:SAL]]</noinclude> j5fgcmuqzw0m0706z6v4g4pvr6m69yj Deployments 0 4108 2249720 2249656 2024-12-01T03:49:28Z ScheduleDeploymentBot 37566 Add [[gerrit:1083434]] to Monday, December 02 UTC morning backport window 2249720 wikitext text/x-wiki {{Navigation MediaWiki deployment}} This page tracks '''upcoming''' '''deployments''' of software to the [[m:Special:SiteMatrix|Wikimedia Foundation servers]]. == Getting started == Ensure you have joined the {{irc|wikimedia-operations}} IRC channel, as all deployment-related communication happens there. If you need help, contact [[mw:Wikimedia Release Engineering Team|Release Engineering]] on IRC at {{irc|wikimedia-releng}} and ping Tyler (<code>thcipriani</code>). * '''MediaWiki is deployed weekly''' through the [[/Train|Deployment Train]]. Other services follow their own schedule. * '''Times are pinned to San Francisco''', so the UTC times shift in March and November per [[:en:Daylight saving time in the United States|DST]]. * '''Prefer regular [[Backport windows]]''' over adding new windows. 
To request deployment of a config change or backport, add your username and Gerrit URL to one of the backport windows on this page. You must be online in #wikimedia-operations on IRC during your deployment and install [[WikimediaDebug]] ahead of time. The #wikimedia-operations channel requires you to [[m:IRC/Instructions#Register your nickname, identify, and enforce|register your nickname]] before you can join. ** You can use the {{Clickable button 2|backport scheduling tool|url=https://schedule-deployment.toolforge.org/}} to more easily edit this page. * Tasks that meet [[/Inclusion criteria|Inclusion criteria]] '''require their own windows''', which includes long-running tasks. '''Schedule more time''' than you think you need to account for delays and setbacks; we recommend one hour for most tasks. ** To create or modify a recurring deploy window, send a patchset to the [[gitlab:repos/releng/release/-/blob/main/make-deployment-calendar/deployments-calendar.yaml|deployments-calendar.yaml file]] in <code>repos/releng/release.git</code>. ** '''Announce''' changes to the [[mail:ops|ops mailing list]] ahead of time if they are likely to affect HTTP caching, introduce new cookies, or utilize new database tables. ** '''Announce''' deployments of major features to the community via [[meta:Tech/News/Next|Tech News]] and/or via other [[mediawikiwiki:Wikimedia_Product_Guidance/Communication_channels|Product communication channels]]. * '''Something went wrong?''' See [[Incident response]]. Is there a user-impacting problem? Communicate in the {{irc|wikimedia-operations}} IRC channel. If there is a Phabricator task, ensure [[phab:tag/wikimedia-incident/|#Wikimedia-Incident]] is tagged, and consider setting the [[mw:Phabricator/Project_management#Priority_levels|Unbreak Now]] priority. 
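Because window times on this page are pinned to San Francisco, it can help to convert a window's <code>when</code> value to UTC before planning around it. The following is a minimal sketch, not part of any official tooling: it assumes the <code>SF</code> suffix corresponds to the IANA zone <code>America/Los_Angeles</code> and that values follow the <code>YYYY-MM-DD HH:MM SF</code> shape used by the calendar cards on this page. The function name <code>window_start_utc</code> is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: "SF" on this page means the America/Los_Angeles time zone.
SF = ZoneInfo("America/Los_Angeles")


def window_start_utc(when: str) -> datetime:
    """Convert a value like '2024-12-02 00:00 SF' to an aware UTC datetime."""
    stamp, _, zone = when.rpartition(" ")
    if zone != "SF":
        raise ValueError(f"expected a value ending in ' SF', got {when!r}")
    # Parse the wall-clock time, attach the San Francisco zone, then shift to UTC.
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=SF)
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    for when in ("2024-12-02 00:00 SF", "2024-07-01 00:00 SF"):
        print(when, "->", window_start_utc(when).strftime("%Y-%m-%d %H:%M UTC"))
```

Under these assumptions, a window starting at midnight San Francisco time on 2024-12-02 starts at 08:00 UTC, while the same wall-clock time in July (during daylight saving time) would start at 07:00 UTC.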
__TOC__ {{anchor|Next Week|Near Term|Near term|Near-term}}{{clear}} [[Category:Deployment]] {{Note|content=Subscribe in Google Calendar via <code>wikimedia.org_rudis09ii2mm5fk4hgdjeh1u64@group.calendar.google.com</code>.<br>This may not include one-off windows. '''If there are differences, then the wiki page is canonical and correct'''.}} ==Week of December 02== ==={{Deployment_day|date=2024-12-01}}=== {{Deployment calendar event card |when=2024-12-01 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2024-12-02}}=== {{Deployment calendar event card |when=2024-12-02 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|sd0001|sd0001}} {{deploy|type=config|gerrit=1083434|title=votewiki, testwiki: add securepoll-edit-poll to electionadmin|status=}} - {{phabricator|T377531}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-02 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-02 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|Daimona|Daimona}} {{deploy|type=config|gerrit=1099233|title=Drop $wgWikimediaCampaignEventsEnableCommunityList|status=}} - {{phabricator|T380075}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-02 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2024-12-02 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-12-02 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... 
}} {{Deployment calendar event card |when=2024-12-02 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-02 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. }} {{Deployment calendar event card |when=2024-12-02 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous_deployment/Train_deploys]] |who=N/A |what=Branch <code>wmf/1.44.0-wmf.6</code> }} {{Deployment calendar event card |when=2024-12-02 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous_deployment/Train_deploys]] |who=N/A |what=Deploy <code>wmf/1.44.0-wmf.6</code> to testwikis }} {{Deployment calendar event card |when=2024-12-02 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2024-12-02 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-02 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|arnaudb|Arnaud}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2024-12-03}}=== {{Deployment calendar event card |when=2024-12-03 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-03 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-03 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2024-12-03 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-03 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|eoghan|Eoghan}}, {{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2024-12-03 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2024-12-03 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-03 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|thcipriani|Tyler}}, {{ircnick|thcipriani|Tyler}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.5->1.44.0-wmf.6|1.44.0-wmf.5|1.44.0-wmf.5}} * group0 to [[mw:MediaWiki_1.44/wmf.6|1.44.0-wmf.6]] * '''Blockers: {{phabricator|T375665}}''' }} {{Deployment calendar event card |when=2024-12-03 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-03 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2024-12-04}}=== {{Deployment calendar event card |when=2024-12-04 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-04 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-04 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2024-12-04 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-04 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2024-12-04 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-04 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|thcipriani|Tyler}}, {{ircnick|thcipriani|Tyler}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.6|1.44.0-wmf.5->1.44.0-wmf.6|1.44.0-wmf.5}} * group1 to [[mw:MediaWiki_1.44/wmf.6|1.44.0-wmf.6]] * '''Blockers: {{phabricator|T375665}}''' }} {{Deployment calendar event card |when=2024-12-04 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-04 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2024-12-04 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-04 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|arnaudb|Arnaud}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2024-12-05}}=== {{Deployment calendar event card |when=2024-12-05 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-05 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-05 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2024-12-05 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|joelyrookewmde|joelyrookewmde}} {{deploy|type=config|gerrit=1098045|title=Remove feature flag which controls wikibase item link location|status=}} - {{phabricator|T377809}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-05 08:00 SF |length=1 |window=Train log triage |who={{ircnick|thcipriani|Tyler}}, {{ircnick|thcipriani|Tyler}} |what=See [[Heterogeneous_deployment/Train_deploys#Breakage]] }} {{Deployment calendar event card |when=2024-12-05 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2024-12-05 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2024-12-05 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-05 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|thcipriani|Tyler}}, {{ircnick|thcipriani|Tyler}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.6|1.44.0-wmf.6|1.44.0-wmf.5->1.44.0-wmf.6}} * group2 to [[mw:MediaWiki_1.44/wmf.6|1.44.0-wmf.6]] * '''Blockers: {{phabricator|T375665}}''' }} {{Deployment calendar event card |when=2024-12-05 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-05 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2024-12-06}}=== {{Deployment calendar event card |when=2024-12-06 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2024-12-06 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|eoghan|Eoghan}}, {{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2024-12-07}}=== {{Deployment calendar event card |when=2024-12-07 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} ==Week of December 09== ==={{Deployment_day|date=2024-12-08}}=== {{Deployment calendar event card |when=2024-12-08 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2024-12-09}}=== {{Deployment calendar event card |when=2024-12-09 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-09 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2024-12-09 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-09 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2024-12-09 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-09 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2024-12-09 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-09 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. }} {{Deployment calendar event card |when=2024-12-09 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous_deployment/Train_deploys]] |who=N/A |what=Branch <code>wmf/1.44.0-wmf.7</code> }} {{Deployment calendar event card |when=2024-12-09 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous_deployment/Train_deploys]] |who=N/A |what=Deploy <code>wmf/1.44.0-wmf.7</code> to testwikis }} {{Deployment calendar event card |when=2024-12-09 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2024-12-09 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes 
deployment. }} {{Deployment calendar event card |when=2024-12-09 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|arnaudb|Arnaud}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2024-12-10}}=== {{Deployment calendar event card |when=2024-12-10 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-10 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-10 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2024-12-10 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-10 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|eoghan|Eoghan}}, {{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2024-12-10 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2024-12-10 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-10 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|thcipriani|Tyler}}, {{ircnick|thcipriani|Tyler}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.6->1.44.0-wmf.7|1.44.0-wmf.6|1.44.0-wmf.6}} * group0 to [[mw:MediaWiki_1.44/wmf.7|1.44.0-wmf.7]] * '''Blockers: {{phabricator|T375666}}''' }} {{Deployment calendar event card |when=2024-12-10 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-10 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2024-12-11}}=== {{Deployment calendar event card |when=2024-12-11 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-11 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-11 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2024-12-11 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-11 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2024-12-11 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-11 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|thcipriani|Tyler}}, {{ircnick|thcipriani|Tyler}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.7|1.44.0-wmf.6->1.44.0-wmf.7|1.44.0-wmf.6}} * group1 to [[mw:MediaWiki_1.44/wmf.7|1.44.0-wmf.7]] * '''Blockers: {{phabricator|T375666}}''' }} {{Deployment calendar event card |when=2024-12-11 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-11 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2024-12-11 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-11 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|arnaudb|Arnaud}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2024-12-12}}=== {{Deployment calendar event card |when=2024-12-12 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-12 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-12 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2024-12-12 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-12 08:00 SF |length=1 |window=Train log triage |who={{ircnick|thcipriani|Tyler}}, {{ircnick|thcipriani|Tyler}} |what=See [[Heterogeneous_deployment/Train_deploys#Breakage]] }} {{Deployment calendar event card |when=2024-12-12 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2024-12-12 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2024-12-12 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2024-12-12 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|thcipriani|Tyler}}, {{ircnick|thcipriani|Tyler}} |what=[[mw:MediaWiki 1.44/Roadmap#Schedule for the deployments|1.44 schedule]] {{DeployOneWeekMini|1.44.0-wmf.7|1.44.0-wmf.7|1.44.0-wmf.6->1.44.0-wmf.7}} * group2 to [[mw:MediaWiki_1.44/wmf.7|1.44.0-wmf.7]] * '''Blockers: {{phabricator|T375666}}''' }} {{Deployment calendar event card |when=2024-12-12 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|cjming|Clare}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2024-12-12 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2024-12-13}}=== {{Deployment calendar event card |when=2024-12-13 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2024-12-13 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|eoghan|Eoghan}}, {{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2024-12-14}}=== {{Deployment calendar event card |when=2024-12-14 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} cimjchnjivc6ynnh2iozdoanumvx998 Server Admin Log/Archives 0 4673 2249716 2240663 2024-12-01T00:29:19Z JrandWP 37706 /* 2020s */ archive 2249716 wikitext text/x-wiki <noinclude>{{process header |previous=← [[Server Admin Log]] |title=Server Admin Log |section=(archives) }}</noinclude> <inputbox> type=fulltext prefix=Server Admin Log/ searchbuttonlabel=Search archives break=no </inputbox><noinclude> ==Archives== </noinclude> ===2000s=== <div style="column-count:2;-moz-column-count:2;-webkit-column-count:2"> * [[Server Admin Log/Archive 1|Archive 1: 2004 Jun - 2004 Sep]] * [[Server Admin Log/Archive 2|Archive 2: 2004 Oct - 2004 Nov]] * [[Server Admin Log/Archive 3|Archive 3: 2004 Dec - 2005 Mar]] * [[Server Admin Log/Archive 4|Archive 4: 2005 Apr - 2005 Jul]] * [[Server Admin Log/Archive 5|Archive 5: 2005 Aug - 2005 Oct]], <small>with revision history 2004-06-23 to 2005-11-25</small> * [[Server Admin Log/Archive 6|Archive 6: 2005 Nov - 2006 Feb]] * [[Server Admin Log/Archive 7|Archive 7: 2006 Mar - 2006 Jun]] * [[Server Admin Log/Archive 8|Archive 8: 2006 Jul - 2006 Sep]] * [[Server Admin Log/Archive 9|Archive 9: 2006 Oct - 2007 Jan]], <small>with revision history 2005-11-25 to 2007-02-21</small> * [[Server Admin Log/Archive 10|Archive 10: 2007 Feb - 2007 Jun]] * [[Server Admin Log/Archive 11|Archive 11: 2007 Jul - 2007 Dec]] * [[Server Admin Log/Archive 12|Archive 12: 2008 Jan - 2008 Jul]] * [[Server Admin Log/2008-08|Archive 12a: 2008 Aug]] * [[Server Admin Log/2008-09|Archive 12b: 2008 Sept]] * [[Server Admin Log/Archive 13|Archive 13: 2008 Oct - 2009 Jun]] * [[Server Admin Log/Archive 14|Archive 14: 2009 Jun - 2009 Dec]] </div> ===2010s=== <div style="column-count:2;-moz-column-count:2;-webkit-column-count:2"> * [[Server Admin Log/Archive 15|Archive 15: 2010 Jan - 2010 Jun]] * [[Server Admin Log/Archive 16|Archive 16: 2010 Jul - 2010 Oct]] * [[Server Admin Log/Archive 17|Archive 17: 2010 Nov - 2010 Dec]] * [[Server Admin Log/Archive 
18|Archive 18: 2011 Jan - 2011 Jun]] * [[Server Admin Log/Archive 19|Archive 19: 2011 Jul - 2011 Dec]] * [[Server Admin Log/Archive 20|Archive 20: 2011 Dec - 2012 Jun]], <small>with revision history 2007-02-21 to 2012-03-27</small> * [[Server Admin Log/Archive 21|Archive 21: 2012 Jul - 2013 Jan]] * [[Server Admin Log/Archive 22|Archive 22: 2013 Jan - 2013 Jul]] * [[Server Admin Log/Archive 23|Archive 23: 2013 Aug - 2013 Dec]] * [[Server Admin Log/Archive 24|Archive 24: 2014 Jan - 2014 Mar]] * [[Server Admin Log/Archive 25|Archive 25: 2014 April - 2014 September]] * [[Server Admin Log/Archive 26|Archive 26: 2014 October - 2014 December]] * [[Server Admin Log/Archive 27|Archive 27: 2015 January - 2015 July]] * [[Server Admin Log/Archive 28|Archive 28: 2015 August - 2015 December]] * [[Server Admin Log/Archive 29|Archive 29: 2016 January - 2016 May]] * [[Server Admin Log/Archive 30|Archive 30: 2016 June - 2016 August]] * [[Server Admin Log/Archive 31|Archive 31: 2016 September - 2016 December]] * [[Server Admin Log/Archive 32|Archive 32: 2017 January - 2017 July]] * [[Server Admin Log/Archive 33|Archive 33: 2017 August - 2017 December]] * [[Server Admin Log/Archive 34|Archive 34: 2018 January - 2018 April]] * [[Server Admin Log/Archive 35|Archive 35: 2018 May - 2018 August]] * [[Server Admin Log/Archive 36|Archive 36: 2018 September - 2018 December]] * [[Server Admin Log/Archive 37|Archive 37: 2019 January - 2019 April]] * [[Server Admin Log/Archive 38|Archive 38: 2019 May - 2019 August]] * [[Server Admin Log/Archive 39|Archive 39: 2019 September - 2019 December]] </div> ===2020s=== <div style="column-count:2;-moz-column-count:2;-webkit-column-count:2"> * [[Server Admin Log/Archive 40|Archive 40: 2020 January - 2020 April]] * [[Server Admin Log/Archive 41|Archive 41: 2020 May - 2020 July]] * [[Server Admin Log/Archive 42|Archive 42: 2020 August - 2020 November]] * [[Server Admin Log/Archive 43|Archive 43: 2020 December]] * [[Server Admin Log/Archive 44|Archive 44: 
2021 January - 2021 April]] * [[Server Admin Log/Archive 45|Archive 45: 2021 May - 2021 July]] * [[Server Admin Log/Archive 46|Archive 46: 2021 August - 2021 October]] * [[Server Admin Log/Archive 47|Archive 47: 2021 November - 2021 December]] * [[Server Admin Log/Archive 48|Archive 48: 2022 January]] * [[Server Admin Log/Archive 49|Archive 49: 2022 February]] * [[Server Admin Log/Archive 50|Archive 50: 2022 March]] * [[Server Admin Log/Archive 51|Archive 51: 2022 April 1-15]] * [[Server Admin Log/Archive 52|Archive 52: 2022 April 16-30]] * [[Server Admin Log/Archive 53|Archive 53: 2022 May]] * [[Server Admin Log/Archive 54|Archive 54: 2022 June]] * [[Server Admin Log/Archive 55|Archive 55: 2022 July]] * [[Server Admin Log/Archive 56|Archive 56: 2022 August]] * [[Server Admin Log/Archive 57|Archive 57: 2022 September]] * [[Server Admin Log/Archive 58|Archive 58: 2022 October]] * [[Server Admin Log/Archive 59|Archive 59: 2022 November 1-15]] * [[Server Admin Log/Archive 60|Archive 60: 2022 November 16-30]] * [[Server Admin Log/Archive 61|Archive 61: 2022 December]] * [[Server Admin Log/Archive 62|Archive 62: 2023 January]] * [[Server Admin Log/Archive 63|Archive 63: 2023 February]] * [[Server Admin Log/Archive 64|Archive 64: 2023 March]] * [[Server Admin Log/Archive 65|Archive 65: 2023 April]] * [[Server Admin Log/Archive 66|Archive 66: 2023 May]] * [[Server Admin Log/Archive 67|Archive 67: 2023 June]] * [[Server Admin Log/Archive 68|Archive 68: 2023 July]] * [[Server Admin Log/Archive 69|Archive 69: 2023 August 1-15]] * [[Server Admin Log/Archive 70|Archive 70: 2023 August 16-31]] * [[Server Admin Log/Archive 71|Archive 71: 2023 September]] * [[Server Admin Log/Archive 72|Archive 72: 2023 October]] * [[Server Admin Log/Archive 73|Archive 73: 2023 November]] * [[Server Admin Log/Archive 74|Archive 74: 2023 December]] * [[Server Admin Log/Archive 75|Archive 75: 2024 January]] * [[Server Admin Log/Archive 76|Archive 76: 2024 February]] * [[Server Admin Log/Archive 
77|Archive 77: 2024 March]] * [[Server Admin Log/Archive 78|Archive 78: 2024 April]] * [[Server Admin Log/Archive 79|Archive 79: 2024 May 1-15]] * [[Server Admin Log/Archive 80|Archive 80: 2024 May 16-31]] * [[Server Admin Log/Archive 81|Archive 81: 2024 June 1-15]] * [[Server Admin Log/Archive 82|Archive 82: 2024 June 16-30]] * [[Server Admin Log/Archive 83|Archive 83: 2024 July]] * [[Server Admin Log/Archive 84|Archive 84: 2024 August]] * [[Server Admin Log/Archive 85|Archive 85: 2024 September]] * [[Server Admin Log/Archive 86|Archive 86: 2024 October]] * [[Server Admin Log/Archive 87|Archive 87: 2024 November]] </div> <!-- omg! --> </div><includeonly> [[Category:Server Admin Log archive]] </includeonly> Server Admin Log 0 7919 2249709 2249708 2024-11-30T11:59:59Z Stashbot 7414 joal@deploy2002: Finished deploy [airflow-dags/analytics@fe37cfe]: Hotfix airflow analytics deploy [airflow-dags/analytics@fe37cfec] (duration: 01m 21s) 2249709 wikitext text/x-wiki == 2024-11-30 == * 11:59 joal@deploy2002: Finished deploy [airflow-dags/analytics@fe37cfe]: Hotfix airflow analytics deploy [airflow-dags/analytics@fe37cfec] (duration: 01m 21s) * 11:58 joal@deploy2002: Started deploy [airflow-dags/analytics@fe37cfe]: Hotfix airflow analytics deploy [airflow-dags/analytics@fe37cfec] == 2024-11-29 == * 16:55 jayme: puppet ca destroy mwmaint.discovery.wmnet - [[phab:T341859|T341859]] * 16:22 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2006.codfw.wmnet * 16:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1006.eqiad.wmnet * 16:15 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2006.codfw.wmnet * 16:15 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1006.eqiad.wmnet * 15:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2005.codfw.wmnet * 15:40 jiji@cumin1002: END (PASS) - 
Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1005.eqiad.wmnet * 15:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1005.eqiad.wmnet * 15:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2005.codfw.wmnet * 15:16 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet * 15:16 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1004.eqiad.wmnet * 15:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71448 and previous config saved to /var/cache/conftool/dbconfig/20241129-151101-ladsgroup.json * 15:10 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet * 15:09 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet * 14:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P71447 and previous config saved to /var/cache/conftool/dbconfig/20241129-145554-ladsgroup.json * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P71446 and previous config saved to /var/cache/conftool/dbconfig/20241129-144047-ladsgroup.json * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71445 and previous config saved to /var/cache/conftool/dbconfig/20241129-142540-ladsgroup.json * 14:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1231 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71444 and previous config saved to /var/cache/conftool/dbconfig/20241129-141931-ladsgroup.json * 14:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime 
(exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance * 14:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance * 14:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 14:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 14:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71443 and previous config saved to /var/cache/conftool/dbconfig/20241129-141409-ladsgroup.json * 13:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P71442 and previous config saved to /var/cache/conftool/dbconfig/20241129-135902-ladsgroup.json * 13:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P71441 and previous config saved to /var/cache/conftool/dbconfig/20241129-134355-ladsgroup.json * 13:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71440 and previous config saved to /var/cache/conftool/dbconfig/20241129-132848-ladsgroup.json * 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1021.eqiad.wmnet * 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1187 
([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71439 and previous config saved to /var/cache/conftool/dbconfig/20241129-132136-ladsgroup.json * 13:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance * 13:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance * 13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71438 and previous config saved to /var/cache/conftool/dbconfig/20241129-132111-ladsgroup.json * 13:17 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 13:13 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P71437 and previous config saved to /var/cache/conftool/dbconfig/20241129-130604-ladsgroup.json * 13:06 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1021.eqiad.wmnet * 12:57 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P71436 and previous config saved to /var/cache/conftool/dbconfig/20241129-125057-ladsgroup.json * 12:42 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 12:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71434 and previous config saved to /var/cache/conftool/dbconfig/20241129-123549-ladsgroup.json * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1015.eqiad.wmnet * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 12:31 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 12:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71433 and previous config saved to /var/cache/conftool/dbconfig/20241129-122735-ladsgroup.json * 12:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance * 12:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance * 12:27 jmm@cumin2002: START - Cookbook sre.dns.netbox * 12:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1015.eqiad.wmnet * 12:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71432 and previous config saved to /var/cache/conftool/dbconfig/20241129-121010-ladsgroup.json * 12:06 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with 
reason: Upgrade GitLab to new version * 12:04 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P71431 and previous config saved to /var/cache/conftool/dbconfig/20241129-115501-ladsgroup.json * 11:44 moritzm: imported mapnik_4.0.3+ds2~wmf12u1 to component/maps [[phab:T216826|T216826]] * 11:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 11:40 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P71430 and previous config saved to /var/cache/conftool/dbconfig/20241129-113954-ladsgroup.json * 11:31 Dreamy_Jazz: Started MediaModeration scanning scripts to scan all wikis * 11:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2084.codfw.wmnet with OS bullseye * 11:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71429 and previous config saved to /var/cache/conftool/dbconfig/20241129-112447-ladsgroup.json * 11:19 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:18 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71428 and previous config saved to 
/var/cache/conftool/dbconfig/20241129-111554-ladsgroup.json * 11:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 11:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 11:01 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2084.codfw.wmnet with reason: host reimage * 11:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance * 11:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance * 10:57 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2084.codfw.wmnet with reason: host reimage * 10:45 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 10:36 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 10:10 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 09:21 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 09:18 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 09:05 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye 
* 09:02 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version * 08:54 moritzm: imported mapbox-polylabel 2.0.1-1~wmf12u1 to component/maps [[phab:T216826|T216826]] * 08:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:16 moritzm: imported mapbox-geometry_2.0.3-1~wmf12u1 to component/maps [[phab:T216826|T216826]] * 07:19 marostegui@cumin2002: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Repooling after corruption', diff saved to https://phabricator.wikimedia.org/P71427 and previous config saved to /var/cache/conftool/dbconfig/20241129-071905-root.json * 07:10 aqu@deploy2002: Finished deploy [airflow-dags/analytics@656d6df]: Generate canary events faster in Airflow (duration: 03m 15s) * 07:06 aqu@deploy2002: Started deploy [airflow-dags/analytics@656d6df]: Generate canary events faster in Airflow * 07:03 marostegui@cumin2002: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Repooling after corruption', diff saved to https://phabricator.wikimedia.org/P71426 and previous config saved to /var/cache/conftool/dbconfig/20241129-070333-root.json * 06:48 marostegui@cumin2002: dbctl commit (dc=all): 'db1223 (re)pooling @ 50%: Repooling after corruption', diff saved to https://phabricator.wikimedia.org/P71425 and previous config saved to /var/cache/conftool/dbconfig/20241129-064801-root.json * 06:28 marostegui@cumin2002: dbctl commit (dc=all): 'Repool', diff saved to https://phabricator.wikimedia.org/P71424 and previous config saved to /var/cache/conftool/dbconfig/20241129-062833-marostegui.json * 06:27 marostegui@cumin2002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1223 quickly with 2 steps - Fixed corruption * 06:26 marostegui@cumin2002: START - Cookbook sre.mysql.pool db1223 quickly with 2 steps - 
Fixed corruption * 05:52 taavi@cumin1002: dbctl commit (dc=all): 'depool db1223, replication broken', diff saved to https://phabricator.wikimedia.org/P71423 and previous config saved to /var/cache/conftool/dbconfig/20241129-055245-taavi.json * 04:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71422 and previous config saved to /var/cache/conftool/dbconfig/20241129-045409-ladsgroup.json * 04:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P71421 and previous config saved to /var/cache/conftool/dbconfig/20241129-043902-ladsgroup.json * 04:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P71420 and previous config saved to /var/cache/conftool/dbconfig/20241129-042355-ladsgroup.json * {{safesubst:SAL entry|1=04:20 tstarling@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098915{{!}}addWiki.php tweaks]], [[gerrit:1098916{{!}}Run dumpInterwiki.php locally with no changes]], [[gerrit:1098917{{!}}Prepare id.wikivoyage.org for installation (T380726 T352113)]], [[gerrit:1099065{{!}}dumpInterwiki: read from preinstall.dblist (T352113)]], [[gerrit:1099066{{!}}addWiki: Move DB_ADMIN to core]], [[gerrit:1099064{{!}}addWiki: Add UpdateSearchIndexCon}} * 04:12 tstarling@deploy2002: tstarling: Continuing with sync * {{safesubst:SAL entry|1=04:12 tstarling@deploy2002: tstarling: Backport for [[gerrit:1098915{{!}}addWiki.php tweaks]], [[gerrit:1098916{{!}}Run dumpInterwiki.php locally with no changes]], [[gerrit:1098917{{!}}Prepare id.wikivoyage.org for installation (T380726 T352113)]], [[gerrit:1099065{{!}}dumpInterwiki: read from preinstall.dblist (T352113)]], [[gerrit:1099066{{!}}addWiki: Move DB_ADMIN to core]], [[gerrit:1099064{{!}}addWiki: Add UpdateSearchIndexConfig]], [[gerrit}} * 
04:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71419 and previous config saved to /var/cache/conftool/dbconfig/20241129-040846-ladsgroup.json * 04:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71418 and previous config saved to /var/cache/conftool/dbconfig/20241129-040547-ladsgroup.json * 04:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 04:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 04:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71417 and previous config saved to /var/cache/conftool/dbconfig/20241129-040523-ladsgroup.json * {{safesubst:SAL entry|1=04:01 tstarling@deploy2002: Started scap sync-world: Backport for [[gerrit:1098915{{!}}addWiki.php tweaks]], [[gerrit:1098916{{!}}Run dumpInterwiki.php locally with no changes]], [[gerrit:1098917{{!}}Prepare id.wikivoyage.org for installation (T380726 T352113)]], [[gerrit:1099065{{!}}dumpInterwiki: read from preinstall.dblist (T352113)]], [[gerrit:1099066{{!}}addWiki: Move DB_ADMIN to core]], [[gerrit:1099064{{!}}addWiki: Add UpdateSearchIndexConf}} * 03:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P71416 and previous config saved to /var/cache/conftool/dbconfig/20241129-035016-ladsgroup.json * 03:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P71415 and previous config saved to /var/cache/conftool/dbconfig/20241129-033509-ladsgroup.json * 03:20 
ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71414 and previous config saved to /var/cache/conftool/dbconfig/20241129-032002-ladsgroup.json * 03:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2224 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71413 and previous config saved to /var/cache/conftool/dbconfig/20241129-031705-ladsgroup.json * 03:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2224.codfw.wmnet with reason: Maintenance * 03:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2224.codfw.wmnet with reason: Maintenance * 03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71412 and previous config saved to /var/cache/conftool/dbconfig/20241129-031642-ladsgroup.json * 03:04 tstarling@deploy2002: scap failed: <KeyError> '1 dbs from /srv/mediawiki-staging/wikiversions.json are missing from /srv/mediawiki-staging/dblists/all.dblist: idwikivoyage' (scap version: 4.129.0) (duration: 00m 00s) * 03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P71411 and previous config saved to /var/cache/conftool/dbconfig/20241129-030133-ladsgroup.json * 02:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P71410 and previous config saved to /var/cache/conftool/dbconfig/20241129-024625-ladsgroup.json * 02:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71409 and previous config saved to /var/cache/conftool/dbconfig/20241129-023118-ladsgroup.json * 02:28 
ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2217 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71408 and previous config saved to /var/cache/conftool/dbconfig/20241129-022822-ladsgroup.json * 02:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance * 02:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance * 02:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance * 02:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance * 02:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71407 and previous config saved to /var/cache/conftool/dbconfig/20241129-022645-ladsgroup.json * 02:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P71406 and previous config saved to /var/cache/conftool/dbconfig/20241129-021138-ladsgroup.json * 01:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P71405 and previous config saved to /var/cache/conftool/dbconfig/20241129-015631-ladsgroup.json * 01:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71404 and previous config saved to /var/cache/conftool/dbconfig/20241129-014124-ladsgroup.json * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2193 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71403 and previous config saved to 
/var/cache/conftool/dbconfig/20241129-013912-ladsgroup.json * 01:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance * 01:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance * 01:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71402 and previous config saved to /var/cache/conftool/dbconfig/20241129-013850-ladsgroup.json * 01:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P71401 and previous config saved to /var/cache/conftool/dbconfig/20241129-012343-ladsgroup.json * 01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P71400 and previous config saved to /var/cache/conftool/dbconfig/20241129-010835-ladsgroup.json * 00:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71399 and previous config saved to /var/cache/conftool/dbconfig/20241129-005328-ladsgroup.json * 00:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2180 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71398 and previous config saved to /var/cache/conftool/dbconfig/20241129-005117-ladsgroup.json * 00:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance * 00:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance * 00:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 ([[phab:T328817|T328817]])', diff saved to 
https://phabricator.wikimedia.org/P71397 and previous config saved to /var/cache/conftool/dbconfig/20241129-005054-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P71396 and previous config saved to /var/cache/conftool/dbconfig/20241129-003547-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P71395 and previous config saved to /var/cache/conftool/dbconfig/20241129-002040-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71394 and previous config saved to /var/cache/conftool/dbconfig/20241129-000533-ladsgroup.json * 00:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2169 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71393 and previous config saved to /var/cache/conftool/dbconfig/20241129-000234-ladsgroup.json * 00:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance * 00:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance * 00:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71392 and previous config saved to /var/cache/conftool/dbconfig/20241129-000211-ladsgroup.json == 2024-11-28 == * 23:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P71391 and previous config saved to /var/cache/conftool/dbconfig/20241128-234704-ladsgroup.json * 23:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T376905|T376905]])', 
diff saved to https://phabricator.wikimedia.org/P71390 and previous config saved to /var/cache/conftool/dbconfig/20241128-233426-ladsgroup.json * 23:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P71389 and previous config saved to /var/cache/conftool/dbconfig/20241128-233157-ladsgroup.json * 23:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P71388 and previous config saved to /var/cache/conftool/dbconfig/20241128-231919-ladsgroup.json * 23:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71387 and previous config saved to /var/cache/conftool/dbconfig/20241128-231650-ladsgroup.json * 23:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2158 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71386 and previous config saved to /var/cache/conftool/dbconfig/20241128-231350-ladsgroup.json * 23:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance * 23:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance * 23:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance * 23:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance * 23:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71385 and previous config saved to /var/cache/conftool/dbconfig/20241128-231312-ladsgroup.json * 23:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after 
maintenance db1165', diff saved to https://phabricator.wikimedia.org/P71384 and previous config saved to /var/cache/conftool/dbconfig/20241128-230412-ladsgroup.json * 22:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P71383 and previous config saved to /var/cache/conftool/dbconfig/20241128-225805-ladsgroup.json * 22:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1178 gradually with 4 steps - Maint over ([[phab:T361627|T361627]]) * 22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71381 and previous config saved to /var/cache/conftool/dbconfig/20241128-224905-ladsgroup.json * 22:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P71380 and previous config saved to /var/cache/conftool/dbconfig/20241128-224258-ladsgroup.json * 22:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71379 and previous config saved to /var/cache/conftool/dbconfig/20241128-223959-ladsgroup.json * 22:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 22:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 22:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance * 22:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance * 22:27 
ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71377 and previous config saved to /var/cache/conftool/dbconfig/20241128-222751-ladsgroup.json * 22:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2151 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71376 and previous config saved to /var/cache/conftool/dbconfig/20241128-222250-ladsgroup.json * 22:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance * 22:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance * away: UTC late deploys done * {{safesubst:SAL entry|1=22:17 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098990{{!}}Localisation updates (November 26) (T372175)]], [[gerrit:1098956{{!}}extend account creation lookup service to cover forced creations by others (T378401)]], [[gerrit:1098965{{!}}extend account creation backfill script to forced account creations by others (T378401)]], [[gerrit:1098929{{!}}ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot depl}} * 22:07 tgr@deploy2002: tgr, ariel, matmarex, mszabo: Continuing with sync * 22:05 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db1178 gradually with 4 steps - Maint over ([[phab:T361627|T361627]]) * {{safesubst:SAL entry|1=21:53 tgr@deploy2002: tgr, ariel, matmarex, mszabo: Backport for [[gerrit:1098990{{!}}Localisation updates (November 26) (T372175)]], [[gerrit:1098956{{!}}extend account creation lookup service to cover forced creations by others (T378401)]], [[gerrit:1098965{{!}}extend account creation backfill script to forced account creations by others (T378401)]], [[gerrit:1098929{{!}}ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot}} * 21:51 ladsgroup@cumin1002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: Schema change ([[phab:T361627|T361627]]) * 21:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1178.eqiad.wmnet with reason: Schema change ([[phab:T361627|T361627]]) * 21:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1178 depool ([[phab:T361627|T361627]])', diff saved to https://phabricator.wikimedia.org/P71373 and previous config saved to /var/cache/conftool/dbconfig/20241128-215026-ladsgroup.json * {{safesubst:SAL entry|1=21:39 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1098990{{!}}Localisation updates (November 26) (T372175)]], [[gerrit:1098956{{!}}extend account creation lookup service to cover forced creations by others (T378401)]], [[gerrit:1098965{{!}}extend account creation backfill script to forced account creations by others (T378401)]], [[gerrit:1098929{{!}}ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deplo}} * 21:25 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098617{{!}}Reader Survey: Undeploy on enwiki (T378660)]], [[gerrit:1098627{{!}}Reader Survey: Deploy on multiple wikis (T378660)]] (duration: 14m 43s) * 21:18 tgr@deploy2002: tgr, dani: Continuing with sync * 21:17 aqu@deploy2002: Finished deploy [airflow-dags/analytics@6d38940]: Generate canary events faster in Airflow (duration: 01m 39s) * 21:16 tgr@deploy2002: tgr, dani: Backport for [[gerrit:1098617{{!}}Reader Survey: Undeploy on enwiki (T378660)]], [[gerrit:1098627{{!}}Reader Survey: Deploy on multiple wikis (T378660)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:15 aqu@deploy2002: Started deploy [airflow-dags/analytics@6d38940]: Generate canary events faster in Airflow * 21:10 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1098617{{!}}Reader Survey: Undeploy on enwiki (T378660)]], [[gerrit:1098627{{!}}Reader Survey: Deploy on multiple wikis (T378660)]] * 20:30 
kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098929{{!}}ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deploy (T380277)]] (duration: 13m 08s) * 20:23 kharlan@deploy2002: kharlan, mszabo: Continuing with sync * 20:23 kharlan@deploy2002: kharlan, mszabo: Backport for [[gerrit:1098929{{!}}ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deploy (T380277)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:16 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1098929{{!}}ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deploy (T380277)]] * 19:50 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.roll-reimage-nodes (exit_code=0) rolling reimage on P<nowiki>{</nowiki>wikikube-worker[1276-1277].eqiad.wmnet<nowiki>}</nowiki> and (A:wikikube-staging-master-codfw or A:wikikube-staging-worker-codfw or A:wikikube-staging-master-eqiad or A:wikikube-staging-worker-eqiad or A:wikikube-master-codfw or A:wikikube-worker-codfw or A:wikikube-master-eqiad or A:wikikube-worker-eqiad or A:ml-serve-master-eqiad or * 19:50 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1277.eqiad.wmnet with OS bookworm * 19:31 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1277.eqiad.wmnet with reason: host reimage * 19:27 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1277.eqiad.wmnet with reason: host reimage * 19:08 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1277.eqiad.wmnet with OS bookworm * 18:28 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1276.eqiad.wmnet with OS bookworm * 18:09 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1276.eqiad.wmnet with reason: host reimage * 18:06 kamila@cumin1002: START - 
Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1276.eqiad.wmnet with reason: host reimage * 17:47 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1276.eqiad.wmnet with OS bookworm * 17:45 kamila@cumin1002: START - Cookbook sre.k8s.roll-reimage-nodes rolling reimage on P<nowiki>{</nowiki>wikikube-worker[1276-1277].eqiad.wmnet<nowiki>}</nowiki> and (A:wikikube-staging-master-codfw or A:wikikube-staging-worker-codfw or A:wikikube-staging-master-eqiad or A:wikikube-staging-worker-eqiad or A:wikikube-master-codfw or A:wikikube-worker-codfw or A:wikikube-master-eqiad or A:wikikube-worker-eqiad or A:ml-serve-master-eqiad or A:ml-serve-worker- * 17:06 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 16:51 Emperor: depool/restart swift/repool ms-fe2014 * 16:51 Emperor: depool/restart swift/repool ms-fe2009 * 16:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 16:41 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 16:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: Maintenance * 16:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: Maintenance * 16:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 16:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 16:28 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 16:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2227.codfw.wmnet with reason: Maintenance * 16:27 
ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2227.codfw.wmnet with reason: Maintenance * 16:24 gmodena@deploy2002: Finished deploy [airflow-dags/analytics@d7c0f58]: webrequest_frontend post deployment fixes (duration: 02m 22s) * 16:24 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2088.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2088.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:23 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:23 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:23 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:22 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:22 gmodena@deploy2002: Started deploy [airflow-dags/analytics@d7c0f58]: webrequest_frontend post deployment fixes * 16:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:22 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:21 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:21 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with 
chassis set policy GRACEFUL_RESTART * 16:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:19 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2205.codfw.wmnet with reason: Maintenance * 16:17 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2205.codfw.wmnet with reason: Maintenance * 16:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: Maintenance * 16:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2194.codfw.wmnet with reason: Maintenance * 16:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 16:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 15:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2190.codfw.wmnet with reason: Maintenance * 15:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2190.codfw.wmnet with reason: Maintenance * 15:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2177.codfw.wmnet with reason: Maintenance * 15:48 
ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2177.codfw.wmnet with reason: Maintenance * 15:46 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host idp-test2004.wikimedia.org * 15:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance * 15:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance * 15:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2156.codfw.wmnet with reason: Maintenance * 15:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2156.codfw.wmnet with reason: Maintenance * 15:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2004.wikimedia.org * 15:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2005.wikimedia.org * 15:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2005.wikimedia.org * 15:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71371 and previous config saved to /var/cache/conftool/dbconfig/20241128-153202-ladsgroup.json * 15:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: Maintenance * 15:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: Maintenance * 15:27 gmodena@deploy2002: Finished deploy [analytics/refinery@ac87303] (hadoop-test): Gobblin config changes [analytics/refinery@ac873037] (duration: 00m 26s) * 15:26 gmodena@deploy2002: Started deploy [analytics/refinery@ac87303] (hadoop-test): Gobblin config changes [analytics/refinery@ac873037] * 15:25 gmodena@deploy2002: Finished deploy [analytics/refinery@ac87303] (thin): 
Gobblin config changes THIN [analytics/refinery@ac873037] (duration: 00m 30s) * 15:25 gmodena@deploy2002: Started deploy [analytics/refinery@ac87303] (thin): Gobblin config changes THIN [analytics/refinery@ac873037] * 15:21 moritzm: removing ganeti1018 from active Ganeti nodes [[phab:T378921|T378921]] * 15:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2139.codfw.wmnet with reason: Maintenance * 15:20 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync * 15:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2139.codfw.wmnet with reason: Maintenance * 15:19 gmodena@deploy2002: Finished deploy [analytics/refinery@ac87303]: Gobblin config changes [analytics/refinery@ac873037] (duration: 03m 05s) * 15:19 elukey@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync * 15:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 15:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 15:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P71370 and previous config saved to /var/cache/conftool/dbconfig/20241128-151655-ladsgroup.json * 15:16 gmodena@deploy2002: Started deploy [analytics/refinery@ac87303]: Gobblin config changes [analytics/refinery@ac873037] * 15:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet * 15:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1240.eqiad.wmnet with reason: Maintenance * 15:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1240.eqiad.wmnet with reason: Maintenance * 15:13 ladsgroup@cumin1002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on db1223.eqiad.wmnet with reason: Maintenance * 15:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1223.eqiad.wmnet with reason: Maintenance * 15:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 15:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 15:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: Maintenance * 15:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: Maintenance * 15:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1198.eqiad.wmnet with reason: Maintenance * 15:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1198.eqiad.wmnet with reason: Maintenance * 15:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: Maintenance * 15:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: Maintenance * 15:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1166.eqiad.wmnet with reason: Maintenance * 15:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1166.eqiad.wmnet with reason: Maintenance * 15:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: Maintenance * 15:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: Maintenance * 15:01 ladsgroup@cumin1002: dbctl commit (dc=all): 
'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P71369 and previous config saved to /var/cache/conftool/dbconfig/20241128-150148-ladsgroup.json * 15:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance * 15:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance * 15:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2214.codfw.wmnet with reason: Maintenance * 14:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2214.codfw.wmnet with reason: Maintenance * 14:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1201.eqiad.wmnet with reason: Maintenance * 14:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1201.eqiad.wmnet with reason: Maintenance * 14:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2224.codfw.wmnet with reason: Maintenance * 14:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2224.codfw.wmnet with reason: Maintenance * 14:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2217.codfw.wmnet with reason: Maintenance * 14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2217.codfw.wmnet with reason: Maintenance * 14:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2197.codfw.wmnet with reason: Maintenance * 14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 
2:00:00 on db2197.codfw.wmnet with reason: Maintenance * 14:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2193.codfw.wmnet with reason: Maintenance * 14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2193.codfw.wmnet with reason: Maintenance * 14:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2180.codfw.wmnet with reason: Maintenance * 14:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2180.codfw.wmnet with reason: Maintenance * 14:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2169.codfw.wmnet with reason: Maintenance * 14:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2169.codfw.wmnet with reason: Maintenance * 14:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance * 14:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance * 14:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2158.codfw.wmnet with reason: Maintenance * 14:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2158.codfw.wmnet with reason: Maintenance * 14:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2151.codfw.wmnet with reason: Maintenance * 14:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2151.codfw.wmnet with reason: Maintenance * 14:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 14:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 14:54 ladsgroup@cumin1002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: Maintenance * {{safesubst:SAL entry|1=14:54 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098623{{!}}Use `useformat` query param for device detection or mobile domain (m.) (T380646 T375788)]], [[gerrit:1098913{{!}}ReportIncident: Enable instrumentation on labs (T372823)]], [[gerrit:1098509{{!}}Enable message group subscription feature for some wikis (T372386)]], [[gerrit:1098622{{!}}Use `useformat` query param for device detection or mobile domain (m.)}} * 14:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: Maintenance * 14:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 14:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 14:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1187.eqiad.wmnet with reason: Maintenance * 14:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1187.eqiad.wmnet with reason: Maintenance * 14:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1180.eqiad.wmnet with reason: Maintenance * 14:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1180.eqiad.wmnet with reason: Maintenance * 14:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 14:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 14:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1168.eqiad.wmnet with reason: Maintenance * 14:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1168.eqiad.wmnet 
with reason: Maintenance * 14:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 14:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 14:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1165.eqiad.wmnet with reason: Maintenance * 14:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1165.eqiad.wmnet with reason: Maintenance * 14:47 urbanecm@deploy2002: urbanecm, tgr, abi, mszabo: Continuing with sync * 14:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71352 and previous config saved to /var/cache/conftool/dbconfig/20241128-144641-ladsgroup.json * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71351 and previous config saved to /var/cache/conftool/dbconfig/20241128-144039-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71350 and previous config saved to /var/cache/conftool/dbconfig/20241128-144012-ladsgroup.json * 14:39 urbanecm: [urbanecm@deploy2002 ~]$ while read wiki; do echo "== $wiki"; mwscript-k8s extensions/Flow/maintenance/FlowMoveBoardsToSubpages.php -- --wiki=$wiki; done < wikis.txt # 
wikis.txt is at P71349 # [[phab:T378827|T378827]] * 14:36 urbanecm: [urbanecm@deploy2002 ~]$ mwscript-k8s -f extensions/Flow/maintenance/FlowMoveBoardsToSubpages.php -- --wiki=bswiki # [[phab:T378827|T378827]] * 14:33 moritzm: installing node-es-module-lexer updates from Bookworm point release * {{safesubst:SAL entry|1=14:28 urbanecm@deploy2002: urbanecm, tgr, abi, mszabo: Backport for [[gerrit:1098623{{!}}Use `useformat` query param for device detection or mobile domain (m.) (T380646 T375788)]], [[gerrit:1098913{{!}}ReportIncident: Enable instrumentation on labs (T372823)]], [[gerrit:1098509{{!}}Enable message group subscription feature for some wikis (T372386)]], [[gerrit:1098622{{!}}Use `useformat` query param for device detection or mobile domain (m.}} * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P71347 and previous config saved to /var/cache/conftool/dbconfig/20241128-142505-ladsgroup.json * 14:25 Dreamy_Jazz: Started MediaModeration scanning scripts to run again over all wikis * {{safesubst:SAL entry|1=14:23 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1098623{{!}}Use `useformat` query param for device detection or mobile domain (m.) (T380646 T375788)]], [[gerrit:1098913{{!}}ReportIncident: Enable instrumentation on labs (T372823)]], [[gerrit:1098509{{!}}Enable message group subscription feature for some wikis (T372386)]], [[gerrit:1098622{{!}}Use `useformat` query param for device detection or mobile domain (m.) 
(}} * 14:22 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098561{{!}}Allow IRS to record server-side interaction events (T380599)]], [[gerrit:1098939{{!}}Revert^2 "Add contact form for U4C"]] (duration: 14m 07s) * 14:22 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration * 14:15 urbanecm@deploy2002: nmw03, mszabo, urbanecm: Continuing with sync * 14:14 moritzm: installing apr security updates * 14:14 urbanecm@deploy2002: nmw03, mszabo, urbanecm: Backport for [[gerrit:1098561{{!}}Allow IRS to record server-side interaction events (T380599)]], [[gerrit:1098939{{!}}Revert^2 "Add contact form for U4C"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P71346 and previous config saved to /var/cache/conftool/dbconfig/20241128-140958-ladsgroup.json * 14:08 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1098561{{!}}Allow IRS to record server-side interaction events (T380599)]], [[gerrit:1098939{{!}}Revert^2 "Add contact form for U4C"]] * 14:06 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71345 and previous config saved to /var/cache/conftool/dbconfig/20241128-135451-ladsgroup.json * 13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71344 and previous config saved to /var/cache/conftool/dbconfig/20241128-134859-ladsgroup.json * 13:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance * 13:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance * 12:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71343 and previous config saved to /var/cache/conftool/dbconfig/20241128-124957-ladsgroup.json * 12:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031', diff saved to https://phabricator.wikimedia.org/P71342 and previous config saved to /var/cache/conftool/dbconfig/20241128-123451-ladsgroup.json * 12:23 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031', diff saved to https://phabricator.wikimedia.org/P71340 and previous config saved to /var/cache/conftool/dbconfig/20241128-121943-ladsgroup.json * 12:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71339 and previous config saved to /var/cache/conftool/dbconfig/20241128-120437-ladsgroup.json * 12:04 ladsgroup@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098914{{!}}Bump ratio of new parsercache key spec to 2 (T373037)]] (duration: 12m 37s) * 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71338 and previous config saved to /var/cache/conftool/dbconfig/20241128-120031-ladsgroup.json * 11:57 ladsgroup@deploy2002: ladsgroup: Continuing with sync * 11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71337 and previous config saved to /var/cache/conftool/dbconfig/20241128-115741-ladsgroup.json * 11:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2031.codfw.wmnet with reason: Maintenance * 11:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2031.codfw.wmnet with reason: Maintenance * 11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71336 and previous config saved to /var/cache/conftool/dbconfig/20241128-115715-ladsgroup.json * 11:57 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:1098914{{!}}Bump ratio of new parsercache key spec to 2 (T373037)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 11:51 
ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1098914{{!}}Bump ratio of new parsercache key spec to 2 (T373037)]] * 11:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2237 gradually with 4 steps - Maint over ([[phab:T379813|T379813]]) * 11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P71334 and previous config saved to /var/cache/conftool/dbconfig/20241128-114524-ladsgroup.json * 11:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P71333 and previous config saved to /var/cache/conftool/dbconfig/20241128-114208-ladsgroup.json * 11:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet * 11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P71330 and previous config saved to /var/cache/conftool/dbconfig/20241128-113017-ladsgroup.json * 11:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P71329 and previous config saved to /var/cache/conftool/dbconfig/20241128-112701-ladsgroup.json * 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet * 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71327 and previous config saved to /var/cache/conftool/dbconfig/20241128-111510-ladsgroup.json * 11:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet * 11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71326 and previous 
config saved to /var/cache/conftool/dbconfig/20241128-111300-ladsgroup.json * 11:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1236.eqiad.wmnet with reason: Maintenance * 11:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1236.eqiad.wmnet with reason: Maintenance * 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71325 and previous config saved to /var/cache/conftool/dbconfig/20241128-111154-ladsgroup.json * 11:11 moritzm: removing ganeti1022 from active Ganeti nodes [[phab:T378921|T378921]] * 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet * 11:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1201.eqiad.wmnet with reason: Maintenance * 11:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1201.eqiad.wmnet with reason: Maintenance * 11:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2204.codfw.wmnet with reason: Maintenance * 11:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2204.codfw.wmnet with reason: Maintenance * 11:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71324 and previous config saved to /var/cache/conftool/dbconfig/20241128-110457-ladsgroup.json * 11:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance * 11:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance * 11:03 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2237 gradually with 4 steps - Maint over 
([[phab:T379813|T379813]]) * 10:51 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix commit bug - oblivian@cumin1002" * 10:51 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix commit bug - oblivian@cumin1002 * 10:51 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix commit bug - oblivian@cumin1002 * 10:51 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix commit bug - oblivian@cumin1002" * 10:32 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 10:27 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 09:36 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases1003.eqiad.wmnet (duration: 01m 22s) * 09:35 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases1003.eqiad.wmnet * 09:31 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases2003.codfw.wmnet (duration: 01m 27s) * 09:30 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases2003.codfw.wmnet * 09:23 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 09:22 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.5 refs [[phab:T375664|T375664]] * 09:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2200.codfw.wmnet with reason: Maintenance * 09:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2200.codfw.wmnet with reason: Maintenance * 09:09 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 09:06 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 09:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2198.codfw.wmnet with reason: Maintenance * 09:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2198.codfw.wmnet with reason: Maintenance * 09:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71319 and previous config saved to /var/cache/conftool/dbconfig/20241128-090035-ladsgroup.json * 08:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P71318 and previous config saved to /var/cache/conftool/dbconfig/20241128-084528-ladsgroup.json * 08:43 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 08:41 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 08:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P71317 and previous config saved to /var/cache/conftool/dbconfig/20241128-083021-ladsgroup.json * 08:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71316 and previous config saved to /var/cache/conftool/dbconfig/20241128-081514-ladsgroup.json * 08:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71315 and previous config saved to /var/cache/conftool/dbconfig/20241128-080244-ladsgroup.json * 08:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2195.codfw.wmnet with reason: Maintenance * 08:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2195.codfw.wmnet with reason: Maintenance * 08:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71314 and previous config saved to /var/cache/conftool/dbconfig/20241128-080221-ladsgroup.json * 07:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet * 07:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P71313 and previous config saved to /var/cache/conftool/dbconfig/20241128-074714-ladsgroup.json * 07:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P71312 and previous config saved to /var/cache/conftool/dbconfig/20241128-073207-ladsgroup.json * 07:23 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "CSRF 
token support - oblivian@cumin1002" * 07:23 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: CSRF token support - oblivian@cumin1002 * 07:23 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: CSRF token support - oblivian@cumin1002 * 07:22 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "CSRF token support - oblivian@cumin1002" * 07:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71310 and previous config saved to /var/cache/conftool/dbconfig/20241128-071700-ladsgroup.json * 07:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71309 and previous config saved to /var/cache/conftool/dbconfig/20241128-070231-ladsgroup.json * 07:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2181.codfw.wmnet with reason: Maintenance * 07:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2181.codfw.wmnet with reason: Maintenance * 07:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71308 and previous config saved to /var/cache/conftool/dbconfig/20241128-070209-ladsgroup.json * 07:02 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 06:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P71307 and previous config saved to /var/cache/conftool/dbconfig/20241128-064702-ladsgroup.json * 06:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P71306 and previous config saved to /var/cache/conftool/dbconfig/20241128-063155-ladsgroup.json * 06:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71305 and previous config saved to /var/cache/conftool/dbconfig/20241128-061647-ladsgroup.json * 06:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71304 and previous config saved to /var/cache/conftool/dbconfig/20241128-060418-ladsgroup.json * 06:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance * 06:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance * 06:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71303 and previous config saved to /var/cache/conftool/dbconfig/20241128-060355-ladsgroup.json * 05:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P71302 and previous config saved to /var/cache/conftool/dbconfig/20241128-054847-ladsgroup.json * 05:48 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 05:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P71301 and previous config saved to /var/cache/conftool/dbconfig/20241128-053340-ladsgroup.json * 05:29 tstarling@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098652{{!}}Add frwiki on labs for new addWiki.php test]] (duration: 13m 41s) * 05:23 tstarling@deploy2002: tstarling: Continuing with sync * 05:22 tstarling@deploy2002: tstarling: Backport for [[gerrit:1098652{{!}}Add frwiki on labs for new addWiki.php test]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 05:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71300 and previous config saved to /var/cache/conftool/dbconfig/20241128-051833-ladsgroup.json * 05:16 tstarling@deploy2002: Started scap sync-world: Backport for [[gerrit:1098652{{!}}Add frwiki on labs for new addWiki.php test]] * 05:06 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 05:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71299 and previous config saved to /var/cache/conftool/dbconfig/20241128-050352-ladsgroup.json * 05:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2166.codfw.wmnet with reason: Maintenance * 05:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2166.codfw.wmnet with reason: Maintenance * 05:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71298 and previous config saved to /var/cache/conftool/dbconfig/20241128-050329-ladsgroup.json * 04:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P71297 and previous config saved to /var/cache/conftool/dbconfig/20241128-044822-ladsgroup.json * 04:41 eileen: civicrm upgraded from {{Gerrit|ed67a1b2}} to {{Gerrit|be7e5d33}} * 04:36 eileen: * civicrm upgraded from {{Gerrit|40f4f1a3}} to {{Gerrit|ed67a1b2}} * 04:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P71296 and previous config saved to /var/cache/conftool/dbconfig/20241128-043314-ladsgroup.json * 04:26 eileen: * civicrm upgraded from {{Gerrit|7ade5fd7}} to {{Gerrit|40f4f1a3}} * 04:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71294 and previous config saved to /var/cache/conftool/dbconfig/20241128-041807-ladsgroup.json * 04:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71292 and previous config saved to /var/cache/conftool/dbconfig/20241128-040326-ladsgroup.json * 04:03 
ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance * 04:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance * 04:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2164.codfw.wmnet with reason: Maintenance * 04:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2164.codfw.wmnet with reason: Maintenance * 04:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71291 and previous config saved to /var/cache/conftool/dbconfig/20241128-040248-ladsgroup.json * 03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P71290 and previous config saved to /var/cache/conftool/dbconfig/20241128-034741-ladsgroup.json * 03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P71289 and previous config saved to /var/cache/conftool/dbconfig/20241128-033234-ladsgroup.json * 03:22 eileen: config revision changed from {{Gerrit|f284fd46}} to {{Gerrit|a3175f86}} (like for real this time) * 03:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71288 and previous config saved to /var/cache/conftool/dbconfig/20241128-031726-ladsgroup.json * 03:14 eileen: config revision changed from {{Gerrit|f284fd46}} to {{Gerrit|a3175f86}} * 03:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71287 and previous config saved to /var/cache/conftool/dbconfig/20241128-030213-ladsgroup.json * 03:02 ladsgroup@cumin1002: END
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2163.codfw.wmnet with reason: Maintenance * 03:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2163.codfw.wmnet with reason: Maintenance * 03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71286 and previous config saved to /var/cache/conftool/dbconfig/20241128-030151-ladsgroup.json * 02:53 eileen: civicrm upgraded from {{Gerrit|c8c461b9}} to {{Gerrit|7ade5fd7}} * 02:46 eileen: * civicrm upgraded from {{Gerrit|80f03357}} to {{Gerrit|c8c461b9}} * 02:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P71285 and previous config saved to /var/cache/conftool/dbconfig/20241128-024644-ladsgroup.json * 02:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P71284 and previous config saved to /var/cache/conftool/dbconfig/20241128-023136-ladsgroup.json * 02:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71283 and previous config saved to /var/cache/conftool/dbconfig/20241128-021629-ladsgroup.json * 02:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2162 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71282 and previous config saved to /var/cache/conftool/dbconfig/20241128-020143-ladsgroup.json * 02:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2162.codfw.wmnet with reason: Maintenance * 02:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2162.codfw.wmnet with reason: Maintenance * 02:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 
([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71281 and previous config saved to /var/cache/conftool/dbconfig/20241128-020120-ladsgroup.json * 01:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P71280 and previous config saved to /var/cache/conftool/dbconfig/20241128-014613-ladsgroup.json * 01:38 eileen: civicrm upgraded from {{Gerrit|3b1ed162}} to {{Gerrit|80f03357}} * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P71279 and previous config saved to /var/cache/conftool/dbconfig/20241128-013106-ladsgroup.json * 01:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71278 and previous config saved to /var/cache/conftool/dbconfig/20241128-011559-ladsgroup.json * 01:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71277 and previous config saved to /var/cache/conftool/dbconfig/20241128-010112-ladsgroup.json * 01:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2161.codfw.wmnet with reason: Maintenance * 01:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2161.codfw.wmnet with reason: Maintenance * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71276 and previous config saved to /var/cache/conftool/dbconfig/20241128-010049-ladsgroup.json * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P71275 and previous config saved to /var/cache/conftool/dbconfig/20241128-004542-ladsgroup.json * 00:30 
ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P71274 and previous config saved to /var/cache/conftool/dbconfig/20241128-003035-ladsgroup.json * 00:16 tstarling@deploy2002: Finished scap sync-world: Backport for [[gerrit:1094126{{!}}Move default main page text for new wikis to config (T352113)]], [[gerrit:1096839{{!}}Introduce preinstall.dblist for wikis that haven't been installed yet (T352113)]] (duration: 14m 42s) * 00:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71273 and previous config saved to /var/cache/conftool/dbconfig/20241128-001528-ladsgroup.json * 00:09 tstarling@deploy2002: tstarling: Continuing with sync * 00:07 tstarling@deploy2002: tstarling: Backport for [[gerrit:1094126{{!}}Move default main page text for new wikis to config (T352113)]], [[gerrit:1096839{{!}}Introduce preinstall.dblist for wikis that haven't been installed yet (T352113)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 00:01 tstarling@deploy2002: Started scap sync-world: Backport for [[gerrit:1094126{{!}}Move default main page text for new wikis to config (T352113)]], [[gerrit:1096839{{!}}Introduce preinstall.dblist for wikis that haven't been installed yet (T352113)]] * 00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71272 and previous config saved to /var/cache/conftool/dbconfig/20241128-000046-ladsgroup.json * 00:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2154.codfw.wmnet with reason: Maintenance * 00:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2154.codfw.wmnet with reason: Maintenance * 00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 
([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71271 and previous config saved to /var/cache/conftool/dbconfig/20241128-000023-ladsgroup.json == 2024-11-27 == * 23:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P71270 and previous config saved to /var/cache/conftool/dbconfig/20241127-234518-ladsgroup.json * 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P71269 and previous config saved to /var/cache/conftool/dbconfig/20241127-233011-ladsgroup.json * 23:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71267 and previous config saved to /var/cache/conftool/dbconfig/20241127-231504-ladsgroup.json * 23:09 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098633{{!}}Fix mobile domain logic for login.wikimedia.org (T380646)]] (duration: 18m 07s) * 23:02 tgr@deploy2002: tgr: Continuing with sync * 23:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71264 and previous config saved to /var/cache/conftool/dbconfig/20241127-230159-ladsgroup.json * 23:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2152.codfw.wmnet with reason: Maintenance * 23:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2152.codfw.wmnet with reason: Maintenance * 22:56 tgr@deploy2002: tgr: Backport for [[gerrit:1098633{{!}}Fix mobile domain logic for login.wikimedia.org (T380646)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 22:52 
ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 22:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71263 and previous config saved to /var/cache/conftool/dbconfig/20241127-225159-ladsgroup.json * 22:51 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1098633{{!}}Fix mobile domain logic for login.wikimedia.org (T380646)]] * 22:46 cjming: end of UTC late backport window * 22:44 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098572{{!}}Turn on Parsoid Read views on jawikivoyage (T380769)]] (duration: 15m 22s) * 22:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P71262 and previous config saved to /var/cache/conftool/dbconfig/20241127-223652-ladsgroup.json * 22:35 cjming@deploy2002: cscott, cjming: Continuing with sync * 22:35 cjming@deploy2002: cscott, cjming: Backport for [[gerrit:1098572{{!}}Turn on Parsoid Read views on jawikivoyage (T380769)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:29 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1098572{{!}}Turn on Parsoid Read views on jawikivoyage (T380769)]] * 22:27 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098581{{!}}Bump wikimedia/parsoid to 0.21.0-a9 (T373035 T380664)]], [[gerrit:1098583{{!}}Bump wikimedia/parsoid to 0.21.0-a9 (T380664)]] (duration: 42m 38s) * 22:26 bking@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin1002" * 22:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P71261 and previous config saved to 
/var/cache/conftool/dbconfig/20241127-222145-ladsgroup.json * 22:13 cjming@deploy2002: arlolra, cjming: Continuing with sync * 22:11 bking@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:09 cjming@deploy2002: arlolra, cjming: Backport for [[gerrit:1098581{{!}}Bump wikimedia/parsoid to 0.21.0-a9 (T373035 T380664)]], [[gerrit:1098583{{!}}Bump wikimedia/parsoid to 0.21.0-a9 (T380664)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:07 bking@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71260 and previous config saved to /var/cache/conftool/dbconfig/20241127-220638-ladsgroup.json * 21:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71259 and previous config saved to /var/cache/conftool/dbconfig/20241127-215407-ladsgroup.json * 21:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:45 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1098581{{!}}Bump wikimedia/parsoid to 0.21.0-a9 (T373035 T380664)]], [[gerrit:1098583{{!}}Bump wikimedia/parsoid to 0.21.0-a9 (T380664)]] * 21:43 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098567{{!}}Revert "Normalize ref html before comparison" (T380977)]] (duration: 12m 49s) * 21:40 bking@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime 
(exit_code=0) for 8:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71258 and previous config saved to /var/cache/conftool/dbconfig/20241127-213759-ladsgroup.json * 21:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1026.eqiad.wmnet with OS bullseye * 21:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002" * 21:37 cjming@deploy2002: cjming, cscott: Continuing with sync * 21:37 cjming@deploy2002: cjming, cscott: Backport for [[gerrit:1098567{{!}}Revert "Normalize ref html before comparison" (T380977)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:31 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1098567{{!}}Revert "Normalize ref html before comparison" (T380977)]] * 21:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P71257 and previous config saved to /var/cache/conftool/dbconfig/20241127-212252-ladsgroup.json * 21:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71256 and previous config saved to /var/cache/conftool/dbconfig/20241127-211704-ladsgroup.json * 21:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P71255 and previous config saved to /var/cache/conftool/dbconfig/20241127-210745-ladsgroup.json * 21:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after 
maintenance es2029', diff saved to https://phabricator.wikimedia.org/P71254 and previous config saved to /var/cache/conftool/dbconfig/20241127-210157-ladsgroup.json * 20:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71253 and previous config saved to /var/cache/conftool/dbconfig/20241127-205238-ladsgroup.json * 20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2029', diff saved to https://phabricator.wikimedia.org/P71252 and previous config saved to /var/cache/conftool/dbconfig/20241127-204650-ladsgroup.json * 20:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Optimize ([[phab:T379813|T379813]]) * 20:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Optimize ([[phab:T379813|T379813]]) * 20:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2237 depool ([[phab:T379813|T379813]])', diff saved to https://phabricator.wikimedia.org/P71251 and previous config saved to /var/cache/conftool/dbconfig/20241127-204450-ladsgroup.json * 20:38 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002" * 20:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71250 and previous config saved to /var/cache/conftool/dbconfig/20241127-203724-ladsgroup.json * 20:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance 
db1211 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71249 and previous config saved to /var/cache/conftool/dbconfig/20241127-203650-ladsgroup.json * 20:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71248 and previous config saved to /var/cache/conftool/dbconfig/20241127-203143-ladsgroup.json * 20:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71247 and previous config saved to /var/cache/conftool/dbconfig/20241127-202446-ladsgroup.json * 20:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance * 20:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance * 20:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71246 and previous config saved to /var/cache/conftool/dbconfig/20241127-202420-ladsgroup.json * 20:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P71245 and previous config saved to /var/cache/conftool/dbconfig/20241127-202143-ladsgroup.json * 20:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1026.eqiad.wmnet with reason: host reimage * 20:18 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1026.eqiad.wmnet with reason: host reimage * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P71244 and previous config saved to /var/cache/conftool/dbconfig/20241127-200913-ladsgroup.json * 20:06 ladsgroup@cumin1002: dbctl 
commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P71243 and previous config saved to /var/cache/conftool/dbconfig/20241127-200636-ladsgroup.json * 19:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P71242 and previous config saved to /var/cache/conftool/dbconfig/20241127-195406-ladsgroup.json * 19:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71241 and previous config saved to /var/cache/conftool/dbconfig/20241127-195129-ladsgroup.json * 19:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye * 19:50 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wdqs1025.eqiad.wmnet with OS bullseye * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71240 and previous config saved to /var/cache/conftool/dbconfig/20241127-193858-ladsgroup.json * 19:36 moritzm: imported jenkins 2.479.2 to thirdparty/ci for bullseye-wikimedia * 19:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71239 and previous config saved to /var/cache/conftool/dbconfig/20241127-193529-ladsgroup.json * 19:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71238 and previous config saved to 
/var/cache/conftool/dbconfig/20241127-193507-ladsgroup.json * 19:34 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye * 19:32 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wdqs1025.eqiad.wmnet with OS bullseye * 19:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2034 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71237 and previous config saved to /var/cache/conftool/dbconfig/20241127-193202-ladsgroup.json * 19:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance * 19:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance * 19:25 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye * 19:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1025.eqiad.wmnet with OS bullseye * 19:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P71236 and previous config saved to /var/cache/conftool/dbconfig/20241127-192000-ladsgroup.json * 19:18 brett@cumin2002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site magru [reason: repool magru, [[phab:T376737|T376737]]] * 19:18 brett@cumin2002: START - Cookbook sre.dns.admin DNS admin: pool site magru [reason: repool magru, [[phab:T376737|T376737]]] * 19:17 mforns@deploy2002: Finished deploy [airflow-dags/analytics@99032bf]: regular weekly train (duration: 03m 10s) * 19:14 mforns@deploy2002: Started deploy [airflow-dags/analytics@99032bf]: regular weekly train * 19:13 mutante: disabled puppet on R:scap::target (180 hosts) for a short time - 
deploying gerrit:1092841 * 19:09 brett@puppetserver1001: conftool action : set/pooled=yes; selector: dc=magru,service=cdn * 19:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P71235 and previous config saved to /var/cache/conftool/dbconfig/20241127-190453-ladsgroup.json * 19:02 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye * 18:56 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1025.eqiad.wmnet with OS bullseye * 18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71233 and previous config saved to /var/cache/conftool/dbconfig/20241127-184946-ladsgroup.json * 18:47 fabfur@cumin1002: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=magru * 18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 16 hosts * 18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for 16 hosts * 18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7003.magru.wmnet * 18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7003.magru.wmnet * 18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7002.magru.wmnet * 18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7002.magru.wmnet * 18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7001.magru.wmnet * 18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7001.magru.wmnet * 18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns7002.wikimedia.org * 18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns7002.wikimedia.org * 18:37 fabfur@cumin1002: END (PASS) - 
Cookbook sre.hosts.remove-downtime (exit_code=0) for dns7001.wikimedia.org * 18:37 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns7001.wikimedia.org * 18:37 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dns7001.wikimedia.org with reason: [[phab:T380307|T380307]] * 18:37 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dns7001.wikimedia.org with reason: [[phab:T380307|T380307]] * 18:36 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye * 18:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71232 and previous config saved to /var/cache/conftool/dbconfig/20241127-183455-ladsgroup.json * 18:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71231 and previous config saved to /var/cache/conftool/dbconfig/20241127-183432-ladsgroup.json * 18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P71230 and previous config saved to /var/cache/conftool/dbconfig/20241127-181925-ladsgroup.json * 18:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P71229 and previous config saved to /var/cache/conftool/dbconfig/20241127-180418-ladsgroup.json * 17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance 
db1203 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71228 and previous config saved to /var/cache/conftool/dbconfig/20241127-174911-ladsgroup.json * 17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71227 and previous config saved to /var/cache/conftool/dbconfig/20241127-173426-ladsgroup.json * 17:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71226 and previous config saved to /var/cache/conftool/dbconfig/20241127-173403-ladsgroup.json * 17:33 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 17:33 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply * 17:33 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply * 17:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply * 17:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 17:32 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply * 17:31 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply * 17:31 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:31 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply * 17:28 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:27 jiji@deploy2002: helmfile 
[eqiad] START helmfile.d/services/changeprop: apply * 17:27 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:27 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:25 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:24 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:24 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:23 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:20 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply * 17:19 jiji@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply * 17:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P71225 and previous config saved to /var/cache/conftool/dbconfig/20241127-171857-ladsgroup.json * 17:17 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7010.magru.wmnet with OS bullseye * 17:16 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply * 17:16 jiji@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply * 17:14 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kafka-main1007.eqiad.wmnet * 17:14 jiji@cumin1002: START - Cookbook sre.hosts.remove-downtime for kafka-main1007.eqiad.wmnet * 17:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P71224 and previous config saved to /var/cache/conftool/dbconfig/20241127-170350-ladsgroup.json * 16:56 jiji@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 16:55 jiji@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 16:55 jiji@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 16:55 jiji@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 16:55 jiji@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:54 jiji@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 16:54 jiji@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 16:54 jiji@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 16:54 jiji@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 16:53 jiji@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 16:53 jiji@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:53 jiji@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:53 jiji@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 16:52 jiji@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 16:52 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 16:52 jiji@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 16:52 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:52 jiji@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 16:51 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7010.magru.wmnet with reason: host reimage * 16:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71222 and previous config saved to /var/cache/conftool/dbconfig/20241127-164843-ladsgroup.json * 16:47 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7010.magru.wmnet with reason: host reimage * 16:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71221 and previous config saved to /var/cache/conftool/dbconfig/20241127-163407-ladsgroup.json * 16:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71220 and previous config saved to /var/cache/conftool/dbconfig/20241127-163344-ladsgroup.json * 16:27 jiji@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 16:26 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7010.magru.wmnet with OS bullseye * 16:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P71218 and previous config saved to /var/cache/conftool/dbconfig/20241127-161837-ladsgroup.json * 16:16 jiji@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 16:12 effie: roll restarting kafka-main brokers - [[phab:T363214|T363214]] * 16:11 moritzm: 
installing distro-info-data updates from bookworm point release * 16:11 fabfur@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:11 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp70101 - fabfur@cumin1002" * 16:11 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp70101 - fabfur@cumin1002" * 16:05 fabfur@cumin1002: START - Cookbook sre.dns.netbox * 16:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P71217 and previous config saved to /var/cache/conftool/dbconfig/20241127-160330-ladsgroup.json * 15:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71216 and previous config saved to /var/cache/conftool/dbconfig/20241127-154823-ladsgroup.json * 15:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye * 15:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1026.eqiad.wmnet with OS bullseye * 15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71215 and previous config saved to /var/cache/conftool/dbconfig/20241127-153316-ladsgroup.json * 15:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:32 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 15:31 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 
15:30 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 15:30 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 15:28 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 15:27 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 15:22 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 15:22 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 15:21 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 15:20 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 15:09 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 15:08 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 15:08 Krinkle: krinkle@webperf2003: `sudo apt-get install kafkacat` (matching webperf1003, for ad-hoc debugging) * 15:05 kart_: Updated recommendation-api to 2024-11-27-142924-production ([[phab:T380838|T380838]], [[phab:T379036|T379036]], [[phab:T380699|T380699]]) * 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1003.eqiad.wmnet to plain * 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1003.eqiad.wmnet to plain * 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet * 15:02 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet * 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1003.eqiad.wmnet to drbd * 14:59 kartik@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 14:58 kartik@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 14:51 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1003.eqiad.wmnet to drbd * 14:48 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 14:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet * 14:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet * 14:35 moritzm: rebalance magru01 following switch of VMs back to DRBD [[phab:T376737|T376737]] * 14:33 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on doh[7001-7002].wikimedia.org with reason: site is depooled, maintenance * 14:33 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on doh[7001-7002].wikimedia.org with reason: site is depooled, maintenance * 14:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097309{{!}}[GrowthExperiments] Undefine wgGEDatabaseCluster (T354939)]] (duration: 12m 21s) * 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to drbd * 14:26 urbanecm@deploy2002: urbanecm: Continuing with sync * 14:26 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1097309{{!}}[GrowthExperiments] Undefine wgGEDatabaseCluster (T354939)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:25 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1061.eqiad.wmnet with reason: cloudvirt1061 needs maintenance [[phab:T380673|T380673]] * 14:25 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1061.eqiad.wmnet with reason: cloudvirt1061 needs maintenance [[phab:T380673|T380673]] * 14:24 urbanecm: 
Purge https://en.wikipedia.org/static/images/mobile/copyright/wikiquote-wordmark-az.svg ([[phab:T380974|T380974]]) * 14:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 14:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye * 14:21 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1097309{{!}}[GrowthExperiments] Undefine wgGEDatabaseCluster (T354939)]] * 14:20 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098076{{!}}Enable ParserMigration compact indicator on all wikis (T363484)]], [[gerrit:1093405{{!}}Deploy Parsoid Read Views to de/ru wikivoyage and dagwiki (T375394 T380401)]], [[gerrit:1098019{{!}}Updated wordmark for Azerbaijani Wikiquote (T380974)]] (duration: 17m 20s) * 14:20 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to drbd * 14:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to drbd * 14:13 urbanecm@deploy2002: urbanecm, cscott, nmw03: Continuing with sync * 14:09 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to drbd * 14:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to drbd * 14:08 urbanecm@deploy2002: urbanecm, cscott, nmw03: Backport for [[gerrit:1098076{{!}}Enable ParserMigration compact indicator on all wikis (T363484)]], [[gerrit:1093405{{!}}Deploy Parsoid Read Views to de/ru wikivoyage and dagwiki (T375394 T380401)]], [[gerrit:1098019{{!}}Updated wordmark for Azerbaijani Wikiquote (T380974)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:03 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1098076{{!}}Enable ParserMigration compact indicator on all wikis (T363484)]], 
[[gerrit:1093405{{!}}Deploy Parsoid Read Views to de/ru wikivoyage and dagwiki (T375394 T380401)]], [[gerrit:1098019{{!}}Updated wordmark for Azerbaijani Wikiquote (T380974)]] * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to drbd * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to drbd * 13:45 moritzm: installing php8.2 security updates * 13:40 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to drbd * 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to drbd * 13:38 mszabo@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098506{{!}}private: Add stub for wgReportIncidentZendeskSubjectLine (T380868)]], [[gerrit:1098480{{!}}Configure IRS Zendesk integration (T380908)]], [[gerrit:1093389{{!}}Configure instrument for the Incident Reporting System (T372823)]] (duration: 13m 53s) * 13:31 mszabo@deploy2002: mszabo: Continuing with sync * 13:30 mszabo@deploy2002: mszabo: Backport for [[gerrit:1098506{{!}}private: Add stub for wgReportIncidentZendeskSubjectLine (T380868)]], [[gerrit:1098480{{!}}Configure IRS Zendesk integration (T380908)]], [[gerrit:1093389{{!}}Configure instrument for the Incident Reporting System (T372823)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 13:28 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to drbd * 13:27 moritzm: rebalance magru02 following switch of VMs back to DRBD [[phab:T376737|T376737]] * 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to drbd * 13:24 mszabo@deploy2002: Started scap sync-world: Backport for [[gerrit:1098506{{!}}private: Add stub for 
wgReportIncidentZendeskSubjectLine (T380868)]], [[gerrit:1098480{{!}}Configure IRS Zendesk integration (T380908)]], [[gerrit:1093389{{!}}Configure instrument for the Incident Reporting System (T372823)]] * 13:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye * 13:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye * 13:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 13:16 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to drbd * 13:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 13:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to drbd * 13:05 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to drbd * 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to drbd * 12:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kafka-main[1002,1007].eqiad.wmnet with reason: Hardware refresh * 12:56 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kafka-main[1002,1007].eqiad.wmnet with reason: Hardware refresh * 12:50 moritzm: installing ghostscript security updates * 12:39 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 12:38 effie: start replacing kafka-main1002 with kafka-main1007 - [[phab:T363214|T363214]] * 12:24 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:24 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 12:24 kart_: Updated cxserver to 2024-11-20-121713-production ([[phab:T377966|T377966]], [[phab:T357950|T357950]]) * 12:22 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:22 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:20 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:20 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:18 moritzm: installing python-cryptography security updates * 12:14 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:13 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:12 moritzm: installing openssl security updates * 12:08 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply * 12:07 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply * 12:06 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply * 12:06 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to drbd * 12:06 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply * 12:05 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply * 12:05 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply * 12:05 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:05 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ganeti2042.codfw.wmnet with reason: broken CPU * 12:03 jmm@cumin2002: START - Cookbook 
sre.hosts.downtime for 7 days, 0:00:00 on ganeti2042.codfw.wmnet with reason: broken CPU * 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to drbd * 11:45 ladsgroup@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098484{{!}}Bump ratio of new parsercache key spec to 3 (T373037)]] (duration: 12m 51s) * 11:38 ladsgroup@deploy2002: ladsgroup: Continuing with sync * 11:38 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:1098484{{!}}Bump ratio of new parsercache key spec to 3 (T373037)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 11:34 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7001.wikimedia.org to drbd * 11:32 ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1098484{{!}}Bump ratio of new parsercache key spec to 3 (T373037)]] * 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to drbd * 11:21 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dns7002.wikimedia.org with reason: [[phab:T376737|T376737]] * 11:21 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dns7002.wikimedia.org with reason: [[phab:T376737|T376737]] * 11:21 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dns7001.wikimedia.org with reason: [[phab:T376737|T376737]] * 11:20 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dns7001.wikimedia.org with reason: [[phab:T376737|T376737]] * 11:19 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs[7001-7003].magru.wmnet with reason: [[phab:T376737|T376737]] * 11:19 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs[7001-7003].magru.wmnet with reason: [[phab:T376737|T376737]] * 11:19 fabfur@cumin1002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 4:00:00 on 16 hosts with reason: [[phab:T376737|T376737]] * 11:19 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on 16 hosts with reason: [[phab:T376737|T376737]] * 11:18 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to drbd * 11:16 xSavitar: [[phab:T380875|T380875]] Ran mwscript-k8s --comment="[[phab:T380875|T380875]]" -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'EMBakeryEquipment' 'Janapanna' * 11:15 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7002.magru.wmnet to cluster magru02 and group B4 * 11:13 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7002.magru.wmnet to cluster magru02 and group B4 * 11:13 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 11:13 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 11:04 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' . 
* 11:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7001.magru.wmnet to cluster magru01 and group B3 * 11:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7001.magru.wmnet to cluster magru01 and group B3 * 10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet * 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet * 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet * 10:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7008.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet * 10:01 fabfur@cumin1002: START - Cookbook sre.hosts.provision for host cp7008.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7002 * 09:59 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7002 * 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7001 * 09:58 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7001 * 09:55 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7006.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "readded ganeti nodes in magru - jmm@cumin2002 - [[phab:T376737|T376737]]" * 09:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"readded ganeti nodes in magru - jmm@cumin2002 - [[phab:T376737|T376737]]" * 09:46 fabfur@cumin1002: START - Cookbook sre.hosts.provision for host cp7006.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:45 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.5 refs [[phab:T375664|T375664]] * 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7001.magru.wmnet with OS bookworm * 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 09:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 09:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage * 09:06 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098413{{!}}ext.uls.inputsettings: Use arrow functions (T380431)]] (duration: 16m 06s) * 09:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage * 08:59 kartik@deploy2002: abi, kartik: Continuing with sync * 08:55 kartik@deploy2002: abi, kartik: Backport for [[gerrit:1098413{{!}}ext.uls.inputsettings: Use arrow functions (T380431)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:50 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1098413{{!}}ext.uls.inputsettings: Use arrow functions (T380431)]] * 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7001.magru.wmnet with OS bookworm * 08:38 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098415{{!}}Fix illegal access of typed property. 
(T380724)]] (duration: 21m 02s) * 08:31 kartik@deploy2002: kartik, abi: Continuing with sync * 08:24 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1098415{{!}}Fix illegal access of typed property. (T380724)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7002.magru.wmnet with OS bookworm * 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 08:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 08:17 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1098415{{!}}Fix illegal access of typed property. (T380724)]] * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7002.magru.wmnet with reason: host reimage * 07:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7002.magru.wmnet with reason: host reimage * 07:34 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7002.magru.wmnet with OS bookworm * 07:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
== 2024-11-26 ==
* 23:29 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns7002.wikimedia.org with OS bookworm
* 23:29 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 23:28 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 23:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs7001.magru.wmnet with OS bullseye
* 23:28 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 23:23 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 23:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7010.magru.wmnet with OS bullseye
* 23:13 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 23:12 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 23:03 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs7001.magru.wmnet with reason: host reimage
* 23:00 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs7001.magru.wmnet with reason: host reimage
* 22:54 reedy@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098161{{!}}Add CodeMirror to BetaFeaturesAllowList (T376735)]] (duration: 31m 35s)
* 22:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 
2:00:00 on dns7002.wikimedia.org with reason: host reimage * 22:48 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7002.wikimedia.org with reason: host reimage * 22:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7010.magru.wmnet with reason: host reimage * 22:45 reedy@deploy2002: musikanimal, reedy: Continuing with sync * 22:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7010.magru.wmnet with reason: host reimage * 22:40 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7001.magru.wmnet with OS bullseye * 22:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7002.magru.wmnet with OS bullseye * 22:37 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002" * 22:32 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002" * 22:28 reedy@deploy2002: musikanimal, reedy: Backport for [[gerrit:1098161{{!}}Add CodeMirror to BetaFeaturesAllowList (T376735)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:26 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 4800 * 22:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7010.magru.wmnet with OS bullseye * 22:24 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns7002.wikimedia.org with OS bookworm * 22:24 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns7002.wikimedia.org with OS bullseye * 22:24 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 4800 * 22:22 reedy@deploy2002: Started scap sync-world: Backport for [[gerrit:1098161{{!}}Add CodeMirror to BetaFeaturesAllowList 
(T376735)]] * 22:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7002.wikimedia.org with reason: host reimage * 22:21 reedy@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097484{{!}}Nov 26 2024: Vector 2022 Deployments (T379799)]] (duration: 19m 52s) * 22:18 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7002.wikimedia.org with reason: host reimage * 22:15 cmooney@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 262979 * 22:14 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 262979 * 22:11 reedy@deploy2002: jdlrobson, reedy: Continuing with sync * 22:08 reedy@deploy2002: jdlrobson, reedy: Backport for [[gerrit:1097484{{!}}Nov 26 2024: Vector 2022 Deployments (T379799)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7002.magru.wmnet with reason: host reimage * 22:01 reedy@deploy2002: Started scap sync-world: Backport for [[gerrit:1097484{{!}}Nov 26 2024: Vector 2022 Deployments (T379799)]] * 22:00 reedy@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097591{{!}}Add BetaFeature for CodeMirror 6 (T376735)]] (duration: 40m 05s) * 21:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7004.magru.wmnet with OS bullseye * 21:58 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002" * 21:58 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7002.magru.wmnet with reason: host reimage * 21:57 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002" * 21:56 brett@cumin2002: START - 
Cookbook sre.hosts.reimage for host dns7002.wikimedia.org with OS bullseye * 21:46 reedy@deploy2002: musikanimal, reedy: Continuing with sync * 21:44 reedy@deploy2002: musikanimal, reedy: Backport for [[gerrit:1097591{{!}}Add BetaFeature for CodeMirror 6 (T376735)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:38 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7002.magru.wmnet with OS bullseye * 21:35 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7002.magru.wmnet with OS bullseye * 21:35 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns7002.wikimedia.org with OS bookworm * 21:35 brett@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cp7002.magru.wmnet dns7002.magru.wmnet on all recursors * 21:35 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7002.magru.wmnet dns7002.magru.wmnet on all recursors * 21:35 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:34 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7010.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:32 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:32 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7010.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:32 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:32 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002" * 21:32 brett@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7004.magru.wmnet with reason: host reimage * 21:32 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002" * 21:30 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7001 * 21:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7003.magru.wmnet with OS bullseye * 21:30 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002" * 21:30 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7001 * 21:30 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7010 * 21:30 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7010 * 21:29 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7004.magru.wmnet with reason: host reimage * 21:28 robh@cumin2002: START - Cookbook sre.dns.netbox * 21:28 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002" * 21:25 robh@cumin2002: START - Cookbook sre.dns.netbox * 21:24 damilare: civicrm upgraded from {{Gerrit|59d340cd}} to {{Gerrit|3b1ed162}} * 21:23 damilare: SmashPig upgraded from {{Gerrit|131e92a5}} to {{Gerrit|79b463b4}} * 21:22 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts lvs7001.magru.wmnet * 21:22 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns7002.wikimedia.org with OS bookworm * 21:20 reedy@deploy2002: Started scap 
sync-world: Backport for [[gerrit:1097591{{!}}Add BetaFeature for CodeMirror 6 (T376735)]] * 21:20 robh@cumin2002: START - Cookbook sre.dns.netbox * 21:20 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7010.magru.wmnet * 21:20 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7010.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 21:19 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7010.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 21:17 reedy@deploy2002: Synchronized wmf-config/core-Permissions.php: [[phab:T380753|T380753]] (duration: 11m 23s) * 21:16 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7002.magru.wmnet with OS bullseye * 21:15 robh@cumin2002: START - Cookbook sre.dns.netbox * 21:10 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7004.magru.wmnet with OS bullseye * 21:08 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7004.magru.wmnet with OS bullseye * 21:08 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7010.magru.wmnet * 21:08 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs7001.magru.wmnet * 21:04 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:04 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 brett@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cp7003.magru.wmnet 
cp7004.magru.wmnet on all recursors * 21:02 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7003.magru.wmnet cp7004.magru.wmnet on all recursors * 21:02 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002" * 21:01 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns7002 * 21:01 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002" * 21:01 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns7002 * 21:01 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7002 * 21:01 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7002 * 20:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7003.magru.wmnet with reason: host reimage * 20:54 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7003.magru.wmnet with reason: host reimage * 20:54 robh@cumin2002: START - Cookbook sre.dns.netbox * 20:50 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7002.magru.wmnet * 20:50 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:47 robh@cumin2002: START - Cookbook sre.dns.netbox * 20:47 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) 
for hosts dns7002.wikimedia.org * 20:47 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:47 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns7002.wikimedia.org decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 20:47 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns7002.wikimedia.org decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 20:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7004.magru.wmnet with OS bullseye * 20:43 robh@cumin2002: START - Cookbook sre.dns.netbox * 20:39 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release {{Gerrit|20241126}} * 20:37 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7002.magru.wmnet * 20:37 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns7002.wikimedia.org * 20:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7003.magru.wmnet with OS bullseye * 20:34 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7004.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye * 20:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1026.eqiad.wmnet with OS bullseye * 20:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1025.eqiad.wmnet with OS bullseye * 20:32 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 
swfrench@deploy2002: Finished scap sync-world: Backport for [[gerrit:1076848{{!}}debug.json: add support for mwdebug-next (T372605)]] (duration: 14m 21s) * 20:26 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7004.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:26 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002" * 20:26 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002" * 20:25 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7002 * 20:25 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7002 * 20:24 swfrench@deploy2002: swfrench: Continuing with sync * 20:23 robh@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti7002 * 20:23 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7002 * 20:23 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:23 swfrench@deploy2002: swfrench: Backport for [[gerrit:1076848{{!}}debug.json: add support for mwdebug-next (T372605)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy 
FORCED * 20:21 robh@cumin2002: START - Cookbook sre.dns.netbox * 20:21 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release {{Gerrit|20241126}} * 20:17 swfrench@deploy2002: Started scap sync-world: Backport for [[gerrit:1076848{{!}}debug.json: add support for mwdebug-next (T372605)]] * 20:16 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7004.magru.wmnet * 20:16 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:16 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Update * 20:14 robh@cumin2002: START - Cookbook sre.dns.netbox * 20:13 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti7002.magru.wmnet * 20:13 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:13 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7002.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 20:13 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7002.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 20:11 hashar@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098120{{!}}Avoid exception on mTemplateIds/mTemplate array discrepancy (T380862)]] (duration: 15m 23s) * 20:09 robh@cumin2002: START - Cookbook sre.dns.netbox * 20:08 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Update * 20:07 aokoth@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Security Update * 20:07 aokoth@cumin1002: START - Cookbook 
sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Update * 20:04 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7004.magru.wmnet * 20:04 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7002.magru.wmnet * 20:02 hashar@deploy2002: hashar: Continuing with sync * 20:02 hashar@deploy2002: hashar: Backport for [[gerrit:1098120{{!}}Avoid exception on mTemplateIds/mTemplate array discrepancy (T380862)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:00 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7003.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:59 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:55 hashar@deploy2002: Started scap sync-world: Backport for [[gerrit:1098120{{!}}Avoid exception on mTemplateIds/mTemplate array discrepancy (T380862)]] * 19:52 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7003.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:52 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:51 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:51 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002" * 19:50 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002" * 19:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7001 * 19:49 
robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7001 * 19:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7003 * 19:49 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7003 * 19:46 robh@cumin2002: START - Cookbook sre.dns.netbox * 19:43 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7003.magru.wmnet * 19:43 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:43 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 19:42 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 19:33 robh@cumin2002: START - Cookbook sre.dns.netbox * 19:27 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7003.magru.wmnet * 19:27 urbanecm: [urbanecm@mwmaint2002 ~]$ foreachwiki userOptions.php --delete-defaults growthexperiments-homepage-variant # [[phab:T379146|T379146]], logging to /home/urbanecm/T379146.log * 19:26 urbanecm: mwscript-k8s -f userOptions.php -- --wiki=enwiki --old=oldimpact --delete 'growthexperiments-homepage-variant' # [[phab:T379146|T379146]] * 19:23 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti7001.magru.wmnet * 19:23 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:23 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 19:23 
robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 19:22 eileen: civicrm upgraded from {{Gerrit|eec961a3}} to {{Gerrit|59d340cd}} * 19:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P71191 and previous config saved to /var/cache/conftool/dbconfig/20241126-192112-ladsgroup.json * 19:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 19:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye * 19:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye * 19:11 robh@cumin2002: START - Cookbook sre.dns.netbox * 19:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P71190 and previous config saved to /var/cache/conftool/dbconfig/20241126-190607-ladsgroup.json * 18:55 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7001.magru.wmnet * 18:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P71189 and previous config saved to /var/cache/conftool/dbconfig/20241126-185101-ladsgroup.json * 18:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P71188 and previous config saved to /var/cache/conftool/dbconfig/20241126-183556-ladsgroup.json * 18:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 repool', diff saved to https://phabricator.wikimedia.org/P71187 and previous config saved to /var/cache/conftool/dbconfig/20241126-183547-ladsgroup.json * 18:34 ladsgroup@cumin1002: END (ERROR) - Cookbook 
sre.mysql.pool (exit_code=97) db2215 gradually with 4 steps - Maint over * 18:33 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2215 gradually with 4 steps - Maint over * 18:25 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D<nowiki>{</nowiki>wikikube-ctrl200[1-3].codfw.wmnet<nowiki>}</nowiki> and (A:wikikube-worker-codfw or A:wikikube-master-codfw) * 18:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance * 18:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance * 17:58 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D<nowiki>{</nowiki>wikikube-ctrl200[1-3].codfw.wmnet<nowiki>}</nowiki> and (A:wikikube-worker-codfw or A:wikikube-master-codfw) * 17:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1313-1327].eqiad.wmnet * 17:47 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1313-1327].eqiad.wmnet * 17:35 claime: homer 'cr*eqiad*' commit '[[phab:T380350|T380350]]' * 17:35 claime: homer 'lsw1-e7-eqiad*' commit '[[phab:T380350|T380350]]' * 17:34 claime: homer 'lsw1-f6-eqiad*' commit '[[phab:T380350|T380350]]' * 17:34 claime: homer 'lsw1-f5-eqiad*' commit '[[phab:T380350|T380350]]' * 17:33 claime: homer 'lsw1-e5-eqiad*' commit '[[phab:T380350|T380350]]' * 17:32 claime: homer 'lsw1-e6-eqiad*' commit '[[phab:T380350|T380350]]' * 17:31 claime: homer 'lsw1-f7-eqiad*' commit '[[phab:T380350|T380350]]' * 17:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1321.eqiad.wmnet with OS bookworm * 17:25 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D<nowiki>{</nowiki>wikikube-ctrl100[1-3].eqiad.wmnet<nowiki>}</nowiki> and (A:wikikube-worker-eqiad or 
A:wikikube-master-eqiad) * 17:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1325.eqiad.wmnet with OS bookworm * 17:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1326.eqiad.wmnet with OS bookworm * 17:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1324.eqiad.wmnet with OS bookworm * 17:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1327.eqiad.wmnet with OS bookworm * 17:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1322.eqiad.wmnet with OS bookworm * 17:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage * 17:06 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage * 17:05 ladsgroup@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s8 * 17:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage * 17:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1323.eqiad.wmnet with OS bookworm * 17:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage * 16:59 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D<nowiki>{</nowiki>wikikube-ctrl100[1-3].eqiad.wmnet<nowiki>}</nowiki> and (A:wikikube-worker-eqiad or A:wikikube-master-eqiad) * 16:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage * 16:54 cgoubert@cumin1002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage * 16:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage * 16:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage * 16:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage * 16:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage * 16:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage * 16:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage * 16:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm * 16:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage * 16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:42 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1321.eqiad.wmnet with OS bookworm * 16:42 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:41 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage * 16:40 urbanecm: `mwscript-k8s -f userOptions.php -- --wiki=enwiki --old=control --delete 'growthexperiments-homepage-variant'` # [[phab:T379146|T379146]], [[phab:T377631|T377631]] * 16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1327.eqiad.wmnet with OS bookworm * 16:30 cgoubert@cumin1002: START - Cookbook 
sre.hosts.reimage for host wikikube-worker1326.eqiad.wmnet with OS bookworm * 16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1325.eqiad.wmnet with OS bookworm * 16:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1324.eqiad.wmnet with OS bookworm * 16:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1322.eqiad.wmnet with OS bookworm * 16:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm * 16:28 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1321.eqiad.wmnet with OS bookworm * 16:28 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1327.eqiad.wmnet with OS bookworm * 16:27 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1326.eqiad.wmnet with OS bookworm * 16:27 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1325.eqiad.wmnet with OS bookworm * 16:27 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1324.eqiad.wmnet with OS bookworm * 16:26 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1322.eqiad.wmnet with OS bookworm * 16:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1323.eqiad.wmnet with OS bookworm * 16:20 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1323.eqiad.wmnet with OS bookworm * 15:52 moritzm: installing intel-microcode security updates * 15:49 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1327.eqiad.wmnet with OS bookworm * 15:49 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1326.eqiad.wmnet with OS bookworm * 15:48 cgoubert@cumin1002: 
START - Cookbook sre.hosts.reimage for host wikikube-worker1325.eqiad.wmnet with OS bookworm * 15:48 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1324.eqiad.wmnet with OS bookworm * 15:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1323.eqiad.wmnet with OS bookworm * 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1322.eqiad.wmnet with OS bookworm * 15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm * 15:42 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns7001.wikimedia.org with OS bookworm * 15:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2042.codfw.wmnet * 15:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye * 15:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1025.eqiad.wmnet with OS bullseye * 15:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1026.eqiad.wmnet with OS bullseye * 15:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2042.codfw.wmnet * 15:34 moritzm: installing wireshark security updates * 15:33 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2215 gradually with 4 steps - Maint over * 15:33 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2215 gradually with 4 steps - Maint over * 15:27 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply * 15:25 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply * 15:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet * 15:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node 
ganeti1020.eqiad.wmnet * 15:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet * 15:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet * 15:16 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply * 15:16 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply * 15:11 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7001.wikimedia.org with reason: host reimage * away: UTC afternoon deploys done * 15:08 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply * 15:08 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply * 15:07 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7001.wikimedia.org with reason: host reimage * 15:05 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1095082{{!}}Allow simulating the SUL3 shared domain settings via env var (T380575)]] (duration: 26m 23s) * 14:58 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance * 14:58 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance * 14:58 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance * 14:58 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance * 14:56 tgr@deploy2002: tgr: Continuing with sync * 14:44 tgr@deploy2002: tgr: Backport for 
[[gerrit:1095082{{!}}Allow simulating the SUL3 shared domain settings via env var (T380575)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:43 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bookworm * 14:43 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns7001.wikimedia.org with OS bullseye * 14:43 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002" * 14:40 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002" * 14:39 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1095082{{!}}Allow simulating the SUL3 shared domain settings via env var (T380575)]] * 14:31 mlitn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097983{{!}}Fix incorrect 'this']] (duration: 12m 36s) * 14:25 mlitn@deploy2002: mlitn: Continuing with sync * 14:25 mlitn@deploy2002: mlitn: Backport for [[gerrit:1097983{{!}}Fix incorrect 'this']] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 14:19 mlitn@deploy2002: Started scap sync-world: Backport for [[gerrit:1097983{{!}}Fix incorrect 'this']] * 14:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye * 14:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye * 14:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply * 14:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply * 14:14 oblivian@cumin1002: END (PASS) - Cookbook 
sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add grid view - oblivian@cumin1002" * 14:14 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add grid view - oblivian@cumin1002 * 14:14 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add grid view - oblivian@cumin1002 * 14:13 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add grid view - oblivian@cumin1002" * 14:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply * 14:09 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7001.wikimedia.org with reason: host reimage * 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply * 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply * 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply * 14:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply * 14:05 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7001.wikimedia.org with reason: host reimage * 14:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply * 14:01 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7015.magru.wmnet with OS bullseye * 14:01 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002" * 14:01 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002" * 13:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Reclone ([[phab:T379724|T379724]]) * 13:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Reclone ([[phab:T379724|T379724]]) * 13:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Reclone ([[phab:T379724|T379724]]) * 13:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Reclone ([[phab:T379724|T379724]]) * 13:49 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs7003.magru.wmnet with OS bullseye * 13:49 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002" * 13:49 ladsgroup@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s8 * 13:46 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002" * 13:43 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bullseye * 13:40 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host dns7001.wikimedia.org * 13:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7004.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:38 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7004.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:35 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
cp7015.magru.wmnet with reason: host reimage * 13:34 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7010.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:34 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7010.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:34 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:34 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:32 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7015.magru.wmnet with reason: host reimage * 13:30 Emperor: swift delete wikipedia-commons-local-public.bf b/bf/Schuur_-_Nieuwerbrug_-_20164513_-_RCE.jpg [[phab:T380738|T380738]] * 13:29 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:28 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:28 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7002.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:28 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7002.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:28 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7002.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:28 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp7002.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:27 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7008.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:27 fabfur@cumin1002: START - Cookbook 
sre.hosts.downtime for 4:00:00 on cp7008.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:27 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7006.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:26 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp7006.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:26 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:26 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:21 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host dns7001.wikimedia.org * 13:20 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage * 13:18 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage * 13:15 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1004.eqiad.wmnet with OS bullseye * 13:11 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye * 13:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: repool', diff saved to https://phabricator.wikimedia.org/P71185 and previous config saved to /var/cache/conftool/dbconfig/20241126-131120-arnaudb.json * 13:07 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns7001.wikimedia.org with OS bookworm * 13:03 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bookworm * 12:58 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye * 12:57 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage * 12:56 
arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: repool', diff saved to https://phabricator.wikimedia.org/P71183 and previous config saved to /var/cache/conftool/dbconfig/20241126-125614-arnaudb.json * 12:53 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_esams and A:cp for 9.2.6-1wm2 * 12:53 dcaro@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage * 12:51 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_esams and A:cp for 9.2.6-1wm2 * 12:48 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7015.magru.wmnet with OS bullseye * 12:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: repool', diff saved to https://phabricator.wikimedia.org/P71182 and previous config saved to /var/cache/conftool/dbconfig/20241126-124109-arnaudb.json * 12:30 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye * 12:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: repool', diff saved to https://phabricator.wikimedia.org/P71181 and previous config saved to /var/cache/conftool/dbconfig/20241126-122622-arnaudb.json * 12:26 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.5 refs [[phab:T375664|T375664]] * 12:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: repool', diff saved to https://phabricator.wikimedia.org/P71180 and previous config saved to /var/cache/conftool/dbconfig/20241126-122603-arnaudb.json * 12:23 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:20 robh@cumin2002: START - Cookbook sre.dns.netbox * 12:20 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7003 * 
12:20 moritzm: failover Ganeti master in magru02 to ganeti7004 * 12:20 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7003 * 12:19 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7015 * 12:19 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7015 * 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to plain * 12:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to plain * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to plain * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to plain * 12:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: repool', diff saved to https://phabricator.wikimedia.org/P71179 and previous config saved to /var/cache/conftool/dbconfig/20241126-121117-arnaudb.json * 12:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 20%: repool', diff saved to https://phabricator.wikimedia.org/P71178 and previous config saved to /var/cache/conftool/dbconfig/20241126-121058-arnaudb.json * 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to plain * 12:10 ladsgroup@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098006{{!}}Bump ratio of new parsercache key spec to 4 (T373037)]] (duration: 15m 21s) * 12:09 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to plain * 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to plain * 12:07 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk 
type of bast7001.wikimedia.org to plain * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to plain * 12:05 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to plain * 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to drbd * 12:02 ladsgroup@deploy2002: ladsgroup: Continuing with sync * 12:01 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:1098006{{!}}Bump ratio of new parsercache key spec to 4 (T373037)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 11:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: repool', diff saved to https://phabricator.wikimedia.org/P71177 and previous config saved to /var/cache/conftool/dbconfig/20241126-115612-arnaudb.json * 11:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 15%: repool', diff saved to https://phabricator.wikimedia.org/P71176 and previous config saved to /var/cache/conftool/dbconfig/20241126-115552-arnaudb.json * 11:55 ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1098006{{!}}Bump ratio of new parsercache key spec to 4 (T373037)]] * 11:54 hashar@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.5 refs [[phab:T375664|T375664]] (duration: 25m 52s) * 11:53 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_esams and A:cp for 9.2.6-1wm2 * 11:53 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_esams and A:cp for 9.2.6-1wm2 * 11:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: repool', diff saved to https://phabricator.wikimedia.org/P71175 and previous config saved to /var/cache/conftool/dbconfig/20241126-114106-arnaudb.json * 11:40 
arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: repool', diff saved to https://phabricator.wikimedia.org/P71174 and previous config saved to /var/cache/conftool/dbconfig/20241126-114047-arnaudb.json * 11:38 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_eqiad and A:cp for 9.2.6-1wm2 * 11:38 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_eqiad and A:cp for 9.2.6-1wm2 * 11:31 moritzm: remove ganeti7001 from active Ganeti nodes in magru01 * 11:28 dcaro@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye * 11:28 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.5 refs [[phab:T375664|T375664]] * 11:28 moritzm: failover Ganeti master in magru01 to ganeti7003 * 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to plain * 11:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 15%: repool', diff saved to https://phabricator.wikimedia.org/P71173 and previous config saved to /var/cache/conftool/dbconfig/20241126-112601-arnaudb.json * 11:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: repool', diff saved to https://phabricator.wikimedia.org/P71172 and previous config saved to /var/cache/conftool/dbconfig/20241126-112542-arnaudb.json * 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to plain * 11:25 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.5 refs [[phab:T375664|T375664]] * 11:25 dcaro@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye * 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing 
disk type of doh7001.wikimedia.org to plain * 11:23 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to plain * 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to plain * 11:20 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to plain * 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to plain * 11:12 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to plain * 11:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: repool', diff saved to https://phabricator.wikimedia.org/P71171 and previous config saved to /var/cache/conftool/dbconfig/20241126-111056-arnaudb.json * 11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on lvs7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 11:10 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on lvs7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 11:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 2%: repool', diff saved to https://phabricator.wikimedia.org/P71170 and previous config saved to /var/cache/conftool/dbconfig/20241126-111036-arnaudb.json * 11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on lvs7002.magru.wmnet with reason: [[phab:T376737|T376737]] * 11:10 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on lvs7002.magru.wmnet with reason: [[phab:T376737|T376737]] * 11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on lvs7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 11:10 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on lvs7001.magru.wmnet with reason: 
[[phab:T376737|T376737]] * 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to plain * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to plain * 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to drbd * 11:05 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:03 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to drbd * 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to drbd * 10:56 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to drbd * 10:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: repool', diff saved to https://phabricator.wikimedia.org/P71169 and previous config saved to /var/cache/conftool/dbconfig/20241126-105550-arnaudb.json * 10:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: repool', diff saved to https://phabricator.wikimedia.org/P71168 and previous config saved to /var/cache/conftool/dbconfig/20241126-105531-arnaudb.json * 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to drbd * 10:47 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.5 refs [[phab:T375664|T375664]] * 10:46 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7001.wikimedia.org to drbd * 10:42 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to drbd * 10:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling 
upgrade/restart of Apache Traffic Server on A:cp-upload_eqiad and A:cp for 9.2.6-1wm2 * 10:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_eqiad and A:cp for 9.2.6-1wm2 * 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to drbd * 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to drbd * 10:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1233.eqiad.wmnet onto db1246.eqiad.wmnet * 10:28 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to drbd * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to drbd * 10:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to drbd * 10:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to drbd * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to drbd * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to drbd * 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to drbd * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to drbd * 10:02 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to drbd * 09:57 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to drbd * 09:56 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new 
host ganeti7004.magru.wmnet to cluster magru02 and group B4 * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7004.magru.wmnet to cluster magru02 and group B4 * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7003.magru.wmnet to cluster magru01 and group B3 * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7003.magru.wmnet to cluster magru01 and group B3 * 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7004.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7003.magru.wmnet to cluster magru01 and group B3 * 09:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7003.magru.wmnet to cluster magru01 and group B3 * 09:23 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7004.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:21 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet * 09:21 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:21 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin2002" * 09:21 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin2002" * 
09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7003.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:11 jayme@cumin2002: START - Cookbook sre.dns.netbox * 09:11 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7003.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:03 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1246.eqiad.wmnet * 08:52 jayme@cumin2002: START - Cookbook sre.hosts.decommission for hosts kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet * 08:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7004.magru.wmnet to cluster magru02 and group B4 * 08:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7004.magru.wmnet to cluster magru02 and group B4 * 08:49 dcausse@deploy2002: Finished deploy [airflow-dags/search@f969d75]: search: swift_upload.py moved to refinery/bin/ (duration: 00m 27s) * 08:49 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1005-1006,1015-1016].eqiad.wmnet * 08:48 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1005-1006,1015-1016].eqiad.wmnet * 08:48 dcausse@deploy2002: Started deploy [airflow-dags/search@f969d75]: search: swift_upload.py moved to refinery/bin/ * 08:47 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[2005-2006,2015-2016].codfw.wmnet * 08:46 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[2005-2006,2015-2016].codfw.wmnet * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7004.magru.wmnet * 08:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti7003.magru.wmnet to cluster magru01 and group B3 * 08:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7003.magru.wmnet to cluster magru01 and group B3 * 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7004.magru.wmnet * 08:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet * 08:06 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7004 * 08:06 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7004 * 08:06 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7003 * 08:05 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7003 * 07:55 arnaudb@cumin1002: dbctl commit (dc=all): 'manual depool commit', diff saved to https://phabricator.wikimedia.org/P71164 and previous config saved to /var/cache/conftool/dbconfig/20241126-075433-arnaudb.json * 07:55 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) db1233 - clone on db1246 * 07:54 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db1233 - clone on db1246 * 07:36 joal@deploy2002: Finished deploy [analytics/refinery@f48b8de] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f48b8de2] (duration: 00m 29s) * 07:35 joal@deploy2002: Started deploy [analytics/refinery@f48b8de] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f48b8de2] * 07:35 joal@deploy2002: Finished deploy [analytics/refinery@f48b8de] (thin): Regular analytics weekly train THIN [analytics/refinery@f48b8de2] (duration: 00m 35s) * 07:34 joal@deploy2002: Started deploy [analytics/refinery@f48b8de] (thin): Regular analytics weekly train THIN [analytics/refinery@f48b8de2] * 07:34 joal@deploy2002: Finished 
deploy [analytics/refinery@f48b8de]: Regular analytics weekly train [analytics/refinery@f48b8de2] (duration: 02m 03s) * 07:33 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "UI bugfixes - oblivian@cumin1002" * 07:33 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: UI bugfixes - oblivian@cumin1002 * 07:33 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: UI bugfixes - oblivian@cumin1002 * 07:33 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "UI bugfixes - oblivian@cumin1002" * 07:32 joal@deploy2002: Started deploy [analytics/refinery@f48b8de]: Regular analytics weekly train [analytics/refinery@f48b8de2] * 03:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2215 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71163 and previous config saved to /var/cache/conftool/dbconfig/20241126-034040-ladsgroup.json * 03:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance * 03:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance * 03:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 20:00:00 on wdqs[2018-2020,2026-2027].codfw.wmnet with reason: [[phab:T376150|T376150]] non-prod hosts * 03:12 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 20:00:00 on wdqs[2018-2020,2026-2027].codfw.wmnet with reason: [[phab:T376150|T376150]] non-prod hosts * 03:11 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from 
wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling neither afterwards * 03:10 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling neither afterwards * 03:09 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling neither afterwards * 03:07 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling neither afterwards * 02:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2201.codfw.wmnet with reason: Maintenance * 02:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2201.codfw.wmnet with reason: Maintenance * 02:34 brett: Import libvmod-netmapper 1.9.1-1 into varnish-staging apt component * 02:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: [[phab:T376150|T376150]] * 02:30 bking@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: [[phab:T376150|T376150]] * 02:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2020.codfw.wmnet, repooling source-only afterwards * 02:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2020.codfw.wmnet, repooling source-only afterwards * 01:47 bking@cumin2002: END (PASS) - Cookbook 
sre.wdqs.data-transfer (exit_code=0) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2027.codfw.wmnet, repooling source-only afterwards * 01:29 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bookworm * 01:08 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling source-only afterwards * 01:06 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on lvs[7001-7003].magru.wmnet with reason: site is depooled, maintenance * 01:06 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on lvs[7001-7003].magru.wmnet with reason: site is depooled, maintenance * 01:04 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2027.codfw.wmnet, repooling source-only afterwards * 01:04 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7015.magru.wmnet with OS bullseye * 01:03 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling source-only afterwards * 01:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: [[phab:T376150|T376150]] * 01:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: [[phab:T376150|T376150]] * 00:55 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye * 00:28 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) 
([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 00:21 eileen: civicrm upgraded from {{Gerrit|190ea417}} to {{Gerrit|eec961a3}} * 00:16 tzatziki: removing 6 files for legal compliance * 00:00 tzatziki: removing 1 file for legal compliance
== 2024-11-25 ==
* 23:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance * 23:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance * 23:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71162 and previous config saved to /var/cache/conftool/dbconfig/20241125-235547-ladsgroup.json * 23:54 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2018.codfw.wmnet, repooling source-only afterwards * 23:53 tzatziki: removing 1 file for legal compliance * 23:49 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2018.codfw.wmnet, repooling source-only afterwards * 23:44 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:42 bking@cumin2002: START - Cookbook 
sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P71161 and previous config saved to /var/cache/conftool/dbconfig/20241125-234040-ladsgroup.json * 23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P71160 and previous config saved to /var/cache/conftool/dbconfig/20241125-232533-ladsgroup.json * 23:23 tzatziki: removing 2 files for legal compliance * 23:16 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:16 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:16 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:16 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:14 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:14 
bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71159 and previous config saved to /var/cache/conftool/dbconfig/20241125-231026-ladsgroup.json * 23:10 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:10 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:09 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:09 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:09 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7015.magru.wmnet with OS bullseye * 23:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye * 23:01 bking@cumin1002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) * 23:01 bking@cumin1002: START - Cookbook sre.wdqs.data-transfer * 23:01 brett@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) cp7015.magru.wmnet 
lvs7003.magru.wmnet cp7015.mgmt.magru.wmnet lvs7003.mgmt.magru.wmnet on all recursors * 23:00 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7015.magru.wmnet lvs7003.magru.wmnet cp7015.mgmt.magru.wmnet lvs7003.mgmt.magru.wmnet on all recursors * 23:00 brett@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) cp7015.magru.wmnet lvs7003.magru.wmnet on all recursors * 23:00 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7015.magru.wmnet lvs7003.magru.wmnet on all recursors * 22:56 brett: Import varnish-modules 0.20.0-2~deb11u1 into varnish-staging apt component * 22:56 bking@cumin1002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:56 bking@cumin1002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:53 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:53 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2191 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71158 and previous config saved to /var/cache/conftool/dbconfig/20241125-224949-ladsgroup.json * 22:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2191.codfw.wmnet with reason: Maintenance * 22:49 ladsgroup@cumin1002: START - Cookbook 
sre.hosts.downtime for 6:00:00 on db2191.codfw.wmnet with reason: Maintenance * 22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71157 and previous config saved to /var/cache/conftool/dbconfig/20241125-224927-ladsgroup.json * 22:48 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:48 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:46 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7015.magru.wmnet with OS bullseye * 22:43 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:43 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:38 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:38 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:37 bking@cumin2002: END 
(FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:37 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:37 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:37 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * away: UTC late deploys done * 22:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P71156 and previous config saved to /var/cache/conftool/dbconfig/20241125-223420-ladsgroup.json * 22:34 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097327{{!}}SUL3: Sort overrides (T373737)]], [[gerrit:1097328{{!}}More authentication domain overrides (T373737)]], [[gerrit:1097322{{!}}Update private/readme.php to match production]] (duration: 12m 49s) * 22:32 eileen: civicrm upgraded from {{Gerrit|b7bd670f}} to {{Gerrit|190ea417}} * 22:31 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs7003.magru.wmnet with OS bullseye * 22:27 tgr@deploy2002: tgr: Continuing with sync * 22:25 tgr@deploy2002: tgr: Backport for [[gerrit:1097327{{!}}SUL3: Sort overrides (T373737)]], [[gerrit:1097328{{!}}More authentication domain overrides (T373737)]], [[gerrit:1097322{{!}}Update private/readme.php to match 
production]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:21 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1097327{{!}}SUL3: Sort overrides (T373737)]], [[gerrit:1097328{{!}}More authentication domain overrides (T373737)]], [[gerrit:1097322{{!}}Update private/readme.php to match production]] * 22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P71155 and previous config saved to /var/cache/conftool/dbconfig/20241125-221913-ladsgroup.json * 22:19 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097518{{!}}Reader Survey: Increase coverage (T378660)]] (duration: 14m 08s) * 22:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage * 22:12 tgr@deploy2002: tgr, dani: Continuing with sync * 22:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage * 22:09 tgr@deploy2002: tgr, dani: Backport for [[gerrit:1097518{{!}}Reader Survey: Increase coverage (T378660)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:09 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye * 22:08 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7015.magru.wmnet with OS bullseye * 22:04 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1097518{{!}}Reader Survey: Increase coverage (T378660)]] * 22:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71154 and previous config saved to /var/cache/conftool/dbconfig/20241125-220406-ladsgroup.json * 22:02 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097457{{!}}LoginCompleteHookHandler: onTempUserCreatedRedirect() should use 
getPrimaryInstance() (T380042)]] (duration: 12m 41s) * 21:56 tgr@deploy2002: tgr, matmarex: Continuing with sync * 21:54 tgr@deploy2002: tgr, matmarex: Backport for [[gerrit:1097457{{!}}LoginCompleteHookHandler: onTempUserCreatedRedirect() should use getPrimaryInstance() (T380042)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:50 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1097457{{!}}LoginCompleteHookHandler: onTempUserCreatedRedirect() should use getPrimaryInstance() (T380042)]] * 21:49 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1094054{{!}}Reader Survey: Increase coverage on enwiki (T378660)]] (duration: 16m 06s) * 21:46 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye * 21:45 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7015.magru.wmnet with OS bullseye * 21:42 tgr@deploy2002: tgr, dani: Continuing with sync * 21:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2131 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71153 and previous config saved to /var/cache/conftool/dbconfig/20241125-213904-ladsgroup.json * 21:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance * 21:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance * 21:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71152 and previous config saved to /var/cache/conftool/dbconfig/20241125-213841-ladsgroup.json * 21:37 tgr@deploy2002: tgr, dani: Backport for [[gerrit:1094054{{!}}Reader Survey: Increase coverage on enwiki (T378660)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:33 tgr@deploy2002: Started scap sync-world: 
Backport for [[gerrit:1094054{{!}}Reader Survey: Increase coverage on enwiki (T378660)]] * 21:31 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye * 21:30 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs7003.magru.wmnet with OS bullseye * 21:29 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097417{{!}}Reader Survey: Fix yes/no messages (T378660)]] (duration: 16m 02s) * 21:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P71151 and previous config saved to /var/cache/conftool/dbconfig/20241125-212334-ladsgroup.json * 21:22 tgr@deploy2002: dani, tgr: Continuing with sync * 21:18 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testing - sukhe@cumin1002" * 21:18 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testing - sukhe@cumin1002" * 21:17 tgr@deploy2002: dani, tgr: Backport for [[gerrit:1097417{{!}}Reader Survey: Fix yes/no messages (T378660)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:13 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1097417{{!}}Reader Survey: Fix yes/no messages (T378660)]] * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P71150 and previous config saved to /var/cache/conftool/dbconfig/20241125-210827-ladsgroup.json * 21:04 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Host reimage - brett@cumin2002 - brett@cumin2002" * 21:04 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Host reimage - brett@cumin2002 - brett@cumin2002" * 21:03 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for 
hosts restbase[2021-2023].codfw.wmnet * 21:03 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[2021-2023].codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 21:03 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[2021-2023].codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:59 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7008.magru.wmnet with OS bullseye * 20:57 brett@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002" * 20:56 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002" * 20:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71149 and previous config saved to /var/cache/conftool/dbconfig/20241125-205320-ladsgroup.json * 20:51 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts restbase[2021-2023].codfw.wmnet * 20:51 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:51 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 20:50 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 20:45 
robh@cumin2002: START - Cookbook sre.dns.netbox * 20:45 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns7001 * 20:45 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns7001 * 20:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye * 20:43 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs7003.magru.wmnet with OS bullseye * 20:40 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002" * 20:34 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7004.magru.wmnet with reason: host reimage * 20:31 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7004.magru.wmnet with reason: host reimage * 20:26 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye * 20:24 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye * 20:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7008.magru.wmnet with reason: host reimage * 20:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7008.magru.wmnet with reason: host reimage * 20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2115 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71147 and previous config saved to /var/cache/conftool/dbconfig/20241125-200031-ladsgroup.json * 20:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance * 20:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance * 20:00 robh@cumin2002: START - Cookbook sre.hosts.reimage for host 
ganeti7004.magru.wmnet with OS bookworm * 19:58 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7003.magru.wmnet with OS bookworm * 19:58 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002" * 19:56 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002" * 19:50 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=codfw,name=restbase2038.codfw.wmnet * 19:50 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=codfw,name=restbase2037.codfw.wmnet * 19:50 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=codfw,name=restbase2036.codfw.wmnet * 19:43 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye * 19:43 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7008.magru.wmnet with OS bullseye * 19:37 ejegg: fundraising civicrm upgraded from {{Gerrit|3311520a}} to {{Gerrit|b7bd670f}} * 19:36 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1095126{{!}}[Growth] enwiki: Deploy Add Link to 2% of new users (T377631)]] (duration: 11m 59s) * 19:35 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7003.magru.wmnet with reason: host reimage * 19:31 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7003.magru.wmnet with reason: host reimage * 19:29 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:28 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1095126{{!}}[Growth] enwiki: Deploy Add Link to 2% of new users (T377631)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:24 urbanecm@deploy2002: Started scap sync-world: Backport 
for [[gerrit:1095126{{!}}[Growth] enwiki: Deploy Add Link to 2% of new users (T377631)]] * 19:18 swfrench@deploy2002: Finished scap sync-world: Deployment to pick up new php 8.1 base images (duration: 09m 37s) * 19:14 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:14 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 19:14 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 19:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 19:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 19:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71144 and previous config saved to /var/cache/conftool/dbconfig/20241125-191124-ladsgroup.json * 19:10 robh@cumin2002: START - Cookbook sre.dns.netbox * 19:10 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7003 * 19:10 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7003 * 19:08 swfrench@deploy2002: Started scap sync-world: Deployment to pick up new php 8.1 base images * 19:06 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye * 19:06 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7008.magru.wmnet with OS bullseye * 19:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7006.magru.wmnet with OS bullseye * 19:02 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002" * 18:59 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7003.magru.wmnet with OS bookworm * 18:59 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002" * 18:59 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:59 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 18:59 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 18:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P71143 and previous config saved to /var/cache/conftool/dbconfig/20241125-185617-ladsgroup.json * 18:53 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7003 * 18:53 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7003 * 18:53 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye * 18:52 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7008.magru.wmnet with OS bullseye * 18:49 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D<nowiki>{</nowiki>wikikube-worker[2128-2170].codfw.wmnet<nowiki>}</nowiki> and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-se * 18:48 
krinkle@deploy2002: Finished deploy [statsv/statsv@6678d4b]: {{Gerrit|I7a8d831817}}: remove unused statsvr.py (duration: 00m 09s) * 18:48 krinkle@deploy2002: Started deploy [statsv/statsv@6678d4b]: {{Gerrit|I7a8d831817}}: remove unused statsvr.py * 18:45 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7015 * 18:45 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7015 * 18:45 robh@cumin2002: START - Cookbook sre.dns.netbox * 18:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P71142 and previous config saved to /var/cache/conftool/dbconfig/20241125-184110-ladsgroup.json * 18:34 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye * 18:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7006.magru.wmnet with reason: host reimage * 18:31 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:29 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7006.magru.wmnet with reason: host reimage * 18:28 robh@cumin2002: START - Cookbook sre.dns.netbox * 18:28 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7015.magru.wmnet * 18:28 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:27 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7015.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 18:27 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7015.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 18:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance 
db1237 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71141 and previous config saved to /var/cache/conftool/dbconfig/20241125-182603-ladsgroup.json * 18:24 robh@cumin2002: START - Cookbook sre.dns.netbox * 18:18 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7015.magru.wmnet * 18:17 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts lvs7003.magru.wmnet * 18:17 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:17 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 18:16 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 18:13 robh@cumin2002: START - Cookbook sre.dns.netbox * 18:08 swfrench-wmf: rebuilt php8.1 production images to pick up 8.1.31 * 18:08 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097310{{!}}Migrate to virtual domains (T354939)]], [[gerrit:1097369{{!}}createExtensionTables: Use virtual domains for GrowthExperiments (T354939)]] (duration: 13m 18s) * 18:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7006.magru.wmnet with OS bullseye * 18:03 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7006.magru.wmnet with OS bullseye * 18:03 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs7003.magru.wmnet * 18:02 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:02 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 18:02 robh@cumin2002: 
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 18:01 urbanecm@deploy2002: urbanecm: Continuing with sync * 17:59 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1097310{{!}}Migrate to virtual domains (T354939)]], [[gerrit:1097369{{!}}createExtensionTables: Use virtual domains for GrowthExperiments (T354939)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 17:58 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7004 * 17:58 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7004 * 17:57 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7008 * 17:57 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7008 * 17:56 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@9927a5a] (wcqs): Deploy 0.3.150 to WCQS (duration: 02m 53s) * 17:55 robh@cumin2002: START - Cookbook sre.dns.netbox * 17:54 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1097310{{!}}Migrate to virtual domains (T354939)]], [[gerrit:1097369{{!}}createExtensionTables: Use virtual domains for GrowthExperiments (T354939)]] * 17:53 ryankemper@deploy2002: Started deploy [wdqs/wdqs@9927a5a] (wcqs): Deploy 0.3.150 to WCQS * 17:49 ryankemper: [[phab:T378260|T378260]] `snapshot1016.eqiad.wmnet` => manually deleted `cirrussearch-dump-s11.[timer,service]` * 17:49 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7001.magru.wmnet with OS bullseye * 17:49 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002" * 17:46 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002" * 17:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7006.magru.wmnet with OS bullseye * 17:41 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp7008.magru.wmnet * 17:41 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:39 robh@cumin2002: START - Cookbook sre.dns.netbox * 17:39 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti7004.magru.wmnet * 17:39 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:39 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7004.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 17:39 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7004.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 17:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1237 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71140 and previous config saved to /var/cache/conftool/dbconfig/20241125-173511-ladsgroup.json * 17:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1237.eqiad.wmnet with reason: Maintenance * 17:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1237.eqiad.wmnet with reason: Maintenance * 17:34 robh@cumin2002: START - Cookbook sre.dns.netbox * 17:29 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7008.magru.wmnet * 17:29 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7004.magru.wmnet * 17:23 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:23 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 17:22 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 17:19 robh@cumin2002: START - Cookbook sre.dns.netbox * 17:17 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7001.magru.wmnet with reason: host reimage * 17:14 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7001.magru.wmnet with reason: host reimage * 17:10 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7006.magru.wmnet * 17:10 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:10 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7006.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 17:10 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7006.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 17:06 robh@cumin2002: START - Cookbook sre.dns.netbox * 16:59 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7006.magru.wmnet * 16:59 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti7003.magru.wmnet * 16:59 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:58 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 16:58 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: ganeti7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 16:55 robh@cumin2002: START - Cookbook sre.dns.netbox * 16:49 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7003.magru.wmnet * 16:47 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7001.magru.wmnet with OS bullseye * 16:45 hashar@deploy2002: Pruned MediaWiki: 1.44.0-wmf.2 (duration: 03m 05s) * 16:44 dcaro@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye * 16:42 hashar@deploy2002: Installation of scap version "4.129.0" completed for 211 hosts * 16:42 swfrench-wmf: uploaded php8.1 8.1.31-1+wmf11u1 to apt.w.o (16:25 UTC) * 16:38 hashar@deploy2002: Installing scap version "4.129.0" for 211 hosts * 16:27 hashar@deploy2002: Installation of scap version "4.128.0" completed for 211 hosts * 16:27 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration * 16:23 hashar@deploy2002: Installing scap version "4.128.0" for 211 hosts * 16:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 16:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 16:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71138 and previous config saved to /var/cache/conftool/dbconfig/20241125-161915-ladsgroup.json * 16:05 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:05 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D<nowiki>{</nowiki>wikikube-worker[1305-1312].eqiad.wmnet<nowiki>}</nowiki> 
and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-se * 16:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 16:04 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_drmrs and A:cp for 9.2.6-1wm2 * 16:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P71134 and previous config saved to /var/cache/conftool/dbconfig/20241125-160408-ladsgroup.json * 16:02 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_drmrs and A:cp for 9.2.6-1wm2 * 15:58 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:52 Lucas_WMDE: UTC afternoon backport+config window done (apologies for the temporary flood of “Use of QuickSurveys survey” deprecation warnings – should be fixed again) * 15:52 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:49 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097410{{!}}Reader Survey: Fix question (T378660)]] (duration: 13m 02s) * 15:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P71133 and previous config saved to /var/cache/conftool/dbconfig/20241125-154901-ladsgroup.json * 15:48 
robh@cumin2002: START - Cookbook sre.hosts.provision for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:47 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:47 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru swaps - robh@cumin2002" * 15:46 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru swaps - robh@cumin2002" * 15:46 claime: homer cr*eqiad* commit '[[phab:T380027|T380027]]' * 15:42 robh@cumin2002: START - Cookbook sre.dns.netbox * 15:41 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dani: Continuing with sync * 15:41 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kubernetes[1009-1014].eqiad.wmnet * 15:41 cgoubert@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 15:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 15:40 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dani: Backport for [[gerrit:1097410{{!}}Reader Survey: Fix question (T378660)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:39 cgoubert@cumin1002: START - Cookbook sre.dns.netbox * 15:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bookworm * 15:37 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup2011.codfw.wmnet with reason: Reboot * 15:37 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup2011.codfw.wmnet with reason: Reboot * 15:37 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 
1:00:00 on backup2010.codfw.wmnet with reason: Reboot * 15:37 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup2010.codfw.wmnet with reason: Reboot * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1097410{{!}}Reader Survey: Fix question (T378660)]] * 15:36 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71132 and previous config saved to /var/cache/conftool/dbconfig/20241125-153354-ladsgroup.json * 15:31 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:23 lucaswerkmeister-wmde@deploy2002: dani, lucaswerkmeister-wmde: Continuing with sync * 15:21 lucaswerkmeister-wmde@deploy2002: dani, lucaswerkmeister-wmde: Backport for [[gerrit:1093987{{!}}Reader Survey: Deploy on enwiki (T378660)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:19 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubernetes[1009-1014].eqiad.wmnet * 15:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 15:18 robh@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:17 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1093987{{!}}Reader Survey: Deploy on enwiki (T378660)]] * 15:15 robh@cumin1002: START - Cookbook sre.dns.netbox * 15:15 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 15:15 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: 
Backport for [[gerrit:1094511{{!}}New stream config for Android Rabbit Holes feature. (T380107)]] (duration: 15m 45s) * 15:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1224 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71131 and previous config saved to /var/cache/conftool/dbconfig/20241125-151103-ladsgroup.json * 15:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1224.eqiad.wmnet with reason: Maintenance * 15:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1224.eqiad.wmnet with reason: Maintenance * 15:08 lucaswerkmeister-wmde@deploy2002: dbrant, lucaswerkmeister-wmde: Continuing with sync * 15:03 lucaswerkmeister-wmde@deploy2002: dbrant, lucaswerkmeister-wmde: Backport for [[gerrit:1094511{{!}}New stream config for Android Rabbit Holes feature. (T380107)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_drmrs and A:cp for 9.2.6-1wm2 * 15:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_drmrs and A:cp for 9.2.6-1wm2 * 14:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1009-1014].eqiad.wmnet * 14:59 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1094511{{!}}New stream config for Android Rabbit Holes feature. 
(T380107)]] * 14:57 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097381{{!}}Pass context to 'revreview-pending-basic' on history page (T380519)]], [[gerrit:1097382{{!}}Use Contexts for Message objects in review dialog (tooltip) (T380519)]] (duration: 15m 35s) * 14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove vlan1107 IPv6 entries - cmooney@cumin1002" * 14:56 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1009-1014].eqiad.wmnet * 14:54 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove vlan1107 IPv6 entries - cmooney@cumin1002" * 14:54 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bookworm * 14:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1309.eqiad.wmnet * 14:52 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1309.eqiad.wmnet * 14:52 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 14:50 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, matmarex: Continuing with sync * 14:49 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_codfw and A:cp for 9.2.6-1wm2 * 14:48 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, matmarex: Backport for [[gerrit:1097381{{!}}Pass context to 'revreview-pending-basic' on history page (T380519)]], [[gerrit:1097382{{!}}Use Contexts for Message objects in review dialog (tooltip) (T380519)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:47 cgoubert@cumin1002: END (PASS) - 
Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1310-1312].eqiad.wmnet * 14:47 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1310-1312].eqiad.wmnet * 14:47 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_codfw and A:cp for 9.2.6-1wm2 * 14:44 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 14:44 cmooney@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverse IPv6 includes to dns repo for vlan1107 - cmooney@cumin1002" * 14:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverse IPv6 includes to dns repo for vlan1107 - cmooney@cumin1002" * 14:41 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1097381{{!}}Pass context to 'revreview-pending-basic' on history page (T380519)]], [[gerrit:1097382{{!}}Use Contexts for Message objects in review dialog (tooltip) (T380519)]] * 14:39 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 14:26 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add tooltips - oblivian@cumin1002" * 14:26 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add tooltips - oblivian@cumin1002 * 14:26 moritzm: prune unneeded kernels from grafana2001 * 14:26 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add tooltips - oblivian@cumin1002 * 14:26 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add tooltips - oblivian@cumin1002" * 
14:20 claime: Manually deleting wikikube-worker13[13-20].eqiad.wmnet for ip exhaustion [[phab:T375845|T375845]] * 14:19 claime: disable puppet and kubelet on wikikube-worker13[13-28].eqiad.wmnet for ip exhaustion [[phab:T375845|T375845]] * 14:12 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2045.codfw.wmnet * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2046.codfw.wmnet * 14:02 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 aborrero@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002" * 14:01 aborrero@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002" * 13:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2046.codfw.wmnet * 13:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2045.codfw.wmnet * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2044.codfw.wmnet * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2043.codfw.wmnet * 13:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_codfw and A:cp for 9.2.6-1wm2 * 13:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_codfw and A:cp for 9.2.6-1wm2 * 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2044.codfw.wmnet * 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2043.codfw.wmnet * 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) 
for host es2042.codfw.wmnet * 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2041.codfw.wmnet * 13:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 13:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 13:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1318.eqiad.wmnet with OS bookworm * 13:46 jayme: deployed sessionstore to non-dedicated nodes - [[phab:T379599|T379599]] * 13:44 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 13:44 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 13:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1320.eqiad.wmnet with OS bookworm * 13:43 jayme: cordoned kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet - [[phab:T379599|T379599]] * 13:42 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 13:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2042.codfw.wmnet * 13:42 aborrero@cumin1002: START - Cookbook sre.dns.netbox * 13:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2041.codfw.wmnet * 13:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1314.eqiad.wmnet with OS bookworm * 13:40 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd and (A:cephosd) * 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host db1246.eqiad.wmnet * 13:38 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 13:38 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 
13:38 jayme@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply * 13:37 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@006515b]: Testing the new k8s deployment (duration: 02m 34s) * 13:37 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@006515b]: Testing the new k8s deployment * 13:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1319.eqiad.wmnet with OS bookworm * 13:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1315.eqiad.wmnet with OS bookworm * 13:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host db1246.eqiad.wmnet * 13:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 13:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1317.eqiad.wmnet with OS bookworm * 13:28 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D<nowiki>{</nowiki>wikikube-worker[1305-1312].eqiad.wmnet<nowiki>}</nowiki> and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-master-eqiad or * 13:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage * 13:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1316.eqiad.wmnet with OS bookworm * 13:27 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D<nowiki>{</nowiki>wikikube-worker[2128-2170].codfw.wmnet<nowiki>}</nowiki> and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or 
A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-master-eqiad or * 13:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage * 13:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1313.eqiad.wmnet with OS bookworm * 13:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage * 13:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1179 gradually with 4 steps - Maint over * 13:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: [[phab:T373579|T373579]], host is WIP * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: [[phab:T373579|T373579]], host is WIP * 13:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage * 13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1317.eqiad.wmnet with reason: host reimage * 13:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1316.eqiad.wmnet with 
reason: host reimage * 13:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:06 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage * 13:05 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7015.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:05 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7015.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:05 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7008.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:05 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7008.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:05 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage * 13:05 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7006.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage * 13:04 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7006.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:04 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:04 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:04 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:04 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 
0:00:00 on lvs7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:04 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage * 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd2005.codfw.wmnet * 13:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1317.eqiad.wmnet with reason: host reimage * 13:03 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti7004.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:03 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti7004.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:03 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1316.eqiad.wmnet with reason: host reimage * 13:03 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:02 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dns7001.wikimedia.org with reason: [[phab:T376737|T376737]] * 13:02 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 13:02 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dns7001.wikimedia.org with reason: [[phab:T376737|T376737]] * 13:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage * 13:02 fabfur@cumin1002: END 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:02 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply * 13:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage * 13:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd2004.codfw.wmnet * 13:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply * 13:01 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 13:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply * 13:00 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage * 13:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply * 13:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd2005.codfw.wmnet * 12:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply * 12:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd2003.codfw.wmnet * 12:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply * 12:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply * 12:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply * 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host 
aux-k8s-ctrl2003.codfw.wmnet * 12:58 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D<nowiki>{</nowiki>kubestage100[5-6].eqiad.wmnet<nowiki>}</nowiki> and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-maste * 12:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd2004.codfw.wmnet * 12:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:56 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd2003.codfw.wmnet * 12:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl2003.codfw.wmnet * 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-ctrl2002.codfw.wmnet * 12:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2005.codfw.wmnet * 12:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:47 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup1011.eqiad.wmnet with reason: Reboot * 12:47 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup1011.eqiad.wmnet with reason: Reboot * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl2002.codfw.wmnet * 12:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2005.codfw.wmnet * 12:44 cgoubert@cumin1002: 
START - Cookbook sre.hosts.reimage for host wikikube-worker1320.eqiad.wmnet with OS bookworm * 12:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1319.eqiad.wmnet with OS bookworm * 12:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1318.eqiad.wmnet with OS bookworm * 12:43 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D<nowiki>{</nowiki>kubestage100[5-6].eqiad.wmnet<nowiki>}</nowiki> and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-master-eqiad or A:ml-ser * 12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1317.eqiad.wmnet with OS bookworm * 12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1316.eqiad.wmnet with OS bookworm * 12:42 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup1010.eqiad.wmnet with reason: Reboot * 12:41 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup1010.eqiad.wmnet with reason: Reboot * 12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1315.eqiad.wmnet with OS bookworm * 12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1314.eqiad.wmnet with OS bookworm * 12:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1313.eqiad.wmnet with OS bookworm * 12:33 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-eqsin and not (P<nowiki>{</nowiki>cp5018.*<nowiki>}</nowiki> or P<nowiki>{</nowiki>cp5026.*<nowiki>}</nowiki>) and A:cp for 9.2.6-1wm2 * 12:32 ladsgroup@cumin1002: START - Cookbook 
sre.mysql.pool db1179 gradually with 4 steps - Maint over * 12:28 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd and (A:cephosd) * 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2004.codfw.wmnet * 12:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2003.codfw.wmnet * 12:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2004.codfw.wmnet * 12:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2003.codfw.wmnet * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2002.codfw.wmnet * 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2002.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2002.codfw.wmnet * 12:06 hashar@deploy2002: Pruned MediaWiki: 1.39.0-wmf.1 (duration: 00m 40s) * 12:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2002.codfw.wmnet * 12:03 hashar@deploy2002: Pruned MediaWiki: 1.39.0-wmf.1 (duration: 00m 37s) * 11:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1256.eqiad.wmnet * 11:56 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1256.eqiad.wmnet * 11:51 hashar@deploy2002: Installation of scap version "4.128.0" completed for 211 hosts * 11:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-worker1290.eqiad.wmnet * 11:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker1290.eqiad.wmnet * 11:47 hashar@deploy2002: Installing scap version "4.128.0" for 211 hosts * 11:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T380449|T380449]])', 
diff saved to https://phabricator.wikimedia.org/P71125 and previous config saved to /var/cache/conftool/dbconfig/20241125-114651-ladsgroup.json * 11:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance * 11:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance * 11:41 claime: homer 'cr*eqiad*' commit '[[phab:T379454|T379454]]' * 11:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1256.eqiad.wmnet with OS bookworm * 11:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002" * 11:39 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002" * 11:34 hashar@deploy2002: Installing scap version "4.128.0" for 211 hosts * 11:24 moritzm: installing Linux 6.1.119 on Bookworm nodes * 11:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage * 11:18 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage * 11:03 fabfur@cumin1002: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=magru * 11:02 fabfur: depooling dnsboxes @ magru for hardware swap ([[phab:T376737|T376737]]) * 11:02 fabfur@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site magru [reason: depool magru for hw swap, [[phab:T376737|T376737]]] * 11:01 fabfur@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site magru [reason: depool magru for hw swap, [[phab:T376737|T376737]]] * 11:01 fabfur: depooling magru for hardware swap ([[phab:T376737|T376737]]) * 
10:40 hashar@deploy2002: Finished deploy [integration/docroot@d585f2b]: build: Updating cross-spawn to 7.0.6 (duration: 00m 10s) * 10:40 hashar@deploy2002: Started deploy [integration/docroot@d585f2b]: build: Updating cross-spawn to 7.0.6 * 10:38 _joe_: deleted pyall component from reprepro * 10:35 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-eqsin and not (P<nowiki>{</nowiki>cp5018.*<nowiki>}</nowiki> or P<nowiki>{</nowiki>cp5026.*<nowiki>}</nowiki>) and A:cp for 9.2.6-1wm2 * 10:25 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not (P<nowiki>{</nowiki>cp4043.*<nowiki>}</nowiki> or P<nowiki>{</nowiki>cp4051.*<nowiki>}</nowiki>) and A:cp for 9.2.6-1wm2 * 10:17 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1005.eqiad.wmnet * 10:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 10:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 10:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 10:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 10:07 jynus: extending backup1009 free filesystem * 10:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1005.eqiad.wmnet * 09:58 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2005.codfw.wmnet * 09:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2005.codfw.wmnet * 09:45 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2005.codfw.wmnet * 09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/admin 'apply'. * 09:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 09:39 moritzm: remove ganeti7003 from active Ganeti nodes in magru01 [[phab:T376737|T376737]] * 09:34 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2005.codfw.wmnet * 09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:25 ladsgroup@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093956{{!}}Bump ratio of new parsercache key spec to 6 (T373037)]] (duration: 11m 05s) * 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to plain * 09:18 ladsgroup@deploy2002: ladsgroup: Continuing with sync * 09:18 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:1093956{{!}}Bump ratio of new parsercache key spec to 6 (T373037)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to plain * 09:13 ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1093956{{!}}Bump ratio of new parsercache key spec to 6 (T373037)]] * 09:13 dcausse: restarting blazegraph on wdqs1012 (BlazegraphFreeAllocatorsDecreasingRapidly) * 09:04 kostajh: UTC morning deploys done * 09:01 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1053230{{!}}IPReputation: Enable everywhere (T360067)]] (duration: 15m 48s) * 08:53 kharlan@deploy2002: kharlan: Continuing with sync * 08:50 kharlan@deploy2002: kharlan: Backport for [[gerrit:1053230{{!}}IPReputation: Enable everywhere (T360067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) 
for changing disk type of durum7001.magru.wmnet to plain * 08:47 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to plain * 08:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2179.codfw.wmnet with reason: Maintenance * 08:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2179.codfw.wmnet with reason: Maintenance * 08:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance * 08:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance * 08:46 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1053230{{!}}IPReputation: Enable everywhere (T360067)]] * 08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71123 and previous config saved to /var/cache/conftool/dbconfig/20241125-084531-arnaudb.json * 08:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to plain * 08:39 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to plain * 08:39 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1094071{{!}}Disable more extensions when using the shared login domain (T373737)]] (duration: 30m 35s) * 08:37 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not (P<nowiki>{</nowiki>cp4043.*<nowiki>}</nowiki> or P<nowiki>{</nowiki>cp4051.*<nowiki>}</nowiki>) and A:cp for 9.2.6-1wm2 * 08:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P71122 and previous config saved to 
/var/cache/conftool/dbconfig/20241125-083024-arnaudb.json * 08:30 tgr@deploy2002: tgr: Continuing with sync * 08:25 tgr@deploy2002: tgr: Backport for [[gerrit:1094071{{!}}Disable more extensions when using the shared login domain (T373737)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:17 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply * 08:17 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply * 08:17 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply * 08:16 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply * 08:16 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply * 08:15 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply * 08:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P71121 and previous config saved to /var/cache/conftool/dbconfig/20241125-081517-arnaudb.json * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to plain * 08:10 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to plain * 08:08 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1094071{{!}}Disable more extensions when using the shared login domain (T373737)]] * 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to plain * 08:00 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to plain * 08:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71120 and previous config saved to 
/var/cache/conftool/dbconfig/20241125-080010-arnaudb.json
* 07:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71119 and previous config saved to /var/cache/conftool/dbconfig/20241125-075758-arnaudb.json
* 07:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2240.codfw.wmnet with reason: Maintenance
* 07:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2240.codfw.wmnet with reason: Maintenance
* 07:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled
* 07:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled
* 07:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled
* 07:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled
* 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet
* 07:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet
* 07:47 moritzm: remove ganeti7004 from active Ganeti nodes in magru02 [[phab:T376737|T376737]]
* 07:15 _joe_: upgrading vopsbot to 0.3.9
== 2024-11-23 ==
* 12:08 btullis@cumin1002: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop test cluster: Restart of jvm daemons.
* 12:05 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
* 02:15 urandom: decommissioning Cassandra/restbase2023-<nowiki>{</nowiki>a,b,c<nowiki>}</nowiki> — [[phab:T380236|T380236]]
== 2024-11-22 ==
* 21:51 bking@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=wdqs-internal-scholarly,name=eqiad
* 21:37 bking@cumin2002: conftool action : set/pooled=yes; selector: name=wdqs2026.codfw.wmnet
* 21:37 bking@cumin2002: conftool action : set/pooled=yes; selector: name=wdqs2018.codfw.wmnet
* 21:33 bking@cumin2002: conftool action : set/weight=1; selector: name=wdqs2026.codfw.wmnet
* 21:33 bking@cumin2002: conftool action : set/weight=1; selector: name=wdqs2018.codfw.wmnet
* 21:25 bking@cumin2002: conftool action : set/pooled=yes:weight=1; selector: cluster=wdqs-scholarly,service=wdqs-internal-scholarly
* 21:25 bking@cumin2002: conftool action : set/pooled=yes:weight=1; selector: cluster=wdqs-main,service=wdqs-internal-main
* 20:59 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2005.codfw.wmnet
* 20:59 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2005.codfw.wmnet with OS bookworm
* 20:41 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2005.codfw.wmnet with reason: host reimage
* 20:37 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2005.codfw.wmnet with reason: host reimage
* 20:20 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2005.codfw.wmnet with OS bookworm
* 20:17 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002"
* 20:17 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002"
* 20:17
herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2005.codfw.wmnet on all recursors * 20:17 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2005.codfw.wmnet on all recursors * 20:17 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:17 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002" * 20:17 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002" * 20:07 herron@cumin1002: START - Cookbook sre.dns.netbox * 20:07 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker2005.codfw.wmnet * 19:47 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2004.codfw.wmnet * 19:47 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2004.codfw.wmnet with OS bookworm * 19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2045.codfw.wmnet with OS bookworm * 19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2046.codfw.wmnet with OS bookworm * 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 19:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2043.codfw.wmnet with OS bookworm * 19:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:31 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2004.codfw.wmnet with reason: host reimage * 19:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:27 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2004.codfw.wmnet with reason: host reimage * 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2044.codfw.wmnet with OS bookworm * 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2045.codfw.wmnet with reason: host reimage * 19:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2046.codfw.wmnet with reason: host reimage * 19:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2043.codfw.wmnet with reason: host reimage * 19:13 herron@cumin1002: START - Cookbook sre.hosts.reimage for host 
aux-k8s-worker2004.codfw.wmnet with OS bookworm * 19:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002" * 19:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002" * 19:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2004.codfw.wmnet on all recursors * 19:10 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2004.codfw.wmnet on all recursors * 19:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002" * 19:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002" * 19:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2044.codfw.wmnet with reason: host reimage * 19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2045.codfw.wmnet with reason: host reimage * 19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2046.codfw.wmnet with reason: host reimage * 19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2043.codfw.wmnet with reason: host reimage * 19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2044.codfw.wmnet with reason: host reimage * 18:58 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:58 herron@cumin1002: START - Cookbook sre.ganeti.makevm 
for new host aux-k8s-worker2004.codfw.wmnet * 18:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2042.codfw.wmnet with OS bookworm * 18:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2043.codfw.wmnet with OS bookworm * 18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2044.codfw.wmnet with OS bookworm * 18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2045.codfw.wmnet with OS bookworm * 18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2046.codfw.wmnet with OS bookworm * 18:45 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2003.codfw.wmnet * 18:45 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2003.codfw.wmnet with OS bookworm * 18:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2042.codfw.wmnet with reason: host reimage * 18:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2042.codfw.wmnet with reason: host reimage * 18:31 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2003.codfw.wmnet with reason: host reimage * 18:27 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2003.codfw.wmnet with reason: host reimage * 18:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2042.codfw.wmnet with OS bookworm * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host 
es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:11 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2003.codfw.wmnet with OS bookworm * 18:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002" * 18:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002" * 18:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2003.codfw.wmnet on all recursors * 18:10 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2003.codfw.wmnet on all recursors * 18:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002" * 18:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002" * 18:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:03 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2042 to codfw - jhancock@cumin2002" * 18:02 jhancock@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2042 to codfw - jhancock@cumin2002" * 18:02 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker2003.codfw.wmnet * 17:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 17:41 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2002.codfw.wmnet * 17:41 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2002.codfw.wmnet with OS bookworm * 17:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2042 * 17:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host es2042 * 17:25 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2002.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:23 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudsw1-d5-eqiad.mgmt,cloudsw1-e4-eqiad.mgmt with reason: replace optics on faulty WMCS link from D5 to E4 * 17:22 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudsw1-d5-eqiad.mgmt,cloudsw1-e4-eqiad.mgmt with reason: replace optics on faulty WMCS link from D5 to E4 * 17:22 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on 
aux-k8s-worker2002.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:11 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:08 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2002.codfw.wmnet with OS bookworm * 17:06 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002" * 17:06 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002" * 17:05 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2002.codfw.wmnet on all recursors * 17:05 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2002.codfw.wmnet on all recursors * 17:05 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:05 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002" * 17:05 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002" * 17:00 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:00 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker2002.codfw.wmnet * 16:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:54 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd2003.codfw.wmnet to plain * 16:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:53 herron@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd2003.codfw.wmnet to plain * 16:48 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd2004.codfw.wmnet to plain * 16:47 herron@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd2004.codfw.wmnet to plain * 16:43 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd2005.codfw.wmnet to plain * 16:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2041.codfw.wmnet with OS bookworm * 16:43 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 16:43 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 16:42 herron@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd2005.codfw.wmnet to plain * 16:40 jhancock@cumin2002: START - Cookbook 
sre.hosts.provision for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:27 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2041.codfw.wmnet with reason: host reimage * 16:24 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2041.codfw.wmnet with reason: host reimage * 16:12 claime: homer 'cr*codfw*' commit '[[phab:T380473|T380473]]' * 16:11 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts parse[2002-2020].codfw.wmnet * 16:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse[2002-2020].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002" * 16:10 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse[2002-2020].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002" * 16:09 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 16:08 bking@deploy2002: Finished deploy [wdqs/wdqs@9927a5a]: 0.3.150 (duration: 03m 00s) * 16:07 cgoubert@cumin1002: START - Cookbook sre.dns.netbox * 16:05 bking@deploy2002: Started deploy [wdqs/wdqs@9927a5a]: 0.3.150 * 16:00 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm * 15:31 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts parse[2002-2020].codfw.wmnet * 15:31 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts parse2001.codfw.wmnet * 15:29 cgoubert@cumin1002: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002" * 15:29 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002" * 15:29 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es2041.codfw.wmnet with OS bookworm * 15:25 cgoubert@cumin1002: START - Cookbook sre.dns.netbox * 15:22 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 15:20 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts parse2001.codfw.wmnet * 15:17 ihurbain@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply * 15:17 ihurbain@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply * 15:16 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply * 15:15 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply * 15:14 claime: kubectl delete node parse20<nowiki>{</nowiki>01..20<nowiki>}</nowiki>.codfw.wmnet - [[phab:T380473|T380473]] * 15:12 claime: parse[2001-2020].codfw.wmnet 'systemctl stop kubelet.service' - [[phab:T380473|T380473]] * 15:11 claime: parse[2001-2020].codfw.wmnet 'disable-puppet "decom"' - [[phab:T380473|T380473]] * 15:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host parse[2001-2020].codfw.wmnet * 15:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wdqs[2018-2020].codfw.wmnet with reason: [[phab:T379023|T379023]] * 15:02 bking@cumin2002: 
START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wdqs[2018-2020].codfw.wmnet with reason: [[phab:T379023|T379023]] * 15:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: [[phab:T379023|T379023]] * 15:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: [[phab:T379023|T379023]] * 14:54 urandom: decommissioning Cassandra/restbase2022-<nowiki>{</nowiki>a,b,c<nowiki>}</nowiki> — * 14:53 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 14:53 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 14:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host parse[2001-2020].codfw.wmnet * 14:37 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply * 14:27 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply * 14:23 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply * 14:22 vgutierrez: restoring haproxykafka on A:cp-ulsfo and A:cp-eqsin - [[phab:T380570|T380570]] * 14:13 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply * 14:12 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply * 14:12 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply * 11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2156-2170].codfw.wmnet * 11:26 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2156-2170].codfw.wmnet * 11:25 claime: homer 
'lsw1-d7-codfw*' commit '[[phab:T376966|T376966]]' * 11:24 claime: homer 'lsw1-d6-codfw*' commit '[[phab:T376966|T376966]]' * 11:24 claime: homer 'lsw1-d5-codfw*' commit '[[phab:T376966|T376966]]' * 11:23 claime: homer 'lsw1-d4-codfw*' commit '[[phab:T376966|T376966]]' * 11:22 claime: homer 'lsw1-d1-codfw*' commit '[[phab:T376966|T376966]]' * 11:21 claime: homer 'lsw1-c7-codfw*' commit '[[phab:T376966|T376966]]' * 11:20 claime: homer 'lsw1-c4-codfw*' commit '[[phab:T376966|T376966]]' * 11:19 claime: homer 'lsw1-c2-codfw*' commit '[[phab:T376966|T376966]]' * 11:19 claime: homer 'lsw1-b7-codfw*' commit '[[phab:T376966|T376966]]' * 11:18 claime: homer 'lsw1-b4-codfw*' commit '[[phab:T376966|T376966]]' * 11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2140.codfw.wmnet * 11:07 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2140.codfw.wmnet * 11:04 claime: homer 'lsw1-b7-codfw*' commit '[[phab:T377028|T377028]]' * 11:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 10:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1014.eqiad.wmnet * 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:37 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera 
data: "Triggered by cookbooks.sre.dns.netbox: ganeti1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1014.eqiad.wmnet * 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1011.eqiad.wmnet * 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:22 vgutierrez: manually stopping haproxykafka on A:cp-ulsfo and A:cp-eqsin - [[phab:T380570|T380570]] * 10:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 10:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1011.eqiad.wmnet * 08:08 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add sorting options to tree view - oblivian@cumin1002" * 08:08 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add sorting options to tree view - oblivian@cumin1002 * 08:07 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add sorting options to tree view - oblivian@cumin1002 * 08:07 oblivian@cumin1002: START - Cookbook 
sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add sorting options to tree view - oblivian@cumin1002" * 01:00 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd2005.codfw.wmnet * 01:00 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd2005.codfw.wmnet with OS bookworm * 00:46 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd2005.codfw.wmnet with reason: host reimage * 00:42 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd2005.codfw.wmnet with reason: host reimage * 00:27 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd2005.codfw.wmnet with OS bookworm * 00:20 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002" * 00:20 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002" * 00:20 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd2005.codfw.wmnet on all recursors * 00:20 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd2005.codfw.wmnet on all recursors * 00:20 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:20 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002" * 00:16 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002" * 00:11 herron@cumin1002: 
START - Cookbook sre.dns.netbox * 00:11 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd2005.codfw.wmnet * 00:11 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd2004.codfw.wmnet * 00:11 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd2004.codfw.wmnet with OS bookworm
== 2024-11-21 ==
* 23:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd2004.codfw.wmnet with reason: host reimage * 23:52 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd2004.codfw.wmnet with reason: host reimage * 23:36 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd2004.codfw.wmnet with OS bookworm * 23:29 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002" * 23:29 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002" * 23:29 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd2004.codfw.wmnet on all recursors * 23:28 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd2004.codfw.wmnet on all recursors * 23:28 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:28 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002" * 23:24 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002" * 23:11 
herron@cumin1002: START - Cookbook sre.dns.netbox * 23:11 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd2004.codfw.wmnet * 23:09 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd2003.codfw.wmnet * 23:09 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd2003.codfw.wmnet with OS bookworm * 23:08 brennen: end of utc late backport & config window * 23:07 brennen@deploy2002: Finished scap sync-world: Backport for [[gerrit:1094005{{!}}Add statsv to charts impressions (T379833)]] (duration: 12m 08s) * 23:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm * 23:01 brennen@deploy2002: bvibber, brennen: Continuing with sync * 23:00 brennen@deploy2002: bvibber, brennen: Backport for [[gerrit:1094005{{!}}Add statsv to charts impressions (T379833)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:55 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1094005{{!}}Add statsv to charts impressions (T379833)]] * 22:55 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd2003.codfw.wmnet with reason: host reimage * 22:54 brennen@deploy2002: Finished scap sync-world: resuming sync for [[gerrit:1094000{{!}}Add tracking categories for {{#chart:}} usage (T369684)]] after messing up a keypress (duration: 12m 35s) * 22:52 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd2003.codfw.wmnet with reason: host reimage * 22:42 brennen@deploy2002: Started scap sync-world: resuming sync for [[gerrit:1094000{{!}}Add tracking categories for {{#chart:}} usage (T369684)]] after messing up a keypress * 22:40 brennen@deploy2002: Sync cancelled. 
* 22:40 brennen@deploy2002: bvibber, brennen: Backport for [[gerrit:1094000{{!}}Add tracking categories for {{#chart:}} usage (T369684)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:38 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd2003.codfw.wmnet with OS bookworm * 22:36 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002" * 22:36 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002" * 22:35 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd2003.codfw.wmnet on all recursors * 22:35 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd2003.codfw.wmnet on all recursors * 22:35 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:35 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002" * 22:35 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002" * 22:32 herron@cumin1002: START - Cookbook sre.dns.netbox * 22:32 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd2003.codfw.wmnet * 22:25 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1094000{{!}}Add tracking categories for {{#chart:}} usage (T369684)]] * 22:25 brennen@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092334{{!}}Disable various extensions when using the shared login domain (T373737)]] (duration: 18m 16s) * 
22:22 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 22:18 brennen@deploy2002: tgr, brennen: Continuing with sync * 22:10 brennen@deploy2002: tgr, brennen: Backport for [[gerrit:1092334{{!}}Disable various extensions when using the shared login domain (T373737)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:06 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1092334{{!}}Disable various extensions when using the shared login domain (T373737)]] * 22:05 brennen@deploy2002: Finished scap sync-world: Backport for [[gerrit:1094047{{!}}Revert "Reduce number of bucketsizes for MediaViewer (group0)" (T372165)]] (duration: 10m 34s) * 21:58 brennen@deploy2002: brennen: Continuing with sync * 21:58 brennen@deploy2002: brennen: Backport for [[gerrit:1094047{{!}}Revert "Reduce number of bucketsizes for MediaViewer (group0)" (T372165)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:54 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1094047{{!}}Revert "Reduce number of bucketsizes for MediaViewer (group0)" (T372165)]] * 21:51 brennen@deploy2002: Sync cancelled. 
* 21:42 brennen@deploy2002: brennen, tgr, simon04: Backport for [[gerrit:1079640{{!}}Reduce number of bucketsizes for MediaViewer (group0) (T372165)]], [[gerrit:1093961{{!}}Set 'remember' central session object field when recreating (T379254 T372702)]], [[gerrit:1093962{{!}}Use cookie to access central session when local session expired]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 21:39 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1079640{{!}}Reduce number of bucketsizes for MediaViewer (group0) (T372165)]], [[gerrit:1093961{{!}}Set 'remember' central session object field when recreating (T379254 T372702)]], [[gerrit:1093962{{!}}Use cookie to access central session when local session expired]]
* 21:36 brennen@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093960{{!}}Enable Skin-Codex logging (T375287)]] (duration: 15m 53s)
* 21:29 brennen@deploy2002: brennen, jdlrobson: Continuing with sync
* 21:26 brennen@deploy2002: brennen, jdlrobson: Backport for [[gerrit:1093960{{!}}Enable Skin-Codex logging (T375287)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 21:20 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1093960{{!}}Enable Skin-Codex logging (T375287)]]
* 21:19 brennen@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090968{{!}}Enable AutoModerator on afwiki (T376597)]] (duration: 13m 50s)
* 21:12 brennen@deploy2002: kgraessle, brennen: Continuing with sync
* 21:10 brennen@deploy2002: kgraessle, brennen: Backport for [[gerrit:1090968{{!}}Enable AutoModerator on afwiki (T376597)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 21:05 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1090968{{!}}Enable AutoModerator on afwiki (T376597)]]
* 20:46 tgr
* 20:24 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet [reason: DIMM replaced, [[phab:T308459|T308459]]]
* 20:20 sukhe: force agent on cp2038
* 19:31 gmodena@deploy2002: Finished deploy [analytics/refinery@199401a] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@199401a6] (duration: 03m 45s)
* 19:27 gmodena@deploy2002: Started deploy [analytics/refinery@199401a] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@199401a6]
* 19:07 gmodena@deploy2002: Finished deploy [analytics/refinery@199401a] (thin): Ad-hoc deployment THIN [analytics/refinery@199401a6] (duration: 05m 37s)
* 19:01 gmodena@deploy2002: Started deploy [analytics/refinery@199401a] (thin): Ad-hoc deployment THIN [analytics/refinery@199401a6]
* 18:57 gmodena@deploy2002: Finished deploy [analytics/refinery@199401a]: Ad-hoc deployment [analytics/refinery@199401a6] (duration: 14m 08s)
* 18:57 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093983{{!}}Follow-up fix for Charts enable on commons/test2 (T379689)]] (duration: 11m 29s)
* 18:49 cdanis@deploy2002: cdanis, bvibber: Continuing with sync
* 18:49 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1093983{{!}}Follow-up fix for Charts enable on commons/test2 (T379689)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 18:45 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1093983{{!}}Follow-up fix for Charts enable on commons/test2 (T379689)]]
* 18:43 gmodena@deploy2002: Started deploy [analytics/refinery@199401a]: Ad-hoc deployment [analytics/refinery@199401a6]
* 18:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091328{{!}}Enabling Charts on commons+test2 (T379689)]] (duration: 14m 05s)
* 18:16 jayme@cumin2002: conftool action : set/pooled=yes; selector: name=kubestage200[34].codfw.wmnet
* 18:15 jayme@cumin2002: conftool action : set/weight=10; selector: name=kubestage200[34].codfw.wmnet
* 18:13 cdanis@deploy2002: cdanis, bvibber: Continuing with sync
* 18:12 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1091328{{!}}Enabling Charts on commons+test2 (T379689)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 18:10 sukhe: running puppet on A:cp to resolve failed puppet run
* 18:10 sukhe: sudo cumin -b11 'A:cp' 'run-puppet-agent
* 18:09 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp2038.codfw.wmnet with reason: DIMM replacement in progress
* 18:09 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp2038.codfw.wmnet with reason: DIMM replacement in progress
* 18:07 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1091328{{!}}Enabling Charts on commons+test2 (T379689)]]
* 17:58 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet [reason: DIMM failure [[phab:T308459|T308459]]]
* 17:45 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) check for host kubestage2003.codfw.wmnet
* 17:45 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node check for host kubestage2003.codfw.wmnet
* 17:40 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts clouddb2002-dev.codfw.wmnet
* 17:40 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:40 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
* 17:39 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
* 17:39 fabfur: adding acls to kafka-jumbo cluster ([[phab:T380373|T380373]])
* 17:36 andrew@cumin1002: START - Cookbook sre.dns.netbox
* 17:31 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts clouddb2002-dev.codfw.wmnet
* 17:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm
* 16:54 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet
* 16:54 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet
* 16:54 sukhe: enable puppet on lvs2013 and start pybal
* 16:48 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting
* 16:47 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting
* 16:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm
* 16:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002"
* 16:46 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2013.codfw.wmnet
* 16:46 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002"
* 16:43 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs2013.codfw.wmnet
* 16:43 sukhe: rebooting drained lvs2013
* 16:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage
* 16:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage
* 16:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage
* 16:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage
* 16:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm
* 16:20 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2157.codfw.wmnet with OS bookworm
* 16:13 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cluster=dnsbox,dc=magru [reason: testing]
* 16:08 dancy@deploy2002: Finished scap sync-world: testing (duration: 03m 01s)
* 16:05 dancy@deploy2002: Started scap sync-world: testing
* 16:04 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
* 16:03 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm
* 16:00 dancy@deploy2002: Installing scap version "4.127.0" for 209 hosts
* 15:39 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093927{{!}}Fix layout broken by display:flex on HorizontalLayout (T380471)]], [[gerrit:1093928{{!}}Revert "ExperimentUserDefaultsManager: use read latest when retrieving central id"]] (duration: 15m 51s)
* 15:34 gmodena@deploy2002: Finished deploy [analytics/refinery@358ccf5] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@358ccf55] (duration: 03m 30s)
* 15:33 kartik@deploy2002: abi, sgimeno, kartik: Continuing with sync
* 15:31 gmodena@deploy2002: Started deploy [analytics/refinery@358ccf5] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@358ccf55]
* 15:29 gmodena@deploy2002: Finished deploy [analytics/refinery@358ccf5] (thin): Ad-hoc deployment THIN [analytics/refinery@358ccf55] (duration: 05m 16s)
* 15:29 ihurbain@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
* 15:29 kartik@deploy2002: abi, sgimeno, kartik: Backport for [[gerrit:1093927{{!}}Fix layout broken by display:flex on HorizontalLayout (T380471)]], [[gerrit:1093928{{!}}Revert "ExperimentUserDefaultsManager: use read latest when retrieving central id"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 15:28 ihurbain@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
* 15:28 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
* 15:27 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
* 15:26 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@6183645]: increase driver memory for mjolnir feature selection (duration: 00m 31s)
* 15:26 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting
* 15:25 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting
* 15:25 ebernhardson@deploy2002: Started deploy [airflow-dags/search@6183645]: increase driver memory for mjolnir feature selection
* 15:24 sukhe: stop pybal on lvs2013 to confirm changes in CR {{Gerrit|1091243}}
* 15:24 gmodena@deploy2002: Started deploy [analytics/refinery@358ccf5] (thin): Ad-hoc deployment THIN [analytics/refinery@358ccf55]
* 15:24 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1093927{{!}}Fix layout broken by display:flex on HorizontalLayout (T380471)]], [[gerrit:1093928{{!}}Revert "ExperimentUserDefaultsManager: use read latest when retrieving central id"]]
* 15:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
* 15:23 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm
* 15:16 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
* 15:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm
* 15:11 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]]
* 15:10 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]]
* 15:06 gmodena@deploy2002: Finished deploy [analytics/refinery@358ccf5]: Ad-hoc deployment [analytics/refinery@358ccf55] (duration: 11m 44s)
* 14:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm
* 14:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
* 14:54 gmodena@deploy2002: Started deploy [analytics/refinery@358ccf5]: Ad-hoc deployment [analytics/refinery@358ccf55]
* 14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm
* 14:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm
* 14:50 sergi0: UTC afternoon deploys done
* 14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm
* 14:48 sgimeno@deploy2002: Sync cancelled.
* 14:47 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 14:43 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on kafka-main1001.eqiad.wmnet with reason: Per claime's recommendation * 14:43 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on kafka-main1001.eqiad.wmnet with reason: Per claime's recommendation * 14:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 14:41 sgimeno@deploy2002: sgimeno: Backport for [[gerrit:1093889{{!}}ExperimentUserDefaultsManager: use read latest when retrieving central id (T379682)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 14:35 sgimeno@deploy2002: Started scap sync-world: Backport for [[gerrit:1093889{{!}}ExperimentUserDefaultsManager: use read latest when retrieving central id (T379682)]] * 14:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 14:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 14:25 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply * 14:25 cgoubert@cumin1002: 
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 14:25 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply * 14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 14:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 14:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 14:22 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 14:21 sgimeno@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092956{{!}}enwiki: Add abusefilter-access-protected-vars to EFH/EFM (T380332)]] (duration: 13m 50s) * 14:14 sgimeno@deploy2002: eggroll97, sgimeno: Continuing with sync * 14:11 sgimeno@deploy2002: eggroll97, sgimeno: Backport for [[gerrit:1092956{{!}}enwiki: Add abusefilter-access-protected-vars to EFH/EFM (T380332)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:11 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1006.eqiad.wmnet with OS bookworm * 14:07 sgimeno@deploy2002: Started scap sync-world: Backport for [[gerrit:1092956{{!}}enwiki: Add abusefilter-access-protected-vars to EFH/EFM (T380332)]] * 14:06 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1005.eqiad.wmnet with OS bookworm * 14:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 14:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2169.codfw.wmnet with OS bookworm * 14:04 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 14:04 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 14:03 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 13:54 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1006.eqiad.wmnet with reason: host reimage * 13:51 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1006.eqiad.wmnet with reason: host reimage * 13:47 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1005.eqiad.wmnet with reason: host reimage * 13:44 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1005.eqiad.wmnet with reason: host reimage * 13:34 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage1006.eqiad.wmnet with OS bookworm * 13:33 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1008 to kubestage1006 * 13:32 jayme@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubestage1006 * 13:31 jayme@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kubestage1006 * 13:31 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:31 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1008 to kubestage1006 - jayme@cumin2002" * 13:30 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1008 to kubestage1006 - jayme@cumin2002" * 13:27 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host 
kubestage1005.eqiad.wmnet with OS bookworm * 13:25 jayme@cumin2002: START - Cookbook sre.dns.netbox * 13:25 jayme@cumin2002: START - Cookbook sre.hosts.rename from kubernetes1008 to kubestage1006 * 13:24 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1007 to kubestage1005 * 13:24 jayme@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubestage1005 * 13:22 jayme@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kubestage1005 * 13:22 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1007 to kubestage1005 - jayme@cumin2002" * 13:21 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1007 to kubestage1005 - jayme@cumin2002" * 13:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 13:18 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P<nowiki>{</nowiki>cp5026*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm2 * 13:17 jayme@cumin2002: START - Cookbook sre.dns.netbox * 13:17 jayme@cumin2002: START - Cookbook sre.hosts.rename from kubernetes1007 to kubestage1005 * 13:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 13:14 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P<nowiki>{</nowiki>cp5026*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm2 * 13:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server 
on P<nowiki>{</nowiki>cp5018*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm2 * 13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 13:10 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P<nowiki>{</nowiki>cp5018*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm2 * 13:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 13:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 13:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 12:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 12:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 12:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 12:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 12:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 12:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 12:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage * 12:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2158.codfw.wmnet with reason: host reimage * 12:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 12:38 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 12:38 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 12:38 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage * 12:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 12:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 12:36 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 12:35 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 12:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 12:17 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 12:17 cgoubert@cumin1002: START - Cookbook 
sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 12:16 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 12:16 jmm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:13 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 12:13 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 12:09 jmm@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:09 jmm@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:02 jmm@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply * 11:56 jmm@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 11:56 jmm@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply * 11:00 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1005.eqiad.wmnet with OS bullseye * 11:00 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 10:59 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 10:41 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1007-1008].eqiad.wmnet * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1005.eqiad.wmnet with reason: host reimage * 10:40 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1007-1008].eqiad.wmnet * 10:39 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply * 10:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after 
maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71113 and previous config saved to /var/cache/conftool/dbconfig/20241121-103834-arnaudb.json * 10:38 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply * 10:38 urbanecm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply * 10:37 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1005.eqiad.wmnet with reason: host reimage * 10:36 urbanecm@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply * 10:34 urbanecm@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply * 10:33 urbanecm@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply * 10:25 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1005.eqiad.wmnet with OS bullseye * 10:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71112 and previous config saved to /var/cache/conftool/dbconfig/20241121-102328-arnaudb.json * 10:19 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 102 * 10:19 ayounsi@cumin1002: START - Cookbook sre.network.debug for Netbox circuit ID 102 * 10:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71111 and previous config saved to /var/cache/conftool/dbconfig/20241121-100821-arnaudb.json * 10:01 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 10:01 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:59 dcausse: restarting eventgate-main@codfw * 09:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71110 and previous 
config saved to /var/cache/conftool/dbconfig/20241121-095313-arnaudb.json * 09:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71109 and previous config saved to /var/cache/conftool/dbconfig/20241121-095102-arnaudb.json * 09:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 09:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 09:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 09:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 09:35 moritzm: installing nghttp2 security updates * 09:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1246.eqiad.wmnet with OS bookworm * 09:17 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.4 refs [[phab:T375663|T375663]] * 09:07 moritzm: installing exim4 security updates * 09:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage * 09:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage * 08:45 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1246.eqiad.wmnet with OS bookworm * 08:21 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093733{{!}}Enable the Contribute menu in 4th group of Wikis (T375303)]] (duration: 14m 05s) * 08:14 kartik@deploy2002: kartik: Continuing with sync * 08:10 kartik@deploy2002: kartik: Backport for [[gerrit:1093733{{!}}Enable the Contribute menu in 4th group of Wikis (T375303)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 
08:06 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1093733{{!}}Enable the Contribute menu in 4th group of Wikis (T375303)]] * 07:48 moritzm: removing ganeti1017 from active Ganeti nodes [[phab:T378921|T378921]] * 05:51 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' . * 02:30 brett: Import libvmod-re2_2.0.0-2~bpo11u1 into varnish-staging apt component * 00:45 urandom: decommissioning Cassandra/restbase2021-<nowiki>{</nowiki>a,b,c<nowiki>}</nowiki> — [[phab:T380236|T380236]] * 00:42 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2023.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 00:42 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2023.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 00:42 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 00:42 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 00:42 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 00:42 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 00:40 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2038.codfw.wmnet * 00:40 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase2038.codfw.wmnet * 00:40 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2037.codfw.wmnet * 00:40 eevans@cumin1002: START - 
Cookbook sre.hosts.remove-downtime for restbase2037.codfw.wmnet * 00:40 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2036.codfw.wmnet * 00:40 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase2036.codfw.wmnet * 00:15 urbanecm: [urbanecm@deploy2002 ~]$ mwscript-k8s -- extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --all --verbose # [[phab:T380329|T380329]]
== 2024-11-20 ==
* 23:22 cjming: end of UTC late backport window * 23:20 eileen: civicrm upgraded from {{Gerrit|7c940d6f}} to {{Gerrit|3311520a}} * 23:17 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093408{{!}}Temporarily disable dark mode for anonymous users (T379765)]] (duration: 13m 06s) * 23:10 cjming@deploy2002: jdlrobson, cjming: Continuing with sync * 23:08 cjming@deploy2002: jdlrobson, cjming: Backport for [[gerrit:1093408{{!}}Temporarily disable dark mode for anonymous users (T379765)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:04 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1093408{{!}}Temporarily disable dark mode for anonymous users (T379765)]] * 23:03 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093328{{!}}knwiki: update portal namespace (T380366)]] (duration: 12m 17s) * 22:56 cjming@deploy2002: cjming, anzx: Continuing with sync * 22:55 cjming@deploy2002: cjming, anzx: Backport for [[gerrit:1093328{{!}}knwiki: update portal namespace (T380366)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:52 brett: Import libvmod-querysort 0.4-3 into varnish-staging apt component * 22:51 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1093328{{!}}knwiki: update portal namespace (T380366)]] * 22:49 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093446{{!}}Revert "Add contact form for U4C"]] (duration: 14m 22s) * 22:49 
jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2005.codfw.wmnet with OS bullseye * 22:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 22:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 22:40 cjming@deploy2002: trainbranchbot, cjming: Continuing with sync * 22:40 cjming@deploy2002: trainbranchbot, cjming: Backport for [[gerrit:1093446{{!}}Revert "Add contact form for U4C"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:39 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 22:39 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 22:34 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1093446{{!}}Revert "Add contact form for U4C"]] * 22:31 cjming@deploy2002: Sync cancelled. * 22:28 cjming@deploy2002: nmw03, cjming: Backport for [[gerrit:1091868{{!}}Add contact form for U4C (T379317)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:27 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage * 22:24 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage * 22:23 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 22:22 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1091868{{!}}Add contact form for U4C (T379317)]] * 22:21 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 22:20 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093358{{!}}Bump wikimedia/parsoid to 0.21.0-a7 (T373776 T380333)]], [[gerrit:1093359{{!}}Bump wikimedia/parsoid to 0.21.0-a7 (T380333)]] (duration: 17m 
11s) * 22:18 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 22:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 22:13 cjming@deploy2002: arlolra, cjming: Continuing with sync * 22:12 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 22:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2005.codfw.wmnet with OS bullseye * 22:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhathaway@cumin2002" * 22:09 jhathaway@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhathaway@cumin2002" * 22:08 cjming@deploy2002: arlolra, cjming: Backport for [[gerrit:1093358{{!}}Bump wikimedia/parsoid to 0.21.0-a7 (T373776 T380333)]], [[gerrit:1093359{{!}}Bump wikimedia/parsoid to 0.21.0-a7 (T380333)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:06 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 22:03 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1093358{{!}}Bump wikimedia/parsoid to 0.21.0-a7 (T373776 T380333)]], [[gerrit:1093359{{!}}Bump wikimedia/parsoid to 0.21.0-a7 (T380333)]] * 22:02 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 21:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 21:50 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 21:47 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage * 21:43 jhathaway@cumin2002: START 
- Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage * 21:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 21:32 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 21:31 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye * 21:28 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091810{{!}}[ptwiki] Enable the CampaignEvents extension (T380090)]] (duration: 15m 04s) * 21:23 eileen: civicrm upgraded from {{Gerrit|e29243f0}} to {{Gerrit|7c940d6f}} * 21:20 cjming@deploy2002: cjming, albertoleoncio: Continuing with sync * 21:19 cjming@deploy2002: cjming, albertoleoncio: Backport for [[gerrit:1091810{{!}}[ptwiki] Enable the CampaignEvents extension (T380090)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:13 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1091810{{!}}[ptwiki] Enable the CampaignEvents extension (T380090)]] * 21:08 dancy@deploy2002: Installing scap version "4.124.0" for 209 hosts * 21:06 dancy@deploy2002: Installing scap version "4.124.0" for 209 hosts * 21:05 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-ctrl2003.codfw.wmnet * 21:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-ctrl2003.codfw.wmnet with OS bookworm * 21:03 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 21:00 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye * 20:51 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl2003.codfw.wmnet with reason: host reimage * 20:48 bking@deploy2002: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/blunderbuss: apply * 20:48 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 20:48 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-ctrl2003.codfw.wmnet with reason: host reimage * 20:48 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 20:47 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm * 20:44 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 20:40 dancy@deploy2002: Installation of scap version "4.126.0" completed for 1 hosts * 20:39 dancy@deploy2002: Installing scap version "4.126.0" for 1 hosts * 20:32 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-ctrl2003.codfw.wmnet with OS bookworm * 20:30 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 20:30 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye * 20:28 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002" * 20:28 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002" * 20:28 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl2003.codfw.wmnet on all recursors * 20:28 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl2003.codfw.wmnet on all recursors * 20:28 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:28 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox 
hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002" * 20:26 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002" * 20:13 herron@cumin1002: START - Cookbook sre.dns.netbox * 20:13 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl2003.codfw.wmnet * 20:10 dancy@deploy2002: Installing scap version "4.126.0" for 1 hosts * 20:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 20:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye * 20:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 19:52 hashar@deploy2002: Finished deploy [integration/docroot@1627206]: build: update mediawiki-codesniffer to 45.0.0 & prevent LibUp from removing a phpcs rule (duration: 00m 10s) * 19:52 hashar@deploy2002: Started deploy [integration/docroot@1627206]: build: update mediawiki-codesniffer to 45.0.0 & prevent LibUp from removing a phpcs rule * 19:51 dancy@deploy2002: Installing scap version "4.126.0" for 1 hosts * 19:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 19:42 dancy@deploy2002: Installing scap version "4.126.0" for 209 hosts * 19:35 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-ctrl2002.codfw.wmnet * 19:35 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-ctrl2002.codfw.wmnet with OS bookworm * 19:20 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl2002.codfw.wmnet with reason: host reimage * 19:17 herron@cumin1002: START - Cookbook sre.hosts.downtime 
for 2:00:00 on aux-k8s-ctrl2002.codfw.wmnet with reason: host reimage * 19:12 urandom: bootstrapping cassandra, restbase2038-<nowiki>{</nowiki>a,b,c<nowiki>}</nowiki> — [[phab:T380236|T380236]] * 19:08 inflatador: bking@krb1001 add kerberos keytab for blunderbuss https://phabricator.wikimedia.org/P71106 [[phab:T371994|T371994]] * 19:04 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-ctrl2002.codfw.wmnet with OS bookworm * 19:03 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002" * 19:03 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002" * 19:03 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl2002.codfw.wmnet on all recursors * 19:03 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl2002.codfw.wmnet on all recursors * 19:03 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:03 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002" * 19:03 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:58 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl2002.codfw.wmnet * 17:32 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4] (hadoop-test): Regular analytics weekly train BIS TEST [analytics/refinery@295d5a44] (duration: 03m 36s) * 17:28 joal@deploy2002: Started 
deploy [analytics/refinery@295d5a4] (hadoop-test): Regular analytics weekly train BIS TEST [analytics/refinery@295d5a44] * 17:28 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply * 17:27 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 17:22 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4] (thin): Regular analytics weekly train BIS THIN [analytics/refinery@295d5a44] (duration: 05m 02s) * 17:22 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply * 17:21 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 17:20 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply * 17:19 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 17:18 joal@deploy2002: Started deploy [analytics/refinery@295d5a4] (thin): Regular analytics weekly train BIS THIN [analytics/refinery@295d5a44] * 17:17 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 17:16 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4]: Regular analytics weekly train BIS [analytics/refinery@295d5a44] (duration: 03m 41s) * 17:12 joal@deploy2002: Started deploy [analytics/refinery@295d5a4]: Regular analytics weekly train BIS [analytics/refinery@295d5a44] * 17:05 sukhe: restart tomcat on idp2004 * 17:04 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply * 17:03 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 17:02 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply * 17:01 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 17:00 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply * 17:00 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 16:43 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 16:43 jiji@deploy2002: helmfile [eqiad] START 
helmfile.d/services/changeprop: apply * 16:43 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 16:43 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 16:43 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply * 16:42 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply * 16:40 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply * 16:39 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply * 16:38 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply * 16:37 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply * 16:36 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply * 16:35 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 16:35 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply * 16:34 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. * 16:28 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 16:26 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 16:25 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' . 
* 16:24 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 16:23 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 16:22 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 16:22 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply * 16:21 jiji@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply * 16:15 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' . * 16:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet * 15:51 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 15:50 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 15:50 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 15:49 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 15:48 dancy@deploy2002: Finished scap sync-world: no-op deployment for testing. (duration: 03m 21s) * 15:44 dancy@deploy2002: Started scap sync-world: no-op deployment for testing. 
* 15:44 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 15:44 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 15:37 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 15:37 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 15:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: host overworked by dumps - [[phab:T368098|T368098]] * 15:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: host overworked by dumps - [[phab:T368098|T368098]] * 15:31 jynus: starting resharding of commons backup files into new host backup2010 [[phab:T376892|T376892]] * 15:27 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 15:23 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 15:23 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 15:22 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 15:22 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 15:19 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 15:19 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 15:15 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 15:14 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 15:13 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 15:13 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 15:10 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 15:09 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: 
apply * 15:09 urandom: bootstrapping cassandra, restbase2037-<nowiki>{</nowiki>a,b,c<nowiki>}</nowiki> — [[phab:T380236|T380236]] * 15:04 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on P<nowiki>{</nowiki>cephosd100[2-4].eqiad.wmnet<nowiki>}</nowiki> and (A:cephosd) * 14:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:53 JennH: power cycling unresponsive mgmt switch in codfw: msw-c3-codfw * 14:50 btullis@cumin1002: END (FAIL) - Cookbook sre.hadoop.roll-restart-workers (exit_code=99) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade. * 14:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:29 cdanis: [[phab:T380226|T380226]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕤☕ mwscript sql.php --wiki=commonswiki --cluster=extension1 /srv/mediawiki/php-1.44.0-wmf.4/extensions/JsonConfig/sql/mysql/tables-generated.sql * 14:25 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7007.magru.wmnet [reason: host reimaged] * 14:24 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on P<nowiki>{</nowiki>cephosd100[2-4].eqiad.wmnet<nowiki>}</nowiki> and (A:cephosd) * 14:23 jynus: starting resharding of commons backup files into new host backup1010 [[phab:T376892|T376892]] * 14:23 sukhe: running homer on asw*magru* * 14:06 jiji@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 14:05 jiji@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:05 jiji@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 14:05 jiji@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:05 jiji@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. 
* 14:04 jiji@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. * 14:04 jiji@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 14:04 jiji@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 14:04 jiji@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 14:03 jiji@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 14:03 jiji@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 14:03 jiji@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 14:03 jiji@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 14:03 jiji@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 14:03 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 14:02 jiji@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'. * 14:02 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 14:02 jiji@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 13:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2136-2139,2141-2155].codfw.wmnet * 13:55 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2136-2139,2141-2155].codfw.wmnet * 13:53 claime: homer 'lsw1-d4-codfw*' commit '[[phab:T377028|T377028]]' * 13:52 claime: homer 'lsw1-b4-codfw*' commit '[[phab:T377028|T377028]]' * 13:52 claime: homer 'lsw1-d2-codfw*' commit '[[phab:T377028|T377028]]' * 13:51 claime: homer 'lsw1-c2-codfw*' commit '[[phab:T377028|T377028]]' * 13:50 claime: homer 'lsw1-d7-codfw*' commit '[[phab:T377028|T377028]]' * 13:50 claime: homer 'lsw1-c4-codfw*' commit '[[phab:T377028|T377028]]' * 13:49 claime: homer 'lsw1-d5-codfw*' commit '[[phab:T377028|T377028]]' * 13:48 claime: homer 'lsw1-b7-codfw*' commit '[[phab:T377028|T377028]]' * 13:47 claime: homer 'lsw1-c7-codfw*' commit '[[phab:T377028|T377028]]' * 13:46 claime: homer 'lsw1-d6-codfw*' commit '[[phab:T377028|T377028]]' * 13:45 claime: homer 'lsw1-b2-codfw*' commit '[[phab:T377028|T377028]]' * 13:44 claime: homer 'lsw1-d1-codfw*' commit '[[phab:T377028|T377028]]' * 13:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 13:38 effie: putting kafka-main1006.eqiad.wmnet in production * 13:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 13:36 jiji@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 13:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 13:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:28 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade. * 13:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:26 jiji@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 13:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 13:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 13:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 13:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7007.magru.wmnet with OS bullseye * 13:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 13:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 13:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 13:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 13:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 13:02 cgoubert@cumin1002: START - Cookbook 
sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 13:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet * 13:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 13:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 13:00 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 13:00 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet * 12:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7007.magru.wmnet with reason: host reimage * 12:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet * 12:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7007.magru.wmnet with reason: host reimage * 12:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 12:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 12:41 cgoubert@cumin1002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 12:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 12:38 sukhe: re-enable puppet on cumin2002 * 12:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 12:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 12:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 12:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 12:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 12:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 12:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 12:19 sukhe@cumin2002: 
END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp7007.magru.wmnet * 12:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 12:16 sukhe@cumin2002: START - Cookbook sre.hosts.dhcp for host cp7007.magru.wmnet * 12:16 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp7007.magru.wmnet * 12:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 12:14 sukhe@cumin1002: START - Cookbook sre.hosts.dhcp for host cp7007.magru.wmnet * 12:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 12:08 sukhe: disable puppet on cumin2002 to test cumin alias for A:installserver * 12:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 12:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 11:59 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 11:59 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 11:58 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 11:57 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 11:57 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 
on wikikube-worker2145.codfw.wmnet with reason: host reimage * 11:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 11:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 11:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 11:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 11:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 11:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 11:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 11:37 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 11:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 11:30 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru * 11:24 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru * 11:22 akosiaris: decommission cxserver endpoints /api/rest_v1/transform/html/from, /api/rest_v1/transform/word/from from RESTBase [[phab:T375616|T375616]] * 10:43 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on P<nowiki>{</nowiki>cephosd1001.eqiad.wmnet<nowiki>}</nowiki> and (A:cephosd) * 10:38 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru * 10:38 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy 
rolling upgrade of HAProxy on A:cp-upload_magru * 10:37 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams * 10:34 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams * 10:33 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on P<nowiki>{</nowiki>cephosd1001.eqiad.wmnet<nowiki>}</nowiki> and (A:cephosd) * 10:33 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kafka-main[1001,1006].eqiad.wmnet with reason: Hardware refresh * 10:33 jayme: re-enabled puppet on all k8s control planes for rollout of [[phab:T380142|T380142]] * 10:33 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kafka-main[1001,1006].eqiad.wmnet with reason: Hardware refresh * 10:22 effie: removing leadership from kafka-main1001 - [[phab:T363214|T363214]] * 10:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 10:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:52 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.4 refs [[phab:T375663|T375663]] * 09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:41 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 09:38 akosiaris: decommission cxserver endpoints /api/rest_v1/list/(pair{{!}}tool{{!}}languagepairs) from RESTBase [[phab:T375616|T375616]] * 09:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:33 aklapper@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093172{{!}}EditionLookup: Update EntityLookup calls (T380304)]] (duration: 13m 33s) * 09:33 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams * 09:33 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams * 09:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:27 aklapper@deploy2002: aklapper, thiemowmde: Continuing with sync * 09:26 aklapper@deploy2002: aklapper, thiemowmde: Backport for [[gerrit:1093172{{!}}EditionLookup: Update EntityLookup calls (T380304)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to plain * 09:20 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to plain * 09:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:20 aklapper@deploy2002: Started scap sync-world: Backport for [[gerrit:1093172{{!}}EditionLookup: Update EntityLookup calls (T380304)]] * 09:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:18 jmm@cumin2002: END (PASS) - Cookbook 
sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to plain * 09:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to plain * 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to plain * 09:13 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to plain * 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to plain * 08:51 jayme: disabling puppet on all k8s control planes for rollout of [[phab:T380142|T380142]] * 08:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to plain * 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to plain * 08:44 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7001.wikimedia.org to plain * 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet * 08:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet * 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet * 08:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet * 08:18 hashar: Restarted CI Jenkins to upgrade Leastload plugin and remove the SSH server plugin == 2024-11-19 == * 22:50 ryankemper@deploy2002: Started deploy [wdqs/wdqs@9927a5a] (wcqs): Deploy 0.3.150 to WCQS * 22:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092341{{!}}Enable experimental Parsoid fragment support on labs and test wikis (T374661)]], [[gerrit:1092850{{!}}Revert "editcheck: Remove
try/catch around transaction squashing" (T333710 T380234)]], [[gerrit:1092851{{!}}Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234)]] (duration: 20m 39s) * 21:53 urbanecm@deploy2002: cscott, kemayo, urbanecm: Continuing with sync * 21:45 urbanecm@deploy2002: cscott, kemayo, urbanecm: Backport for [[gerrit:1092341{{!}}Enable experimental Parsoid fragment support on labs and test wikis (T374661)]], [[gerrit:1092850{{!}}Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234)]], [[gerrit:1092851{{!}}Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234)]] synced to the testservers (https://wikitech.wikimedia.or * 21:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm * 21:39 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1092341{{!}}Enable experimental Parsoid fragment support on labs and test wikis (T374661)]], [[gerrit:1092850{{!}}Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234)]], [[gerrit:1092851{{!}}Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234)]] * 21:38 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092296{{!}}Promote Vector 2022 as default on 3 wikis (T379765)]], [[gerrit:1092912{{!}}Separate cache key space for test & production JsonConfig data (T380320)]] (duration: 14m 38s) * 21:31 urbanecm@deploy2002: bvibber, jdlrobson, urbanecm: Continuing with sync * 21:29 urbanecm@deploy2002: bvibber, jdlrobson, urbanecm: Backport for [[gerrit:1092296{{!}}Promote Vector 2022 as default on 3 wikis (T379765)]], [[gerrit:1092912{{!}}Separate cache key space for test & production JsonConfig data (T380320)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:23 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1092296{{!}}Promote Vector 2022 as default on 3 
wikis (T379765)]], [[gerrit:1092912{{!}}Separate cache key space for test & production JsonConfig data (T380320)]] * 21:16 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2038.codfw.wmnet with reason: Bootstrapping — [[phab:T380236|T380236]] * 21:15 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2038.codfw.wmnet with reason: Bootstrapping — [[phab:T380236|T380236]] * 21:15 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2037.codfw.wmnet with reason: Bootstrapping — [[phab:T380236|T380236]] * 21:15 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2037.codfw.wmnet with reason: Bootstrapping — [[phab:T380236|T380236]] * 21:15 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2036.codfw.wmnet with reason: Bootstrapping — [[phab:T380236|T380236]] * 21:15 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2036.codfw.wmnet with reason: Bootstrapping — [[phab:T380236|T380236]] * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 20:50 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye * 20:40 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 20:40 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye * 20:32 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7007.magru.wmnet with OS bullseye * 20:29 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 20:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host 
es2041.codfw.wmnet with OS bookworm * 20:24 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 20:10 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 20:10 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 20:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 20:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1183.eqiad.wmnet with OS bullseye * 20:03 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 19:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp7007.magru.wmnet * 19:41 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7007.magru.wmnet with OS bullseye * 19:40 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp7007.magru.wmnet * 19:34 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 19:17 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@a4d0954]: mjolnir: [[phab:T379045|T379045]] Increase maxResultSize (duration: 00m 26s) * 19:16 ebernhardson@deploy2002: Started deploy [airflow-dags/search@a4d0954]: mjolnir: [[phab:T379045|T379045]] Increase maxResultSize * 19:15 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 19:14 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7007.magru.wmnet with OS bullseye * 19:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
an-worker1183.eqiad.wmnet with reason: host reimage * 19:08 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 19:08 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7007.magru.wmnet with OS bullseye * 19:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1183.eqiad.wmnet with reason: host reimage * 19:05 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:05 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 18:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1183.eqiad.wmnet with OS bullseye * 18:53 brett: Import ncmonitor 1.3.0-1 into main apt repo * 18:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1183.eqiad.wmnet with OS bullseye * 18:48 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 18:47 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7007.magru.wmnet with OS bullseye * 18:39 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 18:36 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 18:34 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 18:34 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 18:34 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 18:34 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7007.magru.wmnet with OS bullseye * 18:32 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 18:32 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 18:07 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 17:57 brennen@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092875{{!}}Prevent ce_event_wikis query when feature flag is off (T380288)]] (duration: 15m 10s) * 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1326.eqiad.wmnet with OS bookworm * 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1327.eqiad.wmnet with OS bookworm * 17:53 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:53 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1183.eqiad.wmnet with OS bullseye * 17:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1325.eqiad.wmnet with OS bookworm * 17:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:50 jclark@cumin1002: START - 
Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:50 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1183.eqiad.wmnet with OS bullseye * 17:50 brennen@deploy2002: daimona, brennen: Continuing with sync * 17:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1323.eqiad.wmnet with OS bookworm * 17:48 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:47 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker1290 * 17:47 cmooney@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1290 * 17:47 brennen@deploy2002: daimona, brennen: Backport for [[gerrit:1092875{{!}}Prevent ce_event_wikis query when feature flag is off (T380288)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 17:47 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1322.eqiad.wmnet with OS bookworm * 17:45 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wikikube-worker1290.eqiad.wmnet with reason: being moved to new port * 
17:42 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wikikube-worker1290.eqiad.wmnet with reason: being moved to new port * 17:42 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 17:41 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1092875{{!}}Prevent ce_event_wikis query when feature flag is off (T380288)]] * 17:41 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 17:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1324.eqiad.wmnet with OS bookworm * 17:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:40 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2110.codfw.wmnet with OS bullseye * 17:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage * 17:34 jclark@cumin1002: START - Cookbook sre.hosts.reimage for 
host an-worker1183.eqiad.wmnet with OS bullseye * 17:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage * 17:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage * 17:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage * 17:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage * 17:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage * 17:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage * 17:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage * 17:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2110.codfw.wmnet with reason: host reimage * 17:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage * 17:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1314.eqiad.wmnet with OS bookworm * 17:18 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage * 17:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage * 17:18 jclark@cumin1002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2110.codfw.wmnet with reason: host reimage * 17:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 17:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1318.eqiad.wmnet with OS bookworm * 17:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1319.eqiad.wmnet with OS bookworm * 17:11 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:11 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1326.eqiad.wmnet with OS bookworm * 17:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1327.eqiad.wmnet with OS bookworm * 17:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1325.eqiad.wmnet with OS bookworm * 17:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1320.eqiad.wmnet with OS bookworm * 17:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1321.eqiad.wmnet with OS bookworm * 17:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1316.eqiad.wmnet with OS bookworm * 17:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:01 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1323.eqiad.wmnet with OS bookworm * 17:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1324.eqiad.wmnet with OS bookworm * 17:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1322.eqiad.wmnet with OS bookworm * 17:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2110.codfw.wmnet with OS bullseye * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2110'] * 17:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage * 17:00 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2110'] * 16:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1317.eqiad.wmnet with OS bookworm * 16:58 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 16:58 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 16:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage * 16:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1315.eqiad.wmnet with OS bookworm * 16:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 16:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 16:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage * 16:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1313.eqiad.wmnet with OS bookworm * 16:52 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 16:52 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host 
reimage - jclark@cumin1002" * 16:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage * 16:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage * 16:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1316.eqiad.wmnet with reason: host reimage * 16:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1317.eqiad.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage * 16:36 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage * 16:36 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7007.magru.wmnet * 16:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage * 16:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage * 16:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage * 16:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage * 16:33 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1316.eqiad.wmnet with reason: host reimage * 16:33 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on 
wikikube-worker1317.eqiad.wmnet with reason: host reimage * 16:33 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage * 16:31 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage * 16:30 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 16:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1319.eqiad.wmnet with OS bookworm * 16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1320.eqiad.wmnet with OS bookworm * 16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm * 16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1318.eqiad.wmnet with OS bookworm * 16:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 16:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker1317.eqiad.wmnet with OS bookworm * 16:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1316.eqiad.wmnet with OS bookworm * 16:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1315.eqiad.wmnet with OS bookworm * 16:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1314.eqiad.wmnet with OS bookworm * 16:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1313.eqiad.wmnet with OS bookworm * 16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 16:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 16:07 dreamyjazz@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092856{{!}}ExperimentUserDefaultsManager: Decrease log severity to debug (T380271)]] (duration: 13m 16s) * 16:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 16:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2139.codfw.wmnet with reason: host reimage * 15:59 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync * 15:59 dreamyjazz@deploy2002: dreamyjazz: Backport for [[gerrit:1092856{{!}}ExperimentUserDefaultsManager: Decrease log severity to debug (T380271)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS 
bookworm * 15:54 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:53 dreamyjazz@deploy2002: Started scap sync-world: Backport for [[gerrit:1092856{{!}}ExperimentUserDefaultsManager: Decrease log severity to debug (T380271)]] * 15:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:47 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:47 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2139.codfw.wmnet with reason: host reimage * 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:45 moritzm: installing libheif security updates * 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2139.codfw.wmnet with OS bookworm * 15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:25 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:25 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:15 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7007.magru.wmnet with OS bullseye * 15:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad * 15:11 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad * 15:06 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad * 15:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for 
the switch from codfw to eqiad * 15:05 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad * away: UTC afternoon deploys done * 14:59 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092333{{!}}Use 'auth' rather than 'sso' as cookie prefix on the auth domain (T379811)]] (duration: 14m 16s) * 14:52 tgr@deploy2002: tgr: Continuing with sync * 14:50 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7007.magru.wmnet with reason: host reimage * 14:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw * 14:50 tgr@deploy2002: tgr: Backport for [[gerrit:1092333{{!}}Use 'auth' rather than 'sso' as cookie prefix on the auth domain (T379811)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:49 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw * 14:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw * 14:48 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw * 14:46 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7007.magru.wmnet with reason: host reimage * 14:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:44 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1092333{{!}}Use 'auth' rather than 'sso' as cookie prefix on the auth domain (T379811)]] * 14:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS 
bookworm * 14:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:39 elukey: limit /v2/_catalog to internal IPs only for all Docker Registry nodes - [[phab:T378618|T378618]] * 14:38 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092740{{!}}Enable message group subscription feature for MediaWiki.org (T372386)]] (duration: 16m 21s) * 14:35 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad * 14:34 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad * 14:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad * 14:33 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad * 14:31 kartik@deploy2002: kartik, abi: Continuing with sync * 14:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw * 14:30 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw * 14:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw * 14:28 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw * 14:28 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1092740{{!}}Enable message group subscription feature for MediaWiki.org (T372386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of 
HAProxy on A:cp-text_eqiad * 14:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad * 14:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad * 14:24 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad * 14:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad * 14:23 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad * 14:22 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1092740{{!}}Enable message group subscription feature for MediaWiki.org (T372386)]] * 14:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad * 14:21 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad * 14:21 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 14:21 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs * 14:18 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs * 14:17 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092257{{!}}Enable the Contribute menu in 3rd group of Wikis (T375301)]] (duration: 15m 07s) * 14:15 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4]: Regular analytics weekly train [analytics/refinery@295d5a44] (duration: 08m 56s) * 14:11 kartik@deploy2002: kartik: Continuing with sync * 14:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1290.eqiad.wmnet * 14:10 kartik@deploy2002: kartik: Backport for [[gerrit:1092257{{!}}Enable 
the Contribute menu in 3rd group of Wikis (T375301)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:10 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1290.eqiad.wmnet * 14:07 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply * 14:06 joal@deploy2002: Started deploy [analytics/refinery@295d5a4]: Regular analytics weekly train [analytics/refinery@295d5a44] * 14:06 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply * 14:05 ihurbain@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 14:04 ihurbain@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply * 14:03 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply * 14:02 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1092257{{!}}Enable the Contribute menu in 3rd group of Wikis (T375301)]] * 14:02 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply * 14:01 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply * 14:01 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply * 13:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs * 13:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs * 13:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 266098 * 13:08 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 266098 * 13:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 267521 * 13:07 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 267521 * 13:07 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 
201838 * 13:06 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 201838 * 13:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262979 * 13:06 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 262979 * 13:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 266631 * 13:06 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 266631 * 13:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 53180 * 13:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 53180 * 13:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21574 * 13:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 21574 * 12:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox * 12:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw * 12:42 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw * 12:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw * 12:40 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw * 12:38 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the switch from eqiad to codfw * 12:36 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw * 12:35 moritzm: removing ganeti1016 from active Ganeti nodes [[phab:T378921|T378921]] * 12:30 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy 
(exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw * 12:27 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw * 12:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad * 12:22 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad * 12:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad * 12:18 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad * 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet * 11:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 100%: repool', diff saved to https://phabricator.wikimedia.org/P71095 and previous config saved to /var/cache/conftool/dbconfig/20241119-114422-arnaudb.json * 11:40 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw * 11:40 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw * 11:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 75%: repool', diff saved to https://phabricator.wikimedia.org/P71094 and previous config saved to /var/cache/conftool/dbconfig/20241119-112917-arnaudb.json * 11:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 50%: repool', diff saved to https://phabricator.wikimedia.org/P71093 and previous config saved to /var/cache/conftool/dbconfig/20241119-111411-arnaudb.json * 11:05 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet * 11:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 207947 * 11:03 ayounsi@cumin1002: 
START - Cookbook sre.network.peering with action 'configure' for AS: 207947 * 10:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 25%: repool', diff saved to https://phabricator.wikimedia.org/P71092 and previous config saved to /var/cache/conftool/dbconfig/20241119-105906-arnaudb.json * 10:58 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet * 10:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 15%: repool', diff saved to https://phabricator.wikimedia.org/P71091 and previous config saved to /var/cache/conftool/dbconfig/20241119-104401-arnaudb.json * 10:41 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin * 10:37 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin * 10:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 10%: repool', diff saved to https://phabricator.wikimedia.org/P71090 and previous config saved to /var/cache/conftool/dbconfig/20241119-102855-arnaudb.json * 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry * 10:25 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry * 10:16 moritzm: restart spamd on vrts to pick up openssl updates * 10:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 5%: repool', diff saved to https://phabricator.wikimedia.org/P71089 and previous config saved to /var/cache/conftool/dbconfig/20241119-101350-arnaudb.json * 10:02 moritzm: installing openssl security updates * 10:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw * 10:00 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch 
from eqiad to codfw * 09:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw * 09:59 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw * 09:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw * 09:58 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw * 09:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw * 09:52 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw * 09:51 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 09:51 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 09:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw * 09:49 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw * 09:42 fabfur: upgrade haproxy on cp-text{{!}}upload_eqsin ([[phab:T379891|T379891]]) * 09:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin * 09:41 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin * 09:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad * 09:39 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 09:39 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 09:39 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad * 09:39 dcausse@deploy2002: 
helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 09:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad * 09:38 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 09:35 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad * 09:33 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 09:32 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 09:19 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.4 refs [[phab:T375663|T375663]] * 09:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad * 09:18 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad * 08:59 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092752{{!}}Add + to nowiki in core-Permissions.php (T380252)]] (duration: 10m 17s) * 08:54 urbanecm@deploy2002: urbanecm, jhsoby: Continuing with sync * 08:54 urbanecm@deploy2002: urbanecm, jhsoby: Backport for [[gerrit:1092752{{!}}Add + to nowiki in core-Permissions.php (T380252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:49 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1092752{{!}}Add + to nowiki in core-Permissions.php (T380252)]] * 08:48 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092741{{!}}fix tours by finishing partial variable rename (T380071)]], [[gerrit:1092364{{!}}affcom contactpages: Fix Letter of intent and logo field labels (T375392)]], [[gerrit:1092743{{!}}Add nowiki to commonsuploads dblist (T380252)]] (duration: 14m 29s) * 08:43 urbanecm@deploy2002: ammarpad, migr, jhsoby, urbanecm: Continuing with sync * 08:39 
urbanecm@deploy2002: ammarpad, migr, jhsoby, urbanecm: Backport for [[gerrit:1092741{{!}}fix tours by finishing partial variable rename (T380071)]], [[gerrit:1092364{{!}}affcom contactpages: Fix Letter of intent and logo field labels (T375392)]], [[gerrit:1092743{{!}}Add nowiki to commonsuploads dblist (T380252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1092741{{!}}fix tours by finishing partial variable rename (T380071)]], [[gerrit:1092364{{!}}affcom contactpages: Fix Letter of intent and logo field labels (T375392)]], [[gerrit:1092743{{!}}Add nowiki to commonsuploads dblist (T380252)]] * 08:29 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1082726{{!}}Translate Event Logging: Enable using $wgTranslateEnableEventLogging (T364460)]], [[gerrit:1092258{{!}}CirrusSearch: enable offloading weighted tags via EventBus (T378983 T377150)]], [[gerrit:1091197{{!}}[GrowthExperiments] Add virtual domain config (T354939)]] (duration: 24m 42s) * 08:22 urbanecm@deploy2002: urbanecm, wangombe, pfischer: Continuing with sync * 08:12 urbanecm@deploy2002: urbanecm, wangombe, pfischer: Backport for [[gerrit:1082726{{!}}Translate Event Logging: Enable using $wgTranslateEnableEventLogging (T364460)]], [[gerrit:1092258{{!}}CirrusSearch: enable offloading weighted tags via EventBus (T378983 T377150)]], [[gerrit:1091197{{!}}[GrowthExperiments] Add virtual domain config (T354939)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:04 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1082726{{!}}Translate Event Logging: Enable using $wgTranslateEnableEventLogging (T364460)]], [[gerrit:1092258{{!}}CirrusSearch: enable offloading weighted tags via EventBus (T378983 T377150)]], [[gerrit:1091197{{!}}[GrowthExperiments] Add virtual domain config (T354939)]] * 07:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime 
(exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: sad * 07:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: sad * 07:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: [[phab:T374215|T374215]] - hw maintenance * 07:40 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: [[phab:T374215|T374215]] - hw maintenance * 07:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet * 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet * 07:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet * 05:01 mwpresync@deploy2002: Pruned MediaWiki: 1.44.0-wmf.1 (duration: 01m 18s) * 04:52 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.4 refs [[phab:T375663|T375663]] (duration: 49m 01s) * 04:16 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1062.eqiad.wmnet with OS bookworm * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.4 refs [[phab:T375663|T375663]] * 04:00 ejegg: fundraising civicrm upgraded from {{Gerrit|463a12c5}} to {{Gerrit|e29243f0}} * 03:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage * 03:48 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage * 03:33 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bookworm * 03:09 ejegg: payments-wiki upgraded from {{Gerrit|459f259b}} to {{Gerrit|c4463536}} * 02:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
kafka-jumbo1018.eqiad.wmnet with OS bullseye * 02:30 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 02:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 02:23 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|601405dc}} to {{Gerrit|131e92a5}} * 02:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1018.eqiad.wmnet with reason: host reimage * 02:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1018.eqiad.wmnet with reason: host reimage * 01:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye * 01:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1018.eqiad.wmnet with OS bullseye * 01:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1016.eqiad.wmnet with OS bullseye * 01:51 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 01:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1017.eqiad.wmnet with OS bullseye * 01:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 01:40 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 01:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered 
by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 01:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1017.eqiad.wmnet with reason: host reimage * 01:21 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1017.eqiad.wmnet with reason: host reimage * 01:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2006.codfw.wmnet with OS bookworm * 01:12 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 01:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye * 01:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1017.eqiad.wmnet with OS bullseye * 01:06 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1017.eqiad.wmnet with OS bullseye * 01:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 01:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1016.eqiad.wmnet with reason: host reimage * 00:58 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1016.eqiad.wmnet with reason: host reimage * 00:54 tzatziki: removing 1 file for legal compliance * 00:53 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bookworm * 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2005.codfw.wmnet with OS bookworm * 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
pt1979@cumin2002" * 00:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1016.eqiad.wmnet with OS bullseye * 00:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage * 00:41 tzatziki: removing 1 file for legal compliance * 00:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1016.eqiad.wmnet with OS bullseye * 00:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage * 00:34 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 00:18 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1017.eqiad.wmnet with OS bullseye * 00:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1017.eqiad.wmnet with OS bullseye * 00:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2006.codfw.wmnet with OS bookworm * 00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage * 00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2004.codfw.wmnet with OS bookworm * 00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 00:10 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 00:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage * 00:03 tzatziki: removing 1 file for legal compliance * 00:00 pt1979@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2003.codfw.wmnet with OS bookworm * 00:00 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" == 2024-11-18 == * 23:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 23:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage * 23:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage * 23:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2005.codfw.wmnet with OS bookworm * 23:32 tzatziki: removing 1 file for legal compliance * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage * 23:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2002.codfw.wmnet with OS bookworm * 23:28 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 23:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 23:26 tzatziki: removing 1 file for legal compliance * 23:26 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage * 23:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2004.codfw.wmnet with OS bookworm * 23:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
thanos-be2005.codfw.wmnet with reason: host reimage * 23:15 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage * 23:12 tzatziki: removing 2 files for legal compliance * 23:09 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:09 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002" * 23:09 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002" * 23:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage * 23:06 eevans@cumin1002: START - Cookbook sre.dns.netbox * 23:05 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage * 23:04 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:04 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002" * 23:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2003.codfw.wmnet with OS bookworm * 23:04 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002" * 23:03 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm * 23:01 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bookworm * 23:00 jclark@cumin1002: START - Cookbook 
sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1017.eqiad.wmnet with OS bullseye * 23:00 eevans@cumin1002: START - Cookbook sre.dns.netbox * 22:59 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1016.eqiad.wmnet with OS bullseye * 22:57 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm * 22:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2045.codfw.wmnet with OS bookworm * 22:55 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bookworm * 22:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2044.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2046.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2043.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm * 22:52 tzatziki: removing 10 files for legal compliance * 22:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2001.codfw.wmnet with OS bookworm * 22:50 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 22:49 bking@deploy2002: Finished deploy [wdqs/wdqs@9927a5a]: 0.3.150 (duration: 11m 59s) * 22:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm * 22:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2042.codfw.wmnet with OS bookworm * 22:37 bking@deploy2002: 
Started deploy [wdqs/wdqs@9927a5a]: 0.3.150 * 22:22 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bookworm * 22:18 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092336{{!}}[GrowthExperiments] testwiki: Only enable Add Link for new accounts (T380204)]] (duration: 09m 14s) * 22:13 urbanecm@deploy2002: urbanecm: Continuing with sync * 22:13 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1092336{{!}}[GrowthExperiments] testwiki: Only enable Add Link for new accounts (T380204)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:09 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1092336{{!}}[GrowthExperiments] testwiki: Only enable Add Link for new accounts (T380204)]] * 21:58 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092304{{!}}Use WAN cache for JsonConfig remote fetch cache (T374746)]], [[gerrit:1092300{{!}}Create no-link-recommendation variant (T377787 T380204)]], [[gerrit:1092295{{!}}[GrowthExperiments] testwiki: Enable no-link-recommendation experiment (T380204)]] (duration: 12m 10s) * 21:54 urbanecm@deploy2002: urbanecm, bvibber: Continuing with sync * 21:52 urbanecm@deploy2002: urbanecm, bvibber: Backport for [[gerrit:1092304{{!}}Use WAN cache for JsonConfig remote fetch cache (T374746)]], [[gerrit:1092300{{!}}Create no-link-recommendation variant (T377787 T380204)]], [[gerrit:1092295{{!}}[GrowthExperiments] testwiki: Enable no-link-recommendation experiment (T380204)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:48 effie: upload prometheus-mcrouter-exporter_0.4.0+git20241118-1~wmf1 - [[phab:T380212|T380212]] * 21:46 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1092304{{!}}Use WAN cache for JsonConfig remote fetch cache (T374746)]], [[gerrit:1092300{{!}}Create no-link-recommendation variant (T377787 T380204)]], 
[[gerrit:1092295{{!}}[GrowthExperiments] testwiki: Enable no-link-recommendation experiment (T380204)]] * 21:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 21:36 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091839{{!}}Rename everything referring to "SSO domain" to use "shared domain" (T379811)]], [[gerrit:1091841{{!}}Rename shared domain sso.wikimedia.org to auth.wikimedia.org (T379811)]], [[gerrit:1091842{{!}}Use DB name rather than server name in shared domain path prefix (T379811)]] (duration: 10m 54s) * 21:31 urbanecm@deploy2002: matmarex, urbanecm: Continuing with sync * 21:30 urbanecm@deploy2002: matmarex, urbanecm: Backport for [[gerrit:1091839{{!}}Rename everything referring to "SSO domain" to use "shared domain" (T379811)]], [[gerrit:1091841{{!}}Rename shared domain sso.wikimedia.org to auth.wikimedia.org (T379811)]], [[gerrit:1091842{{!}}Use DB name rather than server name in shared domain path prefix (T379811)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:29 urbanecm: Add bvibber to wmf-deployment Gerrit group (existing deployer) * 21:26 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1091839{{!}}Rename everything referring to "SSO domain" to use "shared domain" (T379811)]], [[gerrit:1091841{{!}}Rename shared domain sso.wikimedia.org to auth.wikimedia.org (T379811)]], [[gerrit:1091842{{!}}Use DB name rather than server name in shared domain path prefix (T379811)]] * 21:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage * 21:18 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2046.codfw.wmnet with OS bookworm * 21:17 
jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2045.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2044.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2043.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2042.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 21:16 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2002.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2042'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2042'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2041'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2041'] * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host es2041.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm * 21:01 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bookworm * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:52 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm * 20:51 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye * 20:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 20:49 jhathaway: disabling auto-reboot on re-imaging for debugging * 20:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2001.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:39 
jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2041.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:39 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2041 to codfw - jhancock@cumin2002" * 20:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2041 to codfw - jhancock@cumin2002" * 20:33 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 20:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2037.codfw.wmnet with OS bullseye * 20:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2112.codfw.wmnet with OS bullseye * 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 20:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 20:12 
jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2113.codfw.wmnet with OS bullseye * 20:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 20:11 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 20:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2037.codfw.wmnet with reason: host reimage * 19:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2037.codfw.wmnet with reason: host reimage * 19:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2112.codfw.wmnet with reason: host reimage * 19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:55 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@594d3b5]: [[phab:T377153|T377153]] Release glent 0.3.5 (duration: 00m 27s) * 19:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2113.codfw.wmnet with reason: host reimage * 19:54 ebernhardson@deploy2002: Started deploy [airflow-dags/search@594d3b5]: [[phab:T377153|T377153]] Release glent 0.3.5 * 19:52 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2112.codfw.wmnet with reason: host reimage * 
19:51 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2113.codfw.wmnet with reason: host reimage * 19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage * 19:36 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2112.codfw.wmnet with OS bullseye * 19:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2113.codfw.wmnet with OS bullseye * 19:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2037.codfw.wmnet with OS bullseye * 19:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage * 19:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2113'] * 19:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2037'] * 19:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2113'] * 19:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2037'] * 19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2113.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:17 swfrench@deploy2002: Finished scap sync-world: Test deployment after adding mwdebug-next check command - [[phab:T372604|T372604]] (duration: 01m 31s) * 19:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 19:15 swfrench@deploy2002: Started scap sync-world: Test deployment after adding mwdebug-next check command - [[phab:T372604|T372604]] * 19:08 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 18:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 18:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 18:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 18:46 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply * 18:45 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply * 18:43 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply * 18:41 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1183.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:27 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 18:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 18:15 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 18:15 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 18:14 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 18:13 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 18:12 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye * 18:09 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 18:08 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 18:04 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 18:03 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 18:03 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 18:01 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 17:53 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for 
host thanos-be2005.codfw.wmnet with OS bullseye * 17:34 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 17:28 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@16a5867]: Deploy latest DAGs to analytics Airflow instance. [[phab:T368755|T368755]]. (duration: 02m 10s) * 17:25 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@16a5867]: Deploy latest DAGs to analytics Airflow instance. [[phab:T368755|T368755]]. * 17:24 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: set DNS for new maps-test nodes - pt1979@cumin2002" * 16:55 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: set DNS for new maps-test nodes - pt1979@cumin2002" * 16:50 volans: installing spicerack v8.16.2 on cumin1002 * 16:50 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:38 volans: installing spicerack v8.16.2 on cumin2002 * 16:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1305-1312].eqiad.wmnet * 16:34 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1305-1312].eqiad.wmnet * 16:34 volans: uploaded spicerack_8.16.2 to apt.wikimedia.org bullseye-wikimedia * 16:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1311.eqiad.wmnet with OS bookworm * 16:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1310.eqiad.wmnet with OS bookworm * 16:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1312.eqiad.wmnet with OS 
bookworm * 16:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1306.eqiad.wmnet with OS bookworm * 16:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1308.eqiad.wmnet with OS bookworm * 16:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bookworm * 16:13 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1005.eqiad.wmnet * 16:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage * 16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1307.eqiad.wmnet with OS bookworm * 16:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1305.eqiad.wmnet with OS bookworm * 16:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage * 16:06 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1005.eqiad.wmnet * 16:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage * 16:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage * 15:58 Lucas_WMDE: UTC afternoon backport+config window done * 15:58 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092259{{!}}Unified dashboard: Add UI for page collection recommendations (T368718)]] (duration: 27m 17s) * 15:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage * 15:56 cgoubert@cumin1002: START - 
Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage * 15:55 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage * 15:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 15:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage * 15:51 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage * 15:50 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 15:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage * 15:49 lucaswerkmeister-wmde@deploy2002: sbisson, lucaswerkmeister-wmde: Continuing with sync * 15:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage * 15:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage * 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage * 15:45 lucaswerkmeister-wmde@deploy2002: sbisson, lucaswerkmeister-wmde: Backport for [[gerrit:1092259{{!}}Unified dashboard: Add UI for page collection recommendations (T368718)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage * 15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1312.eqiad.wmnet with OS bookworm * 
15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1311.eqiad.wmnet with OS bookworm * 15:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1310.eqiad.wmnet with OS bookworm * 15:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bookworm * 15:31 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1092259{{!}}Unified dashboard: Add UI for page collection recommendations (T368718)]] * 15:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1308.eqiad.wmnet with OS bookworm * 15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1307.eqiad.wmnet with OS bookworm * 15:27 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1306.eqiad.wmnet with OS bookworm * 15:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1305.eqiad.wmnet with OS bookworm * 15:11 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091605{{!}}Revert "Allow other input and changes to trigger searchsuggestions to update" (T379983)]] (duration: 08m 14s) * 15:07 lucaswerkmeister-wmde@deploy2002: samtar, lucaswerkmeister-wmde: Continuing with sync * 15:06 lucaswerkmeister-wmde@deploy2002: samtar, lucaswerkmeister-wmde: Backport for [[gerrit:1091605{{!}}Revert "Allow other input and changes to trigger searchsuggestions to update" (T379983)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1091605{{!}}Revert "Allow other input and changes to trigger searchsuggestions to update" (T379983)]] * 15:00 arnaudb@cumin1002: dbctl commit (dc=all): 'manual depool commit', diff saved to https://phabricator.wikimedia.org/P71077 and previous config saved to 
/var/cache/conftool/dbconfig/20241118-150020-arnaudb.json * 14:59 arnaudb@cumin1002: dbctl commit (dc=all): 'manual repool commit', diff saved to https://phabricator.wikimedia.org/P71076 and previous config saved to /var/cache/conftool/dbconfig/20241118-145946-arnaudb.json * 14:56 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2216 slowly with 10 steps - slow motion repool [[phab:T380131|T380131]] * 14:56 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2216 slowly with 10 steps - slow motion repool [[phab:T380131|T380131]] * 14:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2150 slowly with 10 steps - slow repool db2150 [[phab:T380117|T380117]] * 14:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1305-1312].eqiad.wmnet * 14:28 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1305-1312].eqiad.wmnet * 14:16 claime: running homer 'cr*-eqiad' '[[phab:T379454|T379454]]' * 14:11 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1004.eqiad.wmnet * 14:09 btullis@cumin1002: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto an-presto cluster: Roll restart of all Presto's jvm daemons. 
* 14:04 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet * 13:50 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply * 13:49 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply * 13:49 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply * 13:48 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply * 13:47 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply * 13:46 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply * 13:37 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply * 13:37 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply * 13:35 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply * 13:35 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply * 13:35 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 13:34 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 13:34 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 13:33 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 13:31 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply * 13:31 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply * 13:31 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 13:30 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 13:28 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply * 13:28 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply * 13:27 btullis@cumin1002: START - Cookbook 
sre.presto.roll-restart-workers for Presto an-presto cluster: Roll restart of all Presto's jvm daemons. * 13:26 topranks: stopping netbox service on netbox-next test server to restore new database backup from production * 13:25 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply * 13:25 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply * 13:20 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1018.eqiad.wmnet with OS bullseye * 13:16 urbanecm: mwmaint2002: Run `extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php` at `testwiki` for a bunch of pages (P71064 is list of commands executed; [[phab:T378983|T378983]]) * 13:04 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply * 13:03 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply * 13:01 moritzm: removing ganeti1021 from active Ganeti nodes [[phab:T378921|T378921]] * 12:56 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1018.eqiad.wmnet with reason: host reimage * 12:54 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1018.eqiad.wmnet with reason: host reimage * 12:39 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bullseye * 12:38 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1018.eqiad.wmnet with OS bullseye * 12:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:37 kart_: Updated recommendation api to 2024-11-13-183159-production ([[phab:T379592|T379592]], [[phab:T379037|T379037]]) * 12:36 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2150 slowly with 10 steps - slow repool db2150 [[phab:T380117|T380117]] * 12:36 cgoubert@cumin1002: START - Cookbook sre.dns.netbox * 12:24 kartik@deploy2002: helmfile 
[ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 12:22 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bullseye * 12:22 kartik@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 12:21 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1018.eqiad.wmnet with OS bullseye * 12:19 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. * 12:15 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply * 12:14 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply * 12:13 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-ulsfo * 12:13 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 12:10 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 12:09 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply * 12:08 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bullseye * 12:02 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply * 12:00 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 11:59 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:59 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 11:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet * 11:45 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:41 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2216.codfw.wmnet with reason: [[phab:T380131|T380131]] - table corruption * 11:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2216.codfw.wmnet with reason: [[phab:T380131|T380131]] - table corruption * 11:41 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:41 urbanecm: mwmaint2002: Run `extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php` at `testwiki` for a bunch of pages (P71064 is list of commands executed; [[phab:T378983|T378983]]) * 11:33 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. 
* 11:25 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:21 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:16 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:50 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:49 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:46 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply * 10:46 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply * 10:45 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:43 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 10:41 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply * 10:41 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply * 10:39 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:37 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:27 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:15 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:14 fabfur: upgrade haproxy on cp-ulsfo ([[phab:T379891|T379891]]) * 10:14 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:14 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-ulsfo * 10:13 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:47 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply * 09:47 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply * 09:42 moritzm: restarting nginx on acmechief hosts to pick up openssl updates * 09:24 moritzm: installing openssl security updates * 09:18 elukey@cumin1002: END (FAIL) - Cookbook 
sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:17 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:57 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091932{{!}}Enable the Contribute menu in 2nd group of Wikis (T375300)]] (duration: 11m 45s) * 08:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40850 * 08:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 40850 * 08:53 kartik@deploy2002: kartik: Continuing with sync * 08:49 kartik@deploy2002: kartik: Backport for [[gerrit:1091932{{!}}Enable the Contribute menu in 2nd group of Wikis (T375300)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:45 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1091932{{!}}Enable the Contribute menu in 2nd group of Wikis (T375300)]] * 08:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on registry1004.eqiad.wmnet with reason: testing * 08:44 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on registry1004.eqiad.wmnet with reason: testing * 08:43 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091912{{!}}bjnwikiquote: Add local logo (T375054)]] (duration: 22m 55s) * 08:31 kartik@deploy2002: kartik, hamishz: Continuing with sync * 08:30 kartik@deploy2002: kartik, hamishz: Backport for [[gerrit:1091912{{!}}bjnwikiquote: Add local logo (T375054)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:20 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1091912{{!}}bjnwikiquote: Add local logo (T375054)]] * 08:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet * 08:07 jmm@cumin2002: END (PASS) - Cookbook 
sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet * 08:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet * 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet * 08:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet * 07:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet * 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet * 07:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet * 07:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet * 07:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet * 07:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 07:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 06:31 kart_: Updated MinT to 2024-10-16-065051-production on eqiad * 06:28 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: 
apply
* 06:19 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
== 2024-11-17 ==
* 16:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2216.codfw.wmnet with reason: Sad
* 16:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2216.codfw.wmnet with reason: Sad
* 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2216 sad', diff saved to https://phabricator.wikimedia.org/P71059 and previous config saved to /var/cache/conftool/dbconfig/20241117-163522-ladsgroup.json
== 2024-11-16 ==
* 20:30 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 18:09 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
* 18:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
* 18:06 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1183.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 18:05 jclark@cumin1002: START - Cookbook sre.dns.netbox
* 18:01 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox
(exit_code=0) * 17:59 jclark@cumin1002: START - Cookbook sre.dns.netbox * 17:59 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 17:56 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 17:55 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:55 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1313.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:52 jclark@cumin1002: START - Cookbook sre.dns.netbox * 17:50 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for 
wikikube-worker - jclark@cumin1002" * 17:50 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 17:45 jclark@cumin1002: START - Cookbook sre.dns.netbox * 17:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1323.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:11 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:09 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 17:09 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 17:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1313.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:05 jclark@cumin1002: START - Cookbook sre.dns.netbox * 17:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1326.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot 
policy FORCED * 16:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1321.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1324.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1322.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1320.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1325.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1319.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1316.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1318.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1315.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1317.mgmt.eqiad.wmnet with chassis 
set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1314.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1326.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1323.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1324.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1322.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1321.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1320.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1325.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1318.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 jclark@cumin1002: START - Cookbook 
sre.hosts.provision for host wikikube-worker1317.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:32 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1316.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1315.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1314.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1319.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:30 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:30 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
* 16:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
* 16:27 jclark@cumin1002: START - Cookbook sre.dns.netbox
* 00:44 tzatziki: removing 103 files for legal compliance
== 2024-11-15 ==
* 23:42 tzatziki: removing 1 file for legal compliance
* 23:19 tzatziki: removing 3 files for legal compliance
* 22:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2112.codfw.wmnet with OS bullseye
* 21:59 Dreamy_Jazz: Started MediaModeration scan on all wikis other than commonswiki attempting to scan all failed to be scanned images - https://wikitech.wikimedia.org/wiki/MediaModeration
* 21:59 Dreamy_Jazz: Started MediaModeration scan on commons wiki
attempting to scan all failed to be scanned images - https://wikitech.wikimedia.org/wiki/MediaModeration * 21:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2115.codfw.wmnet with OS bullseye * 21:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2114.codfw.wmnet with OS bullseye * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2111.codfw.wmnet with OS bullseye * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2115.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2038.codfw.wmnet with OS bullseye * 21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2114.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2036.codfw.wmnet with OS bullseye * 21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2111.codfw.wmnet with reason: host reimage * 21:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2115.codfw.wmnet with reason: host reimage * 21:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2114.codfw.wmnet with reason: host reimage * 21:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2111.codfw.wmnet with reason: host reimage * 21:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2115.codfw.wmnet with OS bullseye * 21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2114.codfw.wmnet with OS bullseye * 21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2112.codfw.wmnet with OS bullseye * 21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2111.codfw.wmnet with OS bullseye * 21:13 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2038.codfw.wmnet with reason: host reimage * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2115'] * 21:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2115'] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2114'] * 21:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2114'] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2112'] * 21:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2112'] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2111'] * 21:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2111'] * 21:11 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2110'] * 21:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2036.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2114.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2111.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:07 
jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2038.codfw.wmnet with reason: host reimage * 21:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2115.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2112.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2036.codfw.wmnet with reason: host reimage * 21:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2115.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2114.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2112.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2111.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox 
(exit_code=0) * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding elastic2110 to codfw - jhancock@cumin2002" * 20:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding elastic2110 to codfw - jhancock@cumin2002" * 20:50 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2038.codfw.wmnet with OS bullseye * 20:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2036.codfw.wmnet with OS bullseye * 20:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2036'] * 20:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2038'] * 20:43 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2038'] * 20:43 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2036'] * 20:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2038.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2036.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:41 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host restbase2037 * 20:40 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host restbase2037 * 20:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase2037.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2038.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2036.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding restbase2036 to codfw - jhancock@cumin2002" * 20:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding restbase2036 to codfw - jhancock@cumin2002" * 20:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:54 dancy@deploy2002: Finished scap sync-world: Testing [[phab:T377883|T377883]] (duration: 03m 06s) * 19:51 dancy@deploy2002: Started scap sync-world: Testing [[phab:T377883|T377883]] * 19:50 dancy@deploy2002: Installation of scap version "4.124.0" completed for 206 hosts * 19:46 dancy@deploy2002: Installing scap version "4.124.0" for 206 hosts * 18:53 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 18:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 18:35 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 18:34 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 18:32 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 18:31 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 18:15 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 18:15 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 18:09 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 18:08 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:58 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@82083c4]: image suggestions hotfix - section titles denylist dependency (duration: 01m 58s) * 16:57 taavi: copy python3-flask-<nowiki>{</nowiki>keystone,oslolog<nowiki>}</nowiki> from bullseye-wikimedia to bookworm-wikimedia * 16:56 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@82083c4]: image suggestions hotfix - section titles denylist dependency * 16:27 herron@cumin2002: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1005.eqiad.wmnet,cluster=aux-k8s,service=kubesvc * 16:27 herron@cumin2002: conftool action : set/weight=10; selector: name=aux-k8s-worker1005.eqiad.wmnet,cluster=aux-k8s,service=kubesvc * 16:22 herron@cumin2002: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1004.eqiad.wmnet,cluster=aux-k8s,service=kubesvc * 16:22 herron@cumin2002: conftool action : set/weight=10; selector: name=aux-k8s-worker1004.eqiad.wmnet,cluster=aux-k8s,service=kubesvc * 
16:09 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet [reason: ATS fixed] * 16:08 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp4043.ulsfo.wmnet * 16:08 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp4043.ulsfo.wmnet * 16:06 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P<nowiki>{</nowiki>cp4051*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm2 * 16:03 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P<nowiki>{</nowiki>cp4051*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm2 * 16:00 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.6-1wm2_amd64.changes: [[phab:T379797|T379797]] * 15:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db2230.codfw.wmnet,db1125.eqiad.wmnet with reason: testing stuff on test-s4 * 15:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db2230.codfw.wmnet,db1125.eqiad.wmnet with reason: testing stuff on test-s4 * 15:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw * 15:41 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad * 15:39 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad * 15:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad * 15:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply * 15:38 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch 
from codfw to eqiad * 15:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply * 15:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove e8 lo0 IP - ayounsi@cumin1002" * 13:59 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove e8 lo0 IP - ayounsi@cumin1002" * 13:55 ayounsi@cumin1002: START - Cookbook sre.dns.netbox * 13:55 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:52 ayounsi@cumin1002: START - Cookbook sre.dns.netbox * 13:41 XioNoX: test no-passwords on mr1-eqsin - [[phab:T379464|T379464]] * 13:31 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts sretest1004.eqiad.wmnet * 13:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002" * 13:31 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002" * 13:27 ayounsi@cumin1002: START - Cookbook sre.dns.netbox * 13:24 cmooney@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec 
data - cmooney@cumin1002 * 13:23 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts sretest1004.eqiad.wmnet * 13:21 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec data - cmooney@cumin1002 * 13:19 cmooney@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec data - cmooney@cumin1002 * 13:17 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec data - cmooney@cumin1002 * 13:01 moritzm: imported 8u432-b06-2~deb12u1 to component/jdk8 for bookworm (forward port of the latest Java 8 security fixes for Bookworm) * 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host build2002.codfw.wmnet * 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host build2002.codfw.wmnet with OS bookworm * 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on build2002.codfw.wmnet with reason: host reimage * 12:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on build2002.codfw.wmnet with reason: host reimage * 12:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics: apply * 12:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics: apply * 12:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics: apply * 12:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 12:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host build2002.codfw.wmnet with OS bookworm * 12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM build2002.codfw.wmnet - jmm@cumin2002" * 12:15 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM build2002.codfw.wmnet - jmm@cumin2002" * 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) build2002.codfw.wmnet on all recursors * 12:15 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache build2002.codfw.wmnet on all recursors * 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM build2002.codfw.wmnet - jmm@cumin2002" * 12:11 cmooney@cumin1002: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling restart_daemons on A:netbox * 12:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM build2002.codfw.wmnet - jmm@cumin2002" * 12:08 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Update * 12:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 12:03 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host build2002.codfw.wmnet * 12:01 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0) * 12:01 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report * 12:00 
cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 11:58 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 11:38 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@2c533d6]: hotfix image suggestions weekly snapshots (duration: 00m 57s) * 11:37 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@2c533d6]: hotfix image suggestions weekly snapshots * 11:27 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1305-1312].eqiad.wmnet * 11:24 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1305-1312].eqiad.wmnet * 11:22 claime: homer 'lsw1-f5-eqiad*' commit '[[phab:T377022|T377022]]' * 11:22 claime: homer 'lsw1-f6-eqiad*' commit '[[phab:T377022|T377022]]' * 11:22 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:21 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:21 claime: homer 'lsw1-f7-eqiad*' commit '[[phab:T377022|T377022]]' * 11:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:20 claime: homer 'lsw1-e7-eqiad*' commit '[[phab:T377022|T377022]]' * 11:20 claime: homer 'lsw1-e6-eqiad*' commit '[[phab:T377022|T377022]]' * 11:19 claime: homer 'lsw1-e5-eqiad*' commit '[[phab:T377022|T377022]]' * 11:15 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:14 elukey@cumin2002: 
START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:12 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:12 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:06 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:05 claime: homer 'cr*eqiad*' commit '[[phab:T377022|T377022]]' * 10:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:34 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:28 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:28 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet 
with chassis set policy FORCE_RESTART * 09:28 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:23 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:23 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:22 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:15 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Update * 08:48 moritzm: installing Linux 6.1.115 kernel updates from Bookworm point release * 04:54 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 12:00:00 on db1246.eqiad.wmnet with reason: depooled * 04:54 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 12:00:00 on db1246.eqiad.wmnet with reason: depooled * 04:51 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on db1246.eqiad.wmnet with reason: depooled * 04:50 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 12:00:00 on db1246.eqiad.wmnet with reason: depooled * 04:47 rzl@cumin2002: dbctl commit (dc=all): 'db1246 depooled', diff saved to https://phabricator.wikimedia.org/P71052 and previous config saved to /var/cache/conftool/dbconfig/20241115-044705-rzl.json * 03:44 ejegg: fundraising python tools upgraded from {{Gerrit|c6e2dbcc}} to {{Gerrit|b230f718}} == 
2024-11-14 == * 23:17 eileen: civicrm upgraded from {{Gerrit|2a53f697}} to {{Gerrit|d49a064d}} * 22:59 eileen: civicrm upgraded from {{Gerrit|2ab8334a}} to {{Gerrit|2a53f697}} * 22:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp4043.ulsfo.wmnet with reason: ATS upgrade 9.2.6 * 22:37 brett@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp4043.ulsfo.wmnet with reason: ATS upgrade 9.2.6 * 22:30 ryankemper: [[phab:T376150|T376150]] Depooled `wdqs20[18-20]` in preparation of merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1088185 * 21:49 aqu@deploy2002: Finished deploy [airflow-dags/analytics@7a66849]: Stage Refine: fix Airflow skip (duration: 00m 59s) * 21:48 aqu@deploy2002: Started deploy [airflow-dags/analytics@7a66849]: Stage Refine: fix Airflow skip * 21:47 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@7a66849]: Stage Refine: fix Airflow skip (duration: 00m 14s) * 21:47 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@7a66849]: Stage Refine: fix Airflow skip * 21:26 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@2220747]: Stage Refine test fix (duration: 00m 16s) * 21:26 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@2220747]: Stage Refine test fix * 21:20 cjming: end of UTC late backport window * 21:17 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1082853{{!}}Redirect to wikis using subpages rather than namespaces too (T376923)]] (duration: 13m 44s) * 21:13 cjming@deploy2002: cjming, pppery: Continuing with sync * 21:08 cjming@deploy2002: cjming, pppery: Backport for [[gerrit:1082853{{!}}Redirect to wikis using subpages rather than namespaces too (T376923)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:04 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1082853{{!}}Redirect to wikis using subpages rather than namespaces too (T376923)]] * 20:47 
jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 20:47 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 20:38 bvibber@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 20:37 bvibber@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 20:37 bvibber@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 20:36 bvibber@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 20:35 bvibber@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 20:35 bvibber@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 20:29 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) * 20:28 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter * 20:24 bvibber@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 20:24 bvibber@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 20:24 bvibber@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 20:24 bvibber@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 20:23 bvibber@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 20:23 bvibber@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 20:23 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Network maintenance complete - None * 20:01 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Network maintenance complete - None * 19:55 brennen@deploy2002: rebuilt and synchronized wikiversions files: 
group2 to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] * 19:40 eileen: tools upgraded from {{Gerrit|68f64e43}} to {{Gerrit|c6e2dbcc}} * 19:37 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad [reason: junos upgrade done, [[phab:T364092|T364092]]] * 19:37 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad [reason: junos upgrade done, [[phab:T364092|T364092]]] * 19:20 James_F: Running `mwscript-k8s -f -- extensions/WikiLambda/maintenance/updateSecondaryTables.php --wiki=wikifunctionswiki --zType Z8 --report --verbose` for [[phab:T375972|T375972]], [[phab:T367005|T367005]], [[phab:T373038|T373038]], [[phab:T358737|T358737]] * 19:19 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox * 19:14 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) * 19:14 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter * 19:14 swfrench-wmf: running sre.discovery.datacenter status all to test deployed fix * 19:00 brennen: 1.44.0-wmf.3 train status ([[phab:T375662|T375662]]): no current blockers, but holding for network maintenance. 
* 18:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1312.eqiad.wmnet with OS bullseye * 18:19 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) * 18:18 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter * 18:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1310.eqiad.wmnet with OS bullseye * 18:13 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging * 18:13 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging * 18:11 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1311.eqiad.wmnet with OS bullseye * 18:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1308.eqiad.wmnet with OS bullseye * 18:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1190 gradually with 4 steps - Maint over * 18:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bullseye * 18:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage * 17:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1307.eqiad.wmnet with OS bullseye * 17:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage * 17:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2139.codfw.wmnet with reason: host reimage * 17:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1306.eqiad.wmnet with OS bullseye * 17:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage * 17:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage * 17:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2139.codfw.wmnet with reason: host reimage * 17:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage * 17:43 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage * 17:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 17:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 17:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage * 17:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage * 17:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage * 17:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage * 17:29 cgoubert@cumin1002: START - Cookbook 
sre.hosts.downtime for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 17:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1312.eqiad.wmnet with OS bullseye * 17:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1311.eqiad.wmnet with OS bullseye * 17:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1310.eqiad.wmnet with OS bullseye * 17:24 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None * 17:24 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter status all services in all: None - None * 17:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bullseye * 17:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1308.eqiad.wmnet with OS bullseye * 17:19 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db1190 gradually with 4 steps - Maint over * 17:18 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in eqiad: Network maintenance - None * 17:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1307.eqiad.wmnet with OS bullseye * 17:15 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=4043.ulsfo.wmnet * 17:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:13 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:13 elukey@cumin1002: START - Cookbook sre.hosts.provision for host 
thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1306.eqiad.wmnet with OS bullseye * 16:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1305.eqiad.wmnet with OS bullseye * 16:57 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: Network maintenance - None * 16:52 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@7c4873e]: decouple article-level image suggestions from section-level ones (duration: 00m 53s) * 16:51 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@7c4873e]: decouple article-level image suggestions from section-level ones * 16:45 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None * 16:45 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter status all services in all: None - None * 16:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage * 16:38 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) * 16:37 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter * 16:36 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage * 16:36 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) * 16:36 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter * 16:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1190.eqiad.wmnet with reason: Sad * 16:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1190.eqiad.wmnet with reason: Sad * 16:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1190 sad', diff saved to 
https://phabricator.wikimedia.org/P71044 and previous config saved to /var/cache/conftool/dbconfig/20241114-163317-ladsgroup.json * 16:31 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 16:31 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. * 16:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1305.eqiad.wmnet with OS bullseye * 16:04 cmooney@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 151575 * 16:03 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 151575 * 16:01 papaul: ongoing maintenance on cr1-eqiad * 16:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,re0.cr1-eqiad.mgmt with reason: router upgrade * 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,re0.cr1-eqiad.mgmt with reason: router upgrade * 15:56 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging * 15:56 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging * 15:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,cr1-eqiad.mgmt with reason: router upgrade * 15:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,cr1-eqiad.mgmt with reason: router upgrade * 15:49 moritzm: installing nss security updates * 15:48 reedy@deploy2002: Synchronized wmf-config/CommonSettings.php: [[phab:T379834|T379834]] (duration: 08m 02s) * 15:47 sukhe@puppetserver1001: conftool 
action : set/pooled=no; selector: name=cp4043.ulsfo.wmnet * 15:47 sukhe@cumin1002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=97) Rolling upgrade/restart of Apache Traffic Server on P<nowiki>{</nowiki>cp4043*,cp4051*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm1 * 15:45 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2002.codfw.wmnet * 15:45 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2002.codfw.wmnet * 15:45 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-ctrl2002.codfw.wmnet * 15:45 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-ctrl2002.codfw.wmnet * 15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0) * 15:43 pt1979@cumin2002: START - Cookbook sre.network.cf * 15:42 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P<nowiki>{</nowiki>cp4043*,cp4051*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm1 * 15:40 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye * 15:39 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:37 volans: installed spicerack v8.16.1 to cumin hosts * 15:36 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad [reason: junos upgrade, [[phab:T364092|T364092]]] * 15:36 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad [reason: junos upgrade, [[phab:T364092|T364092]]] * 15:35 ladsgroup@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091248{{!}}Revert "mmv.js: Store comingFromHashChange as a class property" (T379835)]] (duration: 12m 10s) * 15:33 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.6-1wm1_amd64.changes: 
[[phab:T379797|T379797]] * 15:30 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox * 15:29 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: [[phab:T379719|T379719]] * 15:29 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: [[phab:T379719|T379719]] * 15:28 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-ctrl2002.codfw.wmnet * 15:28 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-ctrl2002.codfw.wmnet * 15:27 ladsgroup@deploy2002: ladsgroup: Continuing with sync * 15:27 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:1091248{{!}}Revert "mmv.js: Store comingFromHashChange as a class property" (T379835)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:24 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 15:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 15:24 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox and not A:magru and A:dnsbox * 15:23 ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1091248{{!}}Revert "mmv.js: Store comingFromHashChange as a class property" (T379835)]] * 15:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply * 15:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply * 15:07 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 15:07 sergi0: UTC afternoon deploys 
done * 15:06 sgimeno@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091231{{!}}HomepageHooks: run metrics increment in deferred update (T379682)]] (duration: 11m 15s) * 15:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 15:02 sgimeno@deploy2002: sgimeno: Continuing with sync * 14:59 sgimeno@deploy2002: sgimeno: Backport for [[gerrit:1091231{{!}}HomepageHooks: run metrics increment in deferred update (T379682)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:55 sgimeno@deploy2002: Started scap sync-world: Backport for [[gerrit:1091231{{!}}HomepageHooks: run metrics increment in deferred update (T379682)]] * 14:53 volans: uploaded spicerack_8.16.1 to apt.wikimedia.org bullseye-wikimedia * 14:50 sgimeno@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090830{{!}}GrowthExperiments: set experiment config only in pilot wikis (T379681)]] (duration: 13m 02s) * 14:45 sgimeno@deploy2002: sgimeno: Continuing with sync * 14:41 sgimeno@deploy2002: sgimeno: Backport for [[gerrit:1090830{{!}}GrowthExperiments: set experiment config only in pilot wikis (T379681)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:37 sgimeno@deploy2002: Started scap sync-world: Backport for [[gerrit:1090830{{!}}GrowthExperiments: set experiment config only in pilot wikis (T379681)]] * 14:33 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox and not A:magru and A:dnsbox * 14:30 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox and A:magru and A:dnsbox * 14:27 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091227{{!}}CX3 Build 0.2.0+20241114]] (duration: 13m 23s) * 14:25 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox and A:magru and A:dnsbox * 14:22 kartik@deploy2002: 
kartik: Continuing with sync * 14:18 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough * 14:17 kartik@deploy2002: kartik: Backport for [[gerrit:1091227{{!}}CX3 Build 0.2.0+20241114]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:13 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1091227{{!}}CX3 Build 0.2.0+20241114]] * 14:05 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough * 13:50 aqu@deploy2002: Finished deploy [airflow-dags/analytics@2220747]: Stage Refine parallelization improvment [airflow-dags@2220747d] (duration: 01m 08s) * 13:49 aqu@deploy2002: Started deploy [airflow-dags/analytics@2220747]: Stage Refine parallelization improvment [airflow-dags@2220747d] * 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7004.magru.wmnet * 13:36 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@2220747]: Stage Refine parallelization improvment [airflow-dags@2220747d] (duration: 00m 15s) * 13:36 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@2220747]: Stage Refine parallelization improvment [airflow-dags@2220747d] * 13:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7004.magru.wmnet * 13:21 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@c5ab766]: [[phab:T379546|T379546]] (duration: 00m 54s) * 13:21 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@c5ab766]: [[phab:T379546|T379546]] * 13:19 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix search button height - oblivian@cumin1002" * 13:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: 
Fix search button height - oblivian@cumin1002 * 13:18 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix search button height - oblivian@cumin1002 * 13:18 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix search button height - oblivian@cumin1002" * 13:05 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-codfw: containerd migration * 13:04 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2003.codfw.wmnet with OS bookworm * 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad * 12:53 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad * 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet * 12:52 moritzm: installing apache2 security updates * 12:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet * 12:51 dreamyjazz@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090511{{!}}Hide IP reveal tools on Special:AbuseLog and Special:GlobalBlockList (T379583)]] (duration: 09m 08s) * 12:49 moritzm: failover ganeti master of magru02 to ganeti7002 * 12:46 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync * 12:45 dreamyjazz@deploy2002: dreamyjazz: Backport for [[gerrit:1090511{{!}}Hide IP reveal tools on Special:AbuseLog and Special:GlobalBlockList (T379583)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet * 12:42 jayme@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage * 12:41 dreamyjazz@deploy2002: Started scap sync-world: Backport for [[gerrit:1090511{{!}}Hide IP reveal tools on Special:AbuseLog and Special:GlobalBlockList (T379583)]] * 12:38 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage * 12:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet * 12:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7002.magru.wmnet * 12:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7002.magru.wmnet * 12:22 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bookworm * 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw * 12:18 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw * 12:17 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-codfw: containerd migration * 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir * 12:00 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir * 11:57 moritzm: restarting postfix on inbound/outbound servers to pick up openssl updates * 11:17 moritzm: installing openssl security updates * 11:08 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-codfw: containerd migration * 11:08 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2001.codfw.wmnet with OS bookworm 
* 10:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production * 10:45 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage * 10:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production * 10:42 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage * 10:16 moritzm: remove ganeti2017 from active ganeti nodes [[phab:T376594|T376594]] * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet * 10:11 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bookworm * 10:07 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided) (duration: 00m 47s) * 10:06 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-codfw: containerd migration * 10:06 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided) * 10:03 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided) (duration: 00m 21s) * 10:03 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided) * 09:43 kart_: Done: UTC morning backport window * 09:37 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090988{{!}}Correction to virtual-globaljsonlinks mapping (T374746)]] (duration: 10m 03s) * 09:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 09:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 09:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 09:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 09:32 kartik@deploy2002: bvibber, kartik: Continuing with sync * 09:31 kartik@deploy2002: bvibber, kartik: Backport for [[gerrit:1090988{{!}}Correction to virtual-globaljsonlinks mapping (T374746)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:27 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1090988{{!}}Correction to virtual-globaljsonlinks mapping (T374746)]] * 09:25 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091007{{!}}CX3 Build 0.2.0+20241113 (T368718 T374567)]] (duration: 29m 40s) * 09:21 kartik@deploy2002: kartik: Continuing with sync * 09:17 volans: installed spicerack v8.16.0 on cumin2002 * 09:08 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P<nowiki>{</nowiki>cp4044.ulsfo.wmnet,cp4052.ulsfo.wmnet<nowiki>}</nowiki> and A:cp * 09:04 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P<nowiki>{</nowiki>cp4044.ulsfo.wmnet,cp4052.ulsfo.wmnet<nowiki>}</nowiki> and A:cp * 09:00 kartik@deploy2002: kartik: Backport for [[gerrit:1091007{{!}}CX3 Build 0.2.0+20241113 (T368718 T374567)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:56 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1091007{{!}}CX3 Build 0.2.0+20241113 (T368718 T374567)]] * 08:55 vgutierrez: import haproxy 2.8.12 to thirtdparty/haproxy28 component for bullseye-wikimedia (apt.wm.o) - [[phab:T379891|T379891]] * 08:54 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090937{{!}}Allow Wikidata bureaucrats to remove admin rights (T379635)]] (duration: 11m 49s) * 08:49 kartik@deploy2002: dreamrimmer, kartik: Continuing with sync * 08:47 kartik@deploy2002: dreamrimmer, kartik: Backport for [[gerrit:1090937{{!}}Allow Wikidata bureaucrats to remove admin rights 
(T379635)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:42 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1090937{{!}}Allow Wikidata bureaucrats to remove admin rights (T379635)]] * 08:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26744 * 08:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 26744 * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141082 * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 141082 * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9299 * 08:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 9299 * 08:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 140407 * 08:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 140407 * 08:28 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084704{{!}}Update stream registration and config for MinT for Readers (T378565)]] (duration: 24m 50s) * 08:23 kartik@deploy2002: kcvelaga, kartik: Continuing with sync * 08:08 kartik@deploy2002: kcvelaga, kartik: Backport for [[gerrit:1084704{{!}}Update stream registration and config for MinT for Readers (T378565)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1084704{{!}}Update stream registration and config for MinT for Readers (T378565)]] * 07:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2017.codfw.wmnet * 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet * 07:34 jmm@cumin2002: START - Cookbook 
sre.ganeti.drain-node for draining ganeti node ganeti2017.codfw.wmnet * 07:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove office link dns records - ayounsi@cumin1002" * 07:34 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove office link dns records - ayounsi@cumin1002" * 07:30 ayounsi@cumin1002: START - Cookbook sre.dns.netbox * 07:06 XioNoX: delete office interco IP/prefixes/vlan in ulsfo - [[phab:T379778|T379778]] * 04:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 04:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 04:09 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 03:56 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 02:32 eileen: config revision changed from {{Gerrit|7af5769b}} to {{Gerrit|fbddc1f5}} * 02:29 eileen: civicrm upgraded from {{Gerrit|7b300007}} to {{Gerrit|2ab8334a}} * 00:14 eileen: config revision changed from {{Gerrit|2b08b881}} to {{Gerrit|7af5769b}} * 00:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:12 eileen: civicrm upgraded from {{Gerrit|23e08fc2}} to {{Gerrit|7b300007}} * 00:05 jclark@cumin1002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host es1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1041.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
== 2024-11-13 ==
* 23:45 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 23:43 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 23:43 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 23:43 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 23:42 
jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1041.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for es104 - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for es104 - jclark@cumin1002" * 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:37 jclark@cumin1002: START - Cookbook sre.dns.netbox * 23:20 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 23:04 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for 
wikikube-worker - jclark@cumin1002" * 23:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 22:59 jclark@cumin1002: START - Cookbook sre.dns.netbox * 22:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wdqs1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wdqs1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wdqs1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:57 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:55 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 22:20 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 
22:20 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 22:19 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 22:18 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 22:17 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 22:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:11 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 22:11 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 22:10 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 22:10 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 22:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 22:03 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 22:00 tchanders@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090965{{!}}Revert "Disallow AbuseFilter protected variables use on non-temp-user wikis" (T379503)]] (duration: 09m 03s) * 21:55 tchanders@deploy2002: tchanders: Continuing with sync * 21:55 tchanders@deploy2002: tchanders: Backport for [[gerrit:1090965{{!}}Revert "Disallow AbuseFilter protected variables use on non-temp-user wikis" (T379503)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:51 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1090965{{!}}Revert "Disallow AbuseFilter protected variables use on non-temp-user wikis" (T379503)]] * 21:48 cjming@deploy2002: Finished scap sync-world: Backport for 
[[gerrit:1090953{{!}}Enable autocreateaccount on testcommonswiki (T378216)]] (duration: 12m 59s) * 21:44 cjming@deploy2002: aude, cjming: Continuing with sync * 21:40 cjming@deploy2002: aude, cjming: Backport for [[gerrit:1090953{{!}}Enable autocreateaccount on testcommonswiki (T378216)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:36 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 21:36 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1090953{{!}}Enable autocreateaccount on testcommonswiki (T378216)]] * 21:34 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090928{{!}}GlobalJsonLinksCachePurgeJob to actually invalidate caches (T374746)]] (duration: 13m 27s) * 21:27 cjming@deploy2002: cjming, bvibber: Continuing with sync * 21:27 cjming@deploy2002: cjming, bvibber: Backport for [[gerrit:1090928{{!}}GlobalJsonLinksCachePurgeJob to actually invalidate caches (T374746)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 21:20 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1090928{{!}}GlobalJsonLinksCachePurgeJob to actually invalidate caches (T374746)]] * 21:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy 
FORCE_RESTART * 21:15 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2005 * 21:07 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2005 * 21:05 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 21:05 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 21:01 aqu@deploy2002: Finished deploy [airflow-dags/analytics@3487da3]: Stage Refine [airflow-dags@3487da3a] (duration: 01m 22s) * 21:00 aqu@deploy2002: Started deploy [airflow-dags/analytics@3487da3]: Stage Refine [airflow-dags@3487da3a] * 20:56 aqu@deploy2002: Finished deploy [airflow-dags/analytics@3fc12d6]: Stage Refine [airflow-dags@3fc12d60] (duration: 01m 14s) * 20:56 jhancock@cumin2002: 
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 20:55 aqu@deploy2002: Started deploy [airflow-dags/analytics@3fc12d6]: Stage Refine [airflow-dags@3fc12d60] * 20:49 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 20:49 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 20:48 swfrench-wmf: deployed changeprop to clear no-op chart version diffs from CR {{Gerrit|1089313}} * 20:47 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 20:47 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:39 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 20:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 20:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 20:36 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 20:36 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 20:35 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 20:34 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 20:34 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@3fc12d6]: Stage Refine [airflow-dags@3fc12d60] (duration: 00m 15s) * 20:34 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@3fc12d6]: Stage Refine [airflow-dags@3fc12d60] * 20:31 cdanis@deploy2002: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 20:31 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 20:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 20:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 20:16 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 20:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 19:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 19:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 19:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2005 * 19:59 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2005 * 19:58 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 19:58 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 19:58 brennen@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] (duration: 31m 07s) * 19:57 jhancock@cumin2002: END (FAIL) - Cookbook 
sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 19:55 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 19:55 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 19:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 19:51 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-be2005 to codfw - jhancock@cumin2002" * 19:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-be2005 to codfw - jhancock@cumin2002" * 19:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:47 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 19:46 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 19:44 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Update * 19:37 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Update * 19:36 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 19:35 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Update * 19:27 brennen@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] * 19:26 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.3 refs 
[[phab:T375662|T375662]] * 19:21 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Update * 19:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1005.eqiad.wmnet with OS bullseye * 19:11 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:10 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 brennen: 1.44.0-wmf.3 train status ([[phab:T375662|T375662]]): no current blockers, rolling to group1. * 19:08 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/hdfs-synchronizer: apply * 19:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:03 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:02 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:01 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 19:00 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:00 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for thanos-be1005 - jclark@cumin1002" * 19:00 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for thanos-be1005 - jclark@cumin1002" * 18:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/hdfs-synchronizer: apply * 18:56 jclark@cumin1002: START - Cookbook sre.dns.netbox * 18:50 swfrench@deploy2002: Finished scap sync-world: Deployment to switch mwdebug-next to publish-81 - [[phab:T372604|T372604]] (duration: 01m 53s) * 18:48 swfrench@deploy2002: Started scap sync-world: Deployment to switch mwdebug-next to publish-81 - [[phab:T372604|T372604]] * 18:36 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply * 18:33 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply * 18:32 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply * 18:30 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@3499887]: I really hope this works this time (duration: 00m 34s) * 18:29 cdanis@deploy2002: Started deploy [docker-pkg/deploy@3499887]: I really hope this works this time * 18:29 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 18:26 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@9d71ac3]: (no justification provided) (duration: 00m 18s) * 18:26 cdanis@deploy2002: Started deploy [docker-pkg/deploy@9d71ac3]: (no justification provided) * 18:22 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@9d71ac3]: (no justification provided) (duration: 00m 40s) * 18:21 cdanis@deploy2002: Started deploy [docker-pkg/deploy@9d71ac3]: (no justification provided) * 18:21 cdanis@deploy2002: Finished deploy 
[docker-pkg/deploy@9d71ac3]: deploy 4.0.2 for realsies (duration: 02m 41s) * 18:18 cdanis@deploy2002: Started deploy [docker-pkg/deploy@9d71ac3]: deploy 4.0.2 for realsies * 18:13 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 18:13 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 18:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:54 urbanecm: mwmaint2002: foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --search-index --verbose --random # [[phab:T379057|T379057]] * 17:49 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@38eb04d]: ship upstream_version helper (duration: 00m 32s) * 17:49 cdanis@deploy2002: Started deploy [docker-pkg/deploy@38eb04d]: ship upstream_version helper * 17:49 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:47 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:46 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:45 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 17:40 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2002.codfw.wmnet * 17:39 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2002.codfw.wmnet * 17:39 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2002.codfw.wmnet * 17:38 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2002.codfw.wmnet with OS bookworm * 17:37 swfrench@deploy2002: helmfile 
[eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:35 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:33 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:32 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2128-2135].codfw.wmnet * 17:23 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2128-2135].codfw.wmnet * 17:20 claime: homer 'lsw1-d2-codfw*' commit '[[phab:T377008|T377008]]' * 17:18 claime: homer 'lsw1-c2-codfw*' commit '[[phab:T377008|T377008]]' * 17:18 claime: homer 'lsw1-d4-codfw*' commit '[[phab:T377008|T377008]]' * 17:17 claime: homer 'lsw1-c4-codfw*' commit '[[phab:T377008|T377008]]' * 17:15 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:14 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage * 17:11 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage * 17:03 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:02 claime: homer 'cr*codfw*' commit [[phab:T377008|T377008]] * 17:01 claime: homer 'lsw1-b4-codfw*' commit [[phab:T377008|T377008]] * 17:01 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 16:58 claime: homer 'lsw1-b2-codfw*' commit [[phab:T377008|T377008]] * 16:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply * 16:53 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-ctrl2002 * 16:53 
jayme@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2002 * 16:53 jayme@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2002 * 16:53 jayme@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-ctrl2002.codfw.wmnet 76.32.192.10.in-addr.arpa 6.7.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 16:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply * 16:53 jayme@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-ctrl2002.codfw.wmnet 76.32.192.10.in-addr.arpa 6.7.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 16:53 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:53 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-ctrl2002 - jayme@cumin2002" * 16:53 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-ctrl2002 - jayme@cumin2002" * 16:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 16:49 jayme@cumin2002: START - Cookbook sre.dns.netbox * 16:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 16:47 jayme@cumin2002: START - Cookbook sre.hosts.move-vlan for host wikikube-ctrl2002 * 16:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply * 16:47 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bookworm * 16:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/growthbook: apply * 16:41 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: reimage * 16:40 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: reimage * 16:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet * 16:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 16:31 jayme@cumin2002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2002.codfw.wmnet * 16:30 elukey: reload nginx on registry* to pick up logging changes (log of X-Client-IP from the CDN) * 16:30 XioNoX: shutdown old office link interface - [[phab:T379778|T379778]] * 16:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 16:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 16:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet * 16:26 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 16:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 16:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 16:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet * 16:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet * 16:08 cgoubert@cumin1002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 16:08 sukhe: running agent on A:ulsfo and A:lvs * 16:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 16:06 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 16:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 16:04 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 16:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 15:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 15:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 15:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 15:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 15:45 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/hdfs-synchronizer: apply * 15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 15:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 15:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:37 cgoubert@cumin1002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 15:36 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:35 moritzm: failover ganeti master of magru01 to ganeti7001 * 15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 15:33 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 15:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 15:33 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:30 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 15:30 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:30 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002" * 15:30 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002" * 15:30 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 15:26 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 15:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet * 15:18 moritzm: installing apache2 security updates * 15:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:15 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 15:15 cgoubert@cumin1002: START - Cookbook 
sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet * 15:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:12 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 14:59 volans: uploaded spicerack_8.16.0 to apt.wikimedia.org bullseye-wikimedia * 14:57 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:56 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@2eb8320]: Stage Refine [airflow-dags@2eb8320d] (duration: 00m 14s) * 14:55 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@2eb8320]: Stage Refine [airflow-dags@2eb8320d] * 14:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:51 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7001.magru.wmnet * 14:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7001.magru.wmnet * 14:37 moritzm: installing openssl security updates * 14:36 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 14:36 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 14:35 Lucas_WMDE: UTC afternoon backport+config window done * 14:33 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:32 lucaswerkmeister-wmde@deploy2002: 
Finished scap sync-world: Backport for [[gerrit:1090526{{!}}TimedMediahandler: reenable shellbox-video for commons (T356241)]] (duration: 07m 28s) * 14:30 btullis@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-jumbo-eqiad * 14:27 lucaswerkmeister-wmde@deploy2002: hnowlan, lucaswerkmeister-wmde: Continuing with sync * 14:27 lucaswerkmeister-wmde@deploy2002: hnowlan, lucaswerkmeister-wmde: Backport for [[gerrit:1090526{{!}}TimedMediahandler: reenable shellbox-video for commons (T356241)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply * 14:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply * 14:24 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1090526{{!}}TimedMediahandler: reenable shellbox-video for commons (T356241)]] * 14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply * 14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply * 14:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:14 tchanders@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090515{{!}}Disallow AbuseFilter protected variables use on non-temp-user wikis (T379503)]] (duration: 11m 28s) * 14:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply * 14:10 tchanders@deploy2002: tchanders: Continuing with sync * 14:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply * 14:07 jmm@cumin2002: END (FAIL) - 
Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1052.eqiad.wmnet to cluster eqiad and group D * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply * 14:06 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1052.eqiad.wmnet to cluster eqiad and group D * 14:06 tchanders@deploy2002: tchanders: Backport for [[gerrit:1090515{{!}}Disallow AbuseFilter protected variables use on non-temp-user wikis (T379503)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:03 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1090515{{!}}Disallow AbuseFilter protected variables use on non-temp-user wikis (T379503)]] * 14:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply * 14:02 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply * 14:01 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply * 14:01 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply * 14:00 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:59 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:32 btullis@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad * 13:21 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:20 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 13:18 moritzm: installing python-cryptography security updates * 13:18 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:18 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons. 
* 13:17 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 13:14 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:13 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:07 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 13:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:03 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:59 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 12:56 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:56 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 12:54 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 12:45 cgoubert@cumin1002: START - Cookbook 
sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71030 and previous config saved to /var/cache/conftool/dbconfig/20241113-124504-ladsgroup.json * 12:44 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 12:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1051.eqiad.wmnet to cluster eqiad and group D * 12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 12:32 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1051.eqiad.wmnet to cluster eqiad and group D * 12:31 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 12:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 12:30 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 12:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P71029 and previous config saved to /var/cache/conftool/dbconfig/20241113-122957-ladsgroup.json * 12:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 12:29 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet * 12:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 12:28 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons. 
* 12:18 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons. * 12:15 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply * 12:15 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply * 12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P71028 and previous config saved to /var/cache/conftool/dbconfig/20241113-121450-ladsgroup.json * 12:14 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply * 12:14 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply * 12:13 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply * 12:13 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply * 12:11 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply * 12:11 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1052.eqiad.wmnet * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:01 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71027 and previous config saved to /var/cache/conftool/dbconfig/20241113-115943-ladsgroup.json * 11:57 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply * 11:57 
jiji@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply * 11:57 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1052.eqiad.wmnet * 11:57 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1051.eqiad.wmnet * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1052 * 11:54 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1052 * 11:52 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 11:51 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 11:51 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 11:50 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 11:49 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 11:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1022 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71026 and previous config saved to /var/cache/conftool/dbconfig/20241113-114913-ladsgroup.json * 11:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1051.eqiad.wmnet * 11:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance * 11:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance * 11:48 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for 
host ganeti1051 * 11:46 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1051 * 11:45 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 11:41 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration * 11:41 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1003.eqiad.wmnet with OS bookworm * 11:34 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 11:34 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wikikube-worker1256.eqiad.wmnet with reason: Degraded RAID * 11:26 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wikikube-worker1256.eqiad.wmnet with reason: Degraded RAID * 11:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1256.eqiad.wmnet * 11:25 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1256.eqiad.wmnet * 11:19 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons. * 11:18 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons. 
* 11:17 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage * 11:14 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage * 11:10 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons. * 11:09 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons. * 10:42 ladsgroup@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090809{{!}}Set the ratio of the new ParserCache keys to 100 for prod (T373037)]] (duration: 07m 32s) * 10:37 ladsgroup@deploy2002: ladsgroup: Continuing with sync * 10:36 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:1090809{{!}}Set the ratio of the new ParserCache keys to 100 for prod (T373037)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 10:35 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet * 10:34 ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1090809{{!}}Set the ratio of the new ParserCache keys to 100 for prod (T373037)]] * 10:32 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. 
* 10:27 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bookworm * 10:26 ladsgroup@deploy2002: ladsgroup: Continuing with sync * 10:26 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration * 10:24 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration * 10:24 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1002.eqiad.wmnet with OS bookworm * 10:21 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet * 10:20 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. * 10:20 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:1090809{{!}}Set the ratio of the new ParserCache keys to 100 for prod (T373037)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 10:18 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons. 
* 10:17 ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1090809{{!}}Set the ratio of the new ParserCache keys to 100 for prod (T373037)]] * 10:09 elukey: disallow calls to /v2/_catalog from the outside internet on Docker Registry hosts - [[phab:T378618|T378618]] * 10:04 claime: Manual restart of dump_cloud_ip_ranges.service on 'A:puppetserver or A:puppetmaster' * 10:01 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage * 10:01 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2088.codfw.wmnet with OS bullseye * 10:00 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 10:00 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 09:55 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage * 09:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage * 09:25 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye * 09:20 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bookworm * 09:20 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration * 09:11 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2088.codfw.wmnet with OS bullseye * 09:01 elukey@cumin1002: START - Cookbook 
sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye * 08:54 kart_: Updated recommendation-api to 2024-11-08-142328-production and fix wikidata host header ([[phab:T379592|T379592]]) * 08:49 kartik@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 08:49 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2088.codfw.wmnet with OS bullseye * 08:46 kartik@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 08:33 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage * 08:14 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye * 08:13 ladsgroup@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090493{{!}}Revert "cswiki: Add celebration logo"]] (duration: 09m 18s) * 08:08 ladsgroup@deploy2002: ladsgroup, hamishz: Continuing with sync * 08:07 ladsgroup@deploy2002: ladsgroup, hamishz: Backport for [[gerrit:1090493{{!}}Revert "cswiki: Add celebration logo"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:06 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 08:04 ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1090493{{!}}Revert "cswiki: Add celebration logo"]] * 07:47 Amir1: running extensions/Echo/maintenance/removeOrphanedEvents.php --force on all wikis ([[phab:T308084|T308084]]) * 05:17 eileen: civicrm upgraded from {{Gerrit|ad008134}} to {{Gerrit|23e08fc2}} * 02:56 tchin@deploy2002: Finished deploy [airflow-dags/analytics@58d7b82]: (no justification provided) (duration: 00m 10s) * 02:56 tchin@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: (no justification provided) * 02:55 tchin@deploy2002: deploy aborted: failedpythonlol (duration: 00m 05s) * 02:55 tchin@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: failedpythonlol * 00:54 tchin@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: (no justification provided) * 00:35 ejegg: payments-wiki upgraded from {{Gerrit|7d24a942}} to {{Gerrit|459f259b}} == 2024-11-12 == * 23:28 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:08 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:35 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 21:55 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:55 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:28 ebysans@deploy2002: Finished deploy [airflow-dags/analytics@58d7b82]: (no justification provided) (duration: 03m 50s) * 21:27 
SandraEbele_: deploying airflow as part of weekly deployment train * 21:27 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088770{{!}}Fix warning about missing central account for temp users (T378289)]], [[gerrit:1088771{{!}}Check session provider when autocreating (T378289)]] (duration: 16m 11s) * 21:25 ebysans@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: (no justification provided) * 21:23 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 21:22 urbanecm@deploy2002: urbanecm, tgr: Continuing with sync * 21:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:13 urbanecm@deploy2002: urbanecm, tgr: Backport for [[gerrit:1088770{{!}}Fix warning about missing central account for temp users (T378289)]], [[gerrit:1088771{{!}}Check session provider when autocreating (T378289)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088770{{!}}Fix warning about missing central account for temp users (T378289)]], [[gerrit:1088771{{!}}Check session provider when autocreating (T378289)]] * 21:09 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090550{{!}}Revert^2 "[CirrusSearch] testwiki: enable offloading weighted tags via EventBus" (T378983)]] (duration: 07m 18s) * 21:04 ebysans@deploy2002: Finished deploy [analytics/refinery@113ea5a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@113ea5ac] (duration: 04m 09s) * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1090550{{!}}Revert^2 "[CirrusSearch] testwiki: enable offloading weighted tags via EventBus" (T378983)]] * 20:59 ebysans@deploy2002: Started deploy [analytics/refinery@113ea5a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@113ea5ac] * 20:59 ebysans@deploy2002: Finished deploy [analytics/refinery@113ea5a] (thin): 
Regular analytics weekly train THIN [analytics/refinery@113ea5ac] (duration: 04m 54s) * 20:54 ebysans@deploy2002: Started deploy [analytics/refinery@113ea5a] (thin): Regular analytics weekly train THIN [analytics/refinery@113ea5ac] * 20:53 ebysans@deploy2002: Finished deploy [analytics/refinery@113ea5a]: Regular analytics weekly train [analytics/refinery@113ea5ac] (duration: 07m 37s) * 20:49 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * 20:46 ebysans@deploy2002: Started deploy [analytics/refinery@113ea5a]: Regular analytics weekly train [analytics/refinery@113ea5ac] * 19:42 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl1001.eqiad.wmnet * 19:42 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl1001.eqiad.wmnet * 19:42 jayme@cumin2002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl1001.* * 19:40 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 19:16 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage * 19:14 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] * 19:13 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage * 19:06 brennen: 1.44.0-wmf.3 train status ([[phab:T375662|T375662]]): no current blockers, rolling to group0. 
* 18:55 moritzm: installing libarchive security updates * 18:55 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 18:31 swfrench@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087604{{!}}Add title-case mapping to support migration to PHP 8.1 (T372603)]] (duration: 18m 48s) * 18:25 swfrench@deploy2002: swfrench: Continuing with sync * 18:24 swfrench-wmf: verified consistent 7.4-like title-case behavior in 7.4- and 8.1-based images, verified expected treatment of eszett in mwdebug - [[phab:T372603|T372603]] * 18:19 swfrench@deploy2002: swfrench: Backport for [[gerrit:1087604{{!}}Add title-case mapping to support migration to PHP 8.1 (T372603)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 18:12 swfrench@deploy2002: Started scap sync-world: Backport for [[gerrit:1087604{{!}}Add title-case mapping to support migration to PHP 8.1 (T372603)]] * 18:08 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 18:01 moritzm: remove ganeti1012 from active ganeti nodes [[phab:T378921|T378921]] * 17:59 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:35 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:34 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 17:26 brennen@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] (duration: 45m 29s) * 16:55 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply * 
16:54 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply * 16:54 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply * 16:53 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply * 16:48 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 16:47 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 16:40 brennen@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] * 16:39 jayme@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 16:37 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 16:34 dancy@deploy2002: Installation of scap version "4.123.0" completed for 209 hosts * 16:30 dancy@deploy2002: Installing scap version "4.123.0" for 209 hosts * 16:18 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply * 16:18 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply * 16:17 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply * 16:17 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply * 16:16 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply * 16:15 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply * 16:13 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr[1-2]-eqiad * 16:13 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for 
cr[1-2]-eqiad * 16:08 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 16:07 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 15:57 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 15:56 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 15:55 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 15:52 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 15:52 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 15:47 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 15:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002" * 15:35 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002" * 15:27 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 15:19 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 15:16 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl1002.eqiad.wmnet * 15:16 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl1002.eqiad.wmnet * 15:16 topranks: moving fundraising links in eqiad from old to new firewall cluster and switches ([[phab:T377381|T377381]]) * 15:14 jayme@cumin2002: START - Cookbook 
sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration * 15:13 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=99) Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration * 15:10 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 15:04 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[1-2]-eqiad,pfw3-eqiad with reason: fundraising tech migration to new equipment * 15:04 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cr[1-2]-eqiad,pfw3-eqiad with reason: fundraising tech migration to new equipment * 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet * 14:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on fasw-c-eqiad with reason: fundraising tech migration to new equipment * 14:30 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on fasw-c-eqiad with reason: fundraising tech migration to new equipment * 14:28 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002" * 14:28 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002" * 14:26 moritzm: installing apache2 security updates * 14:23 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:03 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1090455{{!}}[CirrusSearch] testwiki: enable offloading weighted tags via EventBus (T378983)]] * 13:58 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1090455{{!}}[CirrusSearch] testwiki: enable offloading weighted tags via EventBus (T378983)]] * 13:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:43 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] * 13:37 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] * 13:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet * 13:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to plain * 13:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to plain * 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet * 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet * 13:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:10 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to drbd * 13:09 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration * 13:09 brouberol@deploy2002: 
helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to drbd * 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1002.eqiad.wmnet to plain * 12:53 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1002.eqiad.wmnet to plain * 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet * 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1002.eqiad.wmnet to drbd * 12:35 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1002.eqiad.wmnet to drbd * 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet * 12:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2236 slowly with 10 steps - slow repool [[phab:T373579|T373579]] * 12:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet * 12:09 moritzm: remove ganeti1015 from active ganeti nodes [[phab:T378921|T378921]] * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1010.eqiad.wmnet * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 12:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: ganeti1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet * 11:54 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:52 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:48 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1010.eqiad.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1013.eqiad.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:40 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:27 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1013.eqiad.wmnet * 11:23 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. * 11:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 11:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 10:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2217 gradually with 4 steps - [[phab:T379491|T379491]] * 10:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 10:37 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. * 10:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 10:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:12 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2236 slowly with 10 steps - slow repool [[phab:T373579|T373579]] * 09:59 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2217 gradually with 4 steps - [[phab:T379491|T379491]] * 09:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71006 and previous config saved to /var/cache/conftool/dbconfig/20241112-094851-arnaudb.json * 09:41 moritzm: update d-i netboot image for 12.8 point release [[phab:T379600|T379600]] * 09:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71005 and previous config saved to /var/cache/conftool/dbconfig/20241112-093343-arnaudb.json * 09:18 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090428{{!}}Revert "CirrusSearch: re-enable offloading weighted tags via EventBus"]] (duration: 06m 46s) * 09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71004 and previous config saved to /var/cache/conftool/dbconfig/20241112-091836-arnaudb.json * 09:17 elukey@deploy2002: 
helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 09:14 urbanecm@deploy2002: trainbranchbot, urbanecm: Continuing with sync * 09:14 urbanecm@deploy2002: trainbranchbot, urbanecm: Backport for [[gerrit:1090428{{!}}Revert "CirrusSearch: re-enable offloading weighted tags via EventBus"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1090428{{!}}Revert "CirrusSearch: re-enable offloading weighted tags via EventBus"]] * 09:10 urbanecm@deploy2002: Sync cancelled. * 09:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71002 and previous config saved to /var/cache/conftool/dbconfig/20241112-090329-arnaudb.json * 08:38 urbanecm@deploy2002: pfischer, urbanecm: Backport for [[gerrit:1089826{{!}}CirrusSearch: re-enable offloading weighted tags via EventBus (T378983)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:36 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089826{{!}}CirrusSearch: re-enable offloading weighted tags via EventBus (T378983)]] * 08:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet * 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet * 08:28 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089230{{!}}Fix WeightedTagsUpdater (T378664 T378983)]] (duration: 06m 59s) * 08:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet * 08:21 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089230{{!}}Fix WeightedTagsUpdater (T378664 T378983)]] * 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti 
node ganeti1009.eqiad.wmnet
* 08:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet
* 08:04 moritzm: installing apache security updates
* 08:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71001 and previous config saved to /var/cache/conftool/dbconfig/20241112-080303-arnaudb.json
* 08:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
* 08:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
* 08:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
* 08:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
* 07:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti-test2003
* 07:53 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti-test2003
* 07:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled
* 07:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled
* 05:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.28 (duration: 01m 52s)

== 2024-11-11 ==
* away: UTC late deploys done
* 23:08 tgr@deploy2002: scap failed: <CalledProcessError> Command '['sudo', '-u', 'mwbuilder', '-n', '--', '/usr/bin/scap', 'mwscript', '--no-local-config', '--directory', '/srv/mediawiki-staging', '--user', 'www-data', '--network', '--', 'purgeMessageBlobStore.php']' returned non-zero exit status 1.
(scap version: 4.122.0) (duration: 11m 44s) * 23:02 tgr@deploy2002: d3r1ck01, tgr: Continuing with sync * 22:59 tgr@deploy2002: d3r1ck01, tgr: Backport for [[gerrit:1089807{{!}}PageUpdater: restore call to RevisionFromEditComplete (T379152)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:56 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1089807{{!}}PageUpdater: restore call to RevisionFromEditComplete (T379152)]] * 22:30 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089864{{!}}contactpage: Update AffCom contact form messages (Resubmit) (T375392)]] (duration: 25m 48s) * 22:21 tgr@deploy2002: tgr: Continuing with sync * 22:19 tgr@deploy2002: tgr: Backport for [[gerrit:1089864{{!}}contactpage: Update AffCom contact form messages (Resubmit) (T375392)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:13 eileen: civicrm upgraded from {{Gerrit|4330588d}} to {{Gerrit|bcd072a1}} * 22:05 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1089864{{!}}contactpage: Update AffCom contact form messages (Resubmit) (T375392)]] * 21:38 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1082174{{!}}contactpages: Update Affcom UserGroup application form (T375392)]] (duration: 28m 07s) * 21:33 tgr@deploy2002: ammarpad, tgr: Continuing with sync * 21:12 tgr@deploy2002: ammarpad, tgr: Backport for [[gerrit:1082174{{!}}contactpages: Update Affcom UserGroup application form (T375392)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:10 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1082174{{!}}contactpages: Update Affcom UserGroup application form (T375392)]] * 20:21 eileen: civicrm upgraded from {{Gerrit|65a8de90}} to {{Gerrit|4330588d}} * 17:55 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add superset links - oblivian@cumin1002 - 
[[phab:T379567|T379567]]" * 17:55 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add superset links - oblivian@cumin1002 - [[phab:T379567|T379567]] * 17:54 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add superset links - oblivian@cumin1002 - [[phab:T379567|T379567]] * 17:54 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add superset links - oblivian@cumin1002 - [[phab:T379567|T379567]]" * 16:19 elukey: restart pybal on lvs2013 (primary) to pick up new kartotherian-k8s-ssl service * 16:17 elukey: restart pybal on lvs2014 (secondary) to pick up new kartotherian-k8s-ssl service * 16:10 elukey: restart pybal on lvs1019 (primary) to pick up new kartotherian-k8s-ssl service * 16:09 elukey: restart pybal on lvs1020 (secondary) to pick up new kartotherian-k8s-ssl service * 16:09 moritzm: installing libarchive security updates * 15:55 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian-k8s-ssl * 15:55 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl * 15:54 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: cluster=codfw,service=kartotherian-k8s-ssl * 15:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1311.eqiad.wmnet with OS bookworm * 15:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 15:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 15:03 
jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bookworm * 15:03 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 15:00 Lucas_WMDE: UTC afternoon backport+config window done * 15:00 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089739{{!}}wikipedias: clear link-recommendations on page save (T379522)]] (duration: 10m 59s) * 14:58 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:56 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Continuing with sync * 14:51 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Backport for [[gerrit:1089739{{!}}wikipedias: clear link-recommendations on page save (T379522)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:49 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1089739{{!}}wikipedias: clear link-recommendations on page save (T379522)]] * 14:44 btullis@cumin1002: END (FAIL) - Cookbook sre.presto.roll-restart-workers (exit_code=99) for Presto an-presto cluster: Roll restart of all Presto's jvm daemons. 
* 14:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1310.eqiad.wmnet with OS bookworm * 14:37 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:36 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:35 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2088.codfw.wmnet with OS bullseye * 14:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1312.eqiad.wmnet with OS bookworm * 14:33 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:32 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1306.eqiad.wmnet with OS bookworm * 14:32 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:32 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1308.eqiad.wmnet with OS bookworm * 14:28 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:28 
jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:27 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye * 14:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 14:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1307.eqiad.wmnet with OS bookworm * 14:26 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:25 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage * 14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1305.eqiad.wmnet with OS bookworm * 14:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:21 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:20 zabe@deploy2002: Finished scap sync-world: Backport for [[gerrit:1078764{{!}}zhwiki: Allow event-organizer self remove usergroup (T376061)]] (duration: 10m 40s) * 14:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2088.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 14:19 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime 
(exit_code=0) for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage * 14:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage * 14:15 zabe@deploy2002: zabe, zhaofjx: Continuing with sync * 14:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage * 14:12 zabe@deploy2002: zabe, zhaofjx: Backport for [[gerrit:1078764{{!}}zhwiki: Allow event-organizer self remove usergroup (T376061)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage * 14:09 zabe@deploy2002: Started scap sync-world: Backport for [[gerrit:1078764{{!}}zhwiki: Allow event-organizer self remove usergroup (T376061)]] * 14:07 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2088.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 14:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage * 14:06 btullis@cumin1002: START - Cookbook sre.presto.roll-restart-workers for Presto an-presto cluster: Roll restart of all Presto's jvm daemons. 
* 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc2002.wikimedia.org * 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage * 14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage * 14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage * 14:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage * 14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage * 14:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage * 14:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage * 14:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage * 13:55 moritzm: powercycled ganeti2031 * 13:44 jmm@cumin2002: START - Cookbook 
sre.dns.netbox * 13:39 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc2002.wikimedia.org * 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc1002.wikimedia.org * 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 13:34 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1312.eqiad.wmnet with OS bookworm * 13:34 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1311.eqiad.wmnet with OS bookworm * 13:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 13:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1311.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1312.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:33 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1310.eqiad.wmnet with OS bookworm * 13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bookworm * 13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1308.eqiad.wmnet with OS bookworm * 13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1307.eqiad.wmnet with OS bookworm * 13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker1306.eqiad.wmnet with OS bookworm * 13:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1305.eqiad.wmnet with OS bookworm * 13:30 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1307.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1309.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1310.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1308.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1305.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:25 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc1002.wikimedia.org * 13:22 jynus: reverting deleted rows on db1176 (mailman3) [[phab:T379519|T379519]] * 13:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1312.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1311.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy 
FORCED * 13:12 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1050.eqiad.wmnet to cluster eqiad and group D * 13:12 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1050.eqiad.wmnet to cluster eqiad and group D * 13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1310.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:11 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1309.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1308.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1307.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1305.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 dreamyjazz@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085593{{!}}Exclude temp account viewer autopromotions from RC (T377829)]] (duration: 07m 07s) * 13:08 jclark@cumin1002: END (PASS) - 
Cookbook sre.dns.netbox (exit_code=0) * 13:08 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 13:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 13:05 dreamyjazz@deploy2002: mszabo, dreamyjazz: Continuing with sync * 13:05 dreamyjazz@deploy2002: mszabo, dreamyjazz: Backport for [[gerrit:1085593{{!}}Exclude temp account viewer autopromotions from RC (T377829)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 13:05 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix bug in requestctl commit - oblivian@cumin1002" * 13:05 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix bug in requestctl commit - oblivian@cumin1002 * 13:04 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix bug in requestctl commit - oblivian@cumin1002 * 13:04 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix bug in requestctl commit - oblivian@cumin1002" * 13:04 jclark@cumin1002: START - Cookbook sre.dns.netbox * 13:03 dreamyjazz@deploy2002: Started scap sync-world: Backport for [[gerrit:1085593{{!}}Exclude temp account viewer autopromotions from RC (T377829)]] * 13:00 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons. 
* 12:54 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons. * 12:48 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons. * 12:42 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons. * 12:41 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1049.eqiad.wmnet to cluster eqiad and group D * 12:40 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1049.eqiad.wmnet to cluster eqiad and group D * 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1050.eqiad.wmnet * 12:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1050.eqiad.wmnet * 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1049.eqiad.wmnet * 12:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1049.eqiad.wmnet * 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1050 * 12:16 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1050 * 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1049 * 12:15 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1049 * 12:13 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons. 
* 12:06 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons. * 12:01 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 11:56 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 11:56 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-redacteddb1001.eqiad.wmnet * 11:54 btullis@cumin1002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch * 11:46 btullis@cumin1002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch * 11:44 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet * 11:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:30 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:06 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:04 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views * 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. 
(T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
== 2024-11-10 ==
* 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
* 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
* 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
* 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
* 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order
* 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled)
* 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index
* 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index
* 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json

== 2024-11-09 ==
* 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply

== 2024-11-08 ==
* 23:35 zabe: attach Sotiale's local accounts on newly created wikis
* 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0
* 23:07 jhathaway@cumin2002: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END 
(FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test 
commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: 
Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera 
generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS 
bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on 
wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: 
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage 
- jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 
jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 
elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts 
['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 
elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host 
ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host 
ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining 
ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: 
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - 
Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for 
network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - 
Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet
* 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet
* 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet
* 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet
* 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C
* 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C

== 2024-11-07 ==
* 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm
* 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy
FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet 
with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding 
wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate 
netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" 
* 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - 
Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM 
aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 
swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: 
helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell 
SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host 
ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: 
Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE 
helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . 
* 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 
0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: running scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to 
https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to 
cluster eqiad and group B * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B * 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply * 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply * 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply * 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply * 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply * 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply == 2024-11-06 == * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) 
generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 
jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade 
firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set 
policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 
jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 
21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime 
(exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - 
Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 
vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host 
fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - 
Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: 
END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs 
[reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - 
Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to 
drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: 
apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook 
sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix 
automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: 
helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s)
* 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]]
* 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over
* 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now
* 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s)
== 2024-11-05 ==
* 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over
* 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm
* 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync
* 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm
* 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host 
reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] 
* 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host 
wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 
jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top 
level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 
20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 
([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to 
https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 
lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook 
sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 
hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. 
That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 
arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as 
packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 
13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views 
* 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - 
Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to 
/var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime 
(exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} 
(duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
== 2024-11-04 ==
* 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook
sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig 
upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - 
eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous 
config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 
20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: 
START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers 
(https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to 
https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE 
helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host 
lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 
([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 
16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 
vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 
ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports 
of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 
db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - 
[[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . 
* 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to 
/var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook 
sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to 
https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet 
with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, 
removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet == 2024-11-02 == * 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s) * 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync * 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] * 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s) * 15:19 reedy@deploy2002: Started scap sync-world: use statemnts * 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s) == 2024-11-01 == * 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye * 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]] * 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage * 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye * 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 19:07 bking@cumin2002: 
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host 
ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set 
policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START 
- Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host 
reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START 
- Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 
ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to 
https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json * 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet'] * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json * 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance * 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json * 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json * 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json ==Archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 6e7ot2afgywk51o42wnf0ec99mkngdk 2249715 2249709 2024-12-01T00:28:57Z JrandWP 37706 november 2024 archive 2249715 wikitext text/x-wiki ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> hitbpvy0lmd99jwko64glc2u2ilgz27 2249721 2249715 2024-12-01T06:18:42Z Stashbot 7414 marostegui@cumin2002: dbctl commit (dc=all): 'Depoll db1233', diff saved to https://phabricator.wikimedia.org/P71450 and previous config saved to /var/cache/conftool/dbconfig/20241201-061841-marostegui.json 2249721 wikitext text/x-wiki == 2024-12-01 == * 06:18 marostegui@cumin2002: dbctl commit (dc=all): 'Depoll db1233', diff saved to https://phabricator.wikimedia.org/P71450 and previous config saved to /var/cache/conftool/dbconfig/20241201-061841-marostegui.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> jyzsgfci4j4ms1w5rxqrwv9gblu3d50 2249730 2249721 2024-12-01T10:44:43Z Stashbot 7414 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool to reclone (T381213)', diff saved to https://phabricator.wikimedia.org/P71451 and previous config saved to /var/cache/conftool/dbconfig/20241201-104441-ladsgroup.json 2249730 wikitext text/x-wiki == 2024-12-01 == * 10:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool to reclone ([[phab:T381213|T381213]])', diff saved to https://phabricator.wikimedia.org/P71451 and previous config saved to /var/cache/conftool/dbconfig/20241201-104441-ladsgroup.json * 06:18 marostegui@cumin2002: dbctl commit (dc=all): 'Depoll db1233', diff saved to https://phabricator.wikimedia.org/P71450 and previous config saved to /var/cache/conftool/dbconfig/20241201-061841-marostegui.json ==Archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> fj5pfr466ydp7hrctrp24b5jtjz6vfu 2249731 2249730 2024-12-01T10:45:43Z Stashbot 7414 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1156.eqiad.wmnet onto db1233.eqiad.wmnet 2249731 wikitext text/x-wiki == 2024-12-01 == * 10:45 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1156.eqiad.wmnet onto db1233.eqiad.wmnet * 10:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool to reclone ([[phab:T381213|T381213]])', diff saved to https://phabricator.wikimedia.org/P71451 and previous config saved to /var/cache/conftool/dbconfig/20241201-104441-ladsgroup.json * 06:18 marostegui@cumin2002: dbctl commit (dc=all): 'Depoll db1233', diff saved to https://phabricator.wikimedia.org/P71450 and previous config saved to /var/cache/conftool/dbconfig/20241201-061841-marostegui.json ==Archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 3wh3bfao87a327k7sscwscuetxac8xk Nova Resource:Tools/Documentation/Admin/Archive 498 21489 2249729 1765483 2024-12-01T08:32:06Z MABot 10804 Bot: Fixing double redirect from [[Portal:Toolforge/Admin/Archive]] to [[Obsolete:Portal:Toolforge/Admin/Archive]] 2249729 wikitext text/x-wiki #REDIRECT [[Obsolete:Portal:Toolforge/Admin/Archive]] 77hpni96aitnwrdg3ohsf7jmejsifqo HTTPS/Browser Recommendations 0 221541 2249711 2240270 2024-11-30T19:00:14Z Tacsipacsi 7034 Chrome has dropped support for Win7 and Win8.1 in February 2023 (https://support.google.com/chrome/a/answer/7100626), I guess Opera had no other choice than following Chrome, so the only browser still supporting these OSes is Firefox ESR 115, but even that one can go unsupported within months. 
2249711 wikitext text/x-wiki <div style="float: right;clear: right;width: auto;background: none;padding: .5em 0 .8em 1.4em;margin-bottom: .5em; ">__TOC__</div> Wikimedia encourages its readers to use modern [[:en:web browsers|web browsers]] which support secure internet connections. Below are recommendations for how to update to a modern web browser. Many older computers, mobile devices or web browsers only support outdated cryptographic methods that are becoming insecure in the face of modern attacks. Wikimedia will no longer support these outdated cryptographic methods to ensure security against eavesdropping and interference ([[:en:man-in-the-middle attack|man-in-the-middle attack]]s or [[w:en:Downgrade attack|downgrade attacks]]). Many other sites on the Internet also require (or will soon require) a strong minimum level of cryptographic capability from your computer or mobile device. Keeping up-to-date with security updates from web browsers and operating systems will be essential for staying secure and continuing full access to all websites on the Internet. == Advice == === For all users === * Please make sure you have applied the latest security updates to your operating system and have updated your web browser. Remember that most browsers and devices only finish updating after you fully close and restart them. * Disable or uninstall any 3rd party "anti-virus" software. Most of them do more harm than good when they interfere with your browser's secure connections.<ref>http://robert.ocallahan.org/2017/01/disable-your-antivirus-software-except.html</ref><ref>https://jhalderm.com/pub/papers/interception-ndss17.pdf</ref> === For users of Microsoft Windows === ;Windows XP :* If you must use Windows XP, [https://ftp.mozilla.org/pub/firefox/releases/52.0/ install and use Firefox 52 ESR instead of Internet Explorer] to access our sites.
<ref group="n">Our sites no longer allow pageviews from IE-on-XP at all, other than a few minor exceptions like Wikitech itself, the site you're reading now.</ref> :** Note that while this release is the latest available for Windows XP, it is no longer maintained and will contain unpatched security bugs. :* You should upgrade to Windows 10 or Windows 11. Windows XP has very serious security flaws.{{#tag:ref|Microsoft ended all technical support for this system version in 2014.<ref name="windows-lifecycle-fact-sheet">https://support.microsoft.com/en-us/help/13853/windows-lifecycle-fact-sheet</ref> Microsoft no longer provides security updates for the many flaws that have been discovered in Windows XP and its version of Internet Explorer since 2014. More detailed technical information about the removal of IE-on-XP support from our projects by 2017-10-17 is available at [[HTTPS/3DES Deprecation]].|group="n"}} ;Windows Vista :* If you must use Windows Vista, [https://ftp.mozilla.org/pub/firefox/releases/52.0/ install and use Firefox 52 ESR instead of Internet Explorer] to access our sites. :** Note that while this release is the latest available for Windows Vista, it is no longer maintained and will contain unpatched security bugs. :* You should upgrade to Windows 10 or Windows 11. Windows Vista has very serious security flaws.<ref group="n">Microsoft no longer supports Vista, and has not provided security updates since April 2017.</ref> ;Windows 7 :* If you must use Windows 7, [https://www.mozilla.org/en-US/firefox/new/ install and use Firefox 115 ESR] instead of Chrome, Internet Explorer or Edge. :** Note that Firefox 115 ESR is the last version that supports Windows 7 and 8.1.
It will receive security updates until at least March 2025.<ref name="firefox-115">[https://support.mozilla.org/en-US/kb/firefox-users-windows-7-8-and-81-moving-extended-support Firefox users on Windows 7, 8 and 8.1 moving to Extended Support Release | Firefox Help]</ref> :* If you must use the unsupported Internet Explorer 11 on Windows 7, you can do so, but you might need to open Settings and click the checkbox to "Enable TLS 1.2" under <em>Internet Options -> Advanced -> (Security section)</em>. :* You should upgrade to Windows 10 or Windows 11.<ref group="n">Microsoft no longer supports Windows 7 (including Internet Explorer on Win7), and has not provided security updates since January 2020.</ref> ;Windows 8.1 :* If you must use Windows 8.1, [https://www.mozilla.org/en-US/firefox/new/ install and use Firefox 115 ESR] instead of Chrome, Internet Explorer or Edge. :** Note that Firefox 115 ESR is the last version that supports Windows 7 and 8.1. It will receive security updates until at least March 2025.<ref name="firefox-115" /> :* You should upgrade to Windows 10 or Windows 11.<ref group="n">Microsoft no longer supports Windows 8.1 (including Internet Explorer on Win8.1), and has not provided security updates since January 2023.</ref> ;Windows 10 or Windows 11 :* You should update the '''[https://www.microsoft.com/en-us/edge Microsoft Edge]''' browser or switch to a different browser such as [https://www.mozilla.org/en-US/firefox/new/ Firefox], [https://www.google.com/chrome/browser/desktop/ Chrome], or [https://www.opera.com/ Opera]. :* Please also stay up to date with security updates from Windows Update, and update your browser regularly if applicable. === For users of Apple macOS === Upgrade your operating system to macOS 10.12.1 (Sierra) or higher [https://support.apple.com/kb/SP742?viewlocale=en_US&locale=en_US if your hardware supports it].
If that is not possible, upgrade to the latest macOS release available for your computer, and consider installing an alternate secure browser instead of Safari, such as [https://www.google.com/chrome/browser/desktop/ Chrome], [https://www.mozilla.org/en-US/firefox/new/ Firefox], or [https://www.opera.com/ Opera]. === For users of Apple iPhone, iPad, and iPod === Upgrade to iOS version 10 (or higher) [[:en:IOS 10#Supported devices|if supported on your device]]. If your device is too old for iOS 10, consider a device upgrade. Check the App Store to ensure you have the latest version of whichever browser you use. === For users of Android devices === Upgrade to the latest version of Android that is available for your device. Consider a device upgrade if your Android software cannot be upgraded to at least [[:en:Android KitKat|version 4.4]], which was initially released by Google in 2013. Check the Play Store (or vendor-specific app store) to ensure you've installed the latest updates to core components and the browser (usually Chrome). === For IT personnel that manage outbound Proxy appliances === Please ensure you are running the latest stable software release from your vendor, and that you update it regularly. Please also consult your vendor and/or their documentation as to how you may need to configure your outbound proxy to support stronger encryption/ciphers. See [[HTTPS]] for technical requirements. Logs for Wikipedia have indicated that many requests come from corporate desktop browsers that meet the version requirements for operating system, web browser, and device, but still suffer from a downgraded cipher choice when communicating over the Internet due to outdated or poorly configured outbound proxies. You may use an online tester to check which ciphers are supported by the browser you are currently using, such as the one provided by [https://www.ssllabs.com/ssltest/viewMyClient.html Qualys (SSL Labs)].
== Notes == {{Reflist|group=n}} == References == {{Reflist}} == See also == * [[mw:Compatibility#Browsers]] [[Category:TLS]] k18mggwbc0iup920ed45oaro0ld278b Nova Resource:Tools.yifeibot/SAL 498 293634 2249712 2217762 2024-11-30T21:40:07Z Stashbot 7414 wmbot~lucaswerkmeister@tools-bastion-13: kubectl rollout restart deployment flr # after reports of FlickreviewR 2 not working on IRC and COM:AN 2249712 wikitext text/x-wiki === 2024-11-30 === * 21:40 wmbot~lucaswerkmeister@tools-bastion-13: kubectl rollout restart deployment flr # after reports of FlickreviewR 2 not working on IRC and COM:AN === 2024-08-26 === * 14:17 wmbot~lucaswerkmeister@tools-bastion-13: kubectl rollout restart deployment flr # after reports of FlickreviewR 2 not working on IRC [originally logged 14:11 UTC but stashbot was gone] === 2024-08-18 === * 15:56 wmbot~lucaswerkmeister@tools-bastion-13: kubectl rollout restart deployment flr # after reports of FlickreviewR 2 not working on IRC === 2024-05-31 === * 15:40 wmbot~bd808@tools-bastion-12: `kubectl delete pod flr-6d74b958d9-bgkdw` after reports of FlickreviewR 2 not working on IRC === 2024-04-04 === * 22:03 wmbot~bd808@tools-sgebastion-10: `kubectl delete pod flr-6d74b958d9-4ztff` after reports of FlickreviewR 2 not working on IRC === 2024-03-13 === * 03:06 wmbot~bd808@tools-sgebastion-11: 'kubectl delete pod flr-6d74b958d9-pc6p9' after reports of FlickreviewR 2 not working on IRC === 2024-03-11 === * 15:47 wmbot~bd808@tools-sgebastion-10: 'kubectl delete pod flr-6d74b958d9-w7fhc' after reports of FlickreviewR 2 not working on IRC === 2024-02-18 === * 15:44 wmbot~taavi@tools-sgebastion-11: 'kubectl delete pod flr-6d74b958d9-b28dz' after reports of FlickreviewR 2 not working on IRC === 2023-02-10 === * 10:59 taavi: bump quotas per request in [[phab:T329350|T329350]] === 2022-06-04 === * 18:20 wm-bot: <multichill> Fixed Flickr bot by casting license[id] to string in /data/project/yifeibot/o/toolserver/bryan/flickr/shared/flickr.py === 2020-02-28 
=== * 19:10 wm-bot: <root> Migrated to 2020 Kubernetes cluster === 2016-11-30 === * 22:47 bd808: Deleted 2 jobs running on tools-exec-1210 for many hours/days ([[phab:T151980|T151980]]) <noinclude>[[Category:SAL]]</noinclude> 7o16chffyhk8ye9ku92h9zhv7npvyex Nova Resource:Tools/Admin/Archive 498 440938 2249728 1765472 2024-12-01T08:31:56Z MABot 10804 Bot: Fixing double redirect from [[Portal:Toolforge/Admin/Archive]] to [[Obsolete:Portal:Toolforge/Admin/Archive]] 2249728 wikitext text/x-wiki #REDIRECT [[Obsolete:Portal:Toolforge/Admin/Archive]] 77hpni96aitnwrdg3ohsf7jmejsifqo Portal:Tool Labs/Admin/Archive 0 441245 2249723 1764982 2024-12-01T08:30:26Z MABot 10804 Bot: Fixing double redirect from [[Portal:Toolforge/Admin/Archive]] to [[Obsolete:Portal:Toolforge/Admin/Archive]] 2249723 wikitext text/x-wiki #REDIRECT [[Obsolete:Portal:Toolforge/Admin/Archive]] 77hpni96aitnwrdg3ohsf7jmejsifqo Talk:Tools Precise deprecation 1 442232 2249724 1786662 2024-12-01T08:30:36Z MABot 10804 Bot: Fixing double redirect from [[Talk:News/Tools Precise deprecation]] to [[Talk:News/2017 Tools Precise deprecation]] 2249724 wikitext text/x-wiki #REDIRECT [[Talk:News/2017 Tools Precise deprecation]] 6j2ig3hxsmye267irfuvvppng47vrto Help:Version Control in Toolforge 12 444333 2249726 1836936 2024-12-01T08:31:36Z MABot 10804 Bot: Fixing double redirect from [[Help:Toolforge/Version Control in Toolforge]] to [[Help:Toolforge/Version control]] 2249726 wikitext text/x-wiki #REDIRECT [[Help:Toolforge/Version control]] qx12w5irj12n2rvq7qcp4us14sz6r2w Help talk:Version Control in Toolforge 13 444334 2249727 1836938 2024-12-01T08:31:46Z MABot 10804 Bot: Fixing double redirect from [[Help talk:Toolforge/Version Control in Toolforge]] to [[Help talk:Toolforge/Version control]] 2249727 wikitext text/x-wiki #REDIRECT [[Help talk:Toolforge/Version control]] nj34if01kphmk0eq1uy7tcb4lvjjs0z Map of database maintenance 0 449160 2249713 2249655 2024-12-01T00:00:46Z Dexbot 30554 Bot: Updating the report 
2249713 wikitext text/x-wiki {{/Header}} == Today (2024-12-01) == == Yesterday (2024-11-30) == == Last seven days == {| class="wikitable" |+ eqiad |- ! Section !! Work |- | s6 || [[phab:T376905|Login (T376905)]] (ladsgroup) |- | s7 || [[phab:T370903|Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis (T370903)]] (ladsgroup) |- | s8 || * [[phab:T361627|Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis (T361627)]] (ladsgroup) * [[phab:T370903|Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis (T370903)]] (ladsgroup) |- | x1 || [[phab:T380449|Optimize two echo tables in x1 (T380449)]] (ladsgroup) |- |} {| class="wikitable" |+ codfw |- ! Section !! Work |- | es1 || [[phab:T376905|Login (T376905)]] (ladsgroup) |- | es2 || [[phab:T376905|Login (T376905)]] (ladsgroup) |- | es3 || [[phab:T376905|Login (T376905)]] (ladsgroup) |- | s4 || * [[phab:T367781|Drop deprecated abuse filter fields on wmf wikis (T367781)]] (arnaudb) * [[phab:T379813|Wikimedia\Rdbms\DBQueryError: Error 1034: Index for table &#039;wbc_entity_usage&#039; is corrupt; try to repair itFunction: Wikibase\Client\Usage\Sql\EntityUsageTable::queryUsagesQuery: SELECT eu_aspect,eu_entity_id FROM `wbc_entity (T379813)]] (ladsgroup) |- | s6 || [[phab:T328817|Drop cuc_user and cuc_user_text from cu_changes in wmf wikis (T328817)]] (ladsgroup) |- | s8 || [[phab:T370903|Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis (T370903)]] (ladsgroup) |- | x1 || [[phab:T380449|Optimize two echo tables in x1 (T380449)]] (ladsgroup) |- |} [[Category:MariaDB]] gfx1jxg9irh87zjjxgbd4hccrpebdex 2249732 2249713 2024-12-01T10:47:36Z Dexbot 30554 Bot: Updating the report 2249732 wikitext text/x-wiki {{/Header}} == Today (2024-12-01) == {| class="wikitable" |+ eqiad |- ! Section !! 
Work |- | s2 || [[phab:T381213|db1233 ptwiki.page_props index corrupted (T381213)]] (ladsgroup) |- |} == Yesterday (2024-11-30) == == Last seven days == {| class="wikitable" |+ eqiad |- ! Section !! Work |- | s2 || [[phab:T381213|db1233 ptwiki.page_props index corrupted (T381213)]] (ladsgroup) |- | s6 || [[phab:T376905|Login (T376905)]] (ladsgroup) |- | s7 || [[phab:T370903|Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis (T370903)]] (ladsgroup) |- | s8 || * [[phab:T361627|Create cuc_agent_id, cule_agent_id and cupe_agent_id columns in cu_changes, cu_log_event and cu_private_event tables respectively on WMF wikis (T361627)]] (ladsgroup) * [[phab:T370903|Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis (T370903)]] (ladsgroup) |- | x1 || [[phab:T380449|Optimize two echo tables in x1 (T380449)]] (ladsgroup) |- |} {| class="wikitable" |+ codfw |- ! Section !! Work |- | es1 || [[phab:T376905|Login (T376905)]] (ladsgroup) |- | es2 || [[phab:T376905|Login (T376905)]] (ladsgroup) |- | es3 || [[phab:T376905|Login (T376905)]] (ladsgroup) |- | s4 || * [[phab:T367781|Drop deprecated abuse filter fields on wmf wikis (T367781)]] (arnaudb) * [[phab:T379813|Wikimedia\Rdbms\DBQueryError: Error 1034: Index for table &#039;wbc_entity_usage&#039; is corrupt; try to repair itFunction: Wikibase\Client\Usage\Sql\EntityUsageTable::queryUsagesQuery: SELECT eu_aspect,eu_entity_id FROM `wbc_entity (T379813)]] (ladsgroup) |- | s6 || [[phab:T328817|Drop cuc_user and cuc_user_text from cu_changes in wmf wikis (T328817)]] (ladsgroup) |- | s8 || [[phab:T370903|Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis (T370903)]] (ladsgroup) |- | x1 || [[phab:T380449|Optimize two echo tables in x1 (T380449)]] (ladsgroup) |- |} [[Category:MariaDB]] ga05ci25gcsl130tsw3ct4c5w87fgls Help:Toolforge/Git 12 451282 2249725 2015968 2024-12-01T08:31:26Z MABot 10804 Bot: Fixing double redirect 
from [[Help:Toolforge/Version Control in Toolforge]] to [[Help:Toolforge/Version control]] 2249725 wikitext text/x-wiki #REDIRECT [[Help:Toolforge/Version control]] qx12w5irj12n2rvq7qcp4us14sz6r2w Tool:Gitlab-account-approval/Log 116 453906 2249719 2246441 2024-12-01T02:39:07Z Gitlabaccountapprovalbot 37332 @kingchristlike1 was approved. 2249719 wikitext text/x-wiki <noinclude>'''Audit log of approvals''' made by [[gitlab:gitlabaccountapprovalbot|@gitlabaccountapprovalbot]]. __NOTOC__</noinclude> === 2024-12-01 === * 02:39 [[gitlab:kingchristlike1|@kingchristlike1]] was approved. === 2024-11-21 === * 13:45 [[gitlab:sascha|@sascha]] was approved. === 2024-11-19 === * 16:36 [[gitlab:jly|@jly]] was approved. === 2024-11-15 === * 02:54 [[gitlab:danielyepezgarces|@danielyepezgarces]] was approved. === 2024-11-14 === * 14:15 [[gitlab:stimoroll|@stimoroll]] was approved. === 2024-11-09 === * 17:15 [[gitlab:f4udeveloper|@f4udeveloper]] was approved. === 2024-11-07 === * 19:15 [[gitlab:zulf|@zulf]] was approved. * 05:33 [[gitlab:hassanamin|@hassanamin]] was approved. === 2024-11-06 === * 19:39 [[gitlab:daniuu|@daniuu]] was approved. * 00:18 [[gitlab:rlopez-wmf|@rlopez-wmf]] was approved. === 2024-10-09 === * 14:45 [[gitlab:jtweed|@jtweed]] was approved. * 10:24 [[gitlab:ifrahkh|@ifrahkh]] was approved. * 09:06 [[gitlab:wikibayer|@wikibayer]] was approved. === 2024-10-06 === * 10:27 [[gitlab:keerthan16|@keerthan16]] was approved. === 2024-10-04 === * 07:45 [[gitlab:hakimi97|@hakimi97]] was approved. === 2024-09-30 === * 07:39 [[gitlab:ninjastrikers|@ninjastrikers]] was approved. === 2024-09-28 === * 17:30 [[gitlab:webrunner95|@webrunner95]] was approved. === 2024-09-18 === * 21:39 [[gitlab:elliottetzkorn|@elliottetzkorn]] was approved. === 2024-09-14 === * 22:06 [[gitlab:humptydumpty|@humptydumpty]] was approved. === 2024-09-06 === * 08:48 [[gitlab:mickabarber|@mickabarber]] was approved. === 2024-08-27 === * 17:36 [[gitlab:edgars|@edgars]] was approved. 
=== 2024-08-22 === * 09:18 [[gitlab:antonkokhwmde|@antonkokhwmde]] was approved. === 2024-08-14 === * 19:21 [[gitlab:jfk|@jfk]] was approved. === 2024-08-13 === * 17:57 [[gitlab:daxserver|@daxserver]] was approved. === 2024-08-11 === * 09:57 [[gitlab:pauliesnug|@pauliesnug]] was approved. === 2024-08-10 === * 08:42 [[gitlab:ashig|@ashig]] was approved. === 2024-08-09 === * 14:09 [[gitlab:masssly|@masssly]] was approved. === 2024-08-05 === * 22:15 [[gitlab:mrtortue|@mrtortue]] was approved. === 2024-08-02 === * 16:21 [[gitlab:dsantini|@dsantini]] was approved. === 2024-07-31 === * 11:54 [[gitlab:cptviraj|@cptviraj]] was approved. === 2024-07-30 === * 19:09 [[gitlab:iniquity|@iniquity]] was approved. * 10:00 [[gitlab:collins|@collins]] was approved. === 2024-07-27 === * 15:57 [[gitlab:songnguxyz|@songnguxyz]] was approved. === 2024-07-25 === * 12:36 [[gitlab:mszabo|@mszabo]] was approved. * 09:21 [[gitlab:agarwalmahima|@agarwalmahima]] was approved. === 2024-07-24 === * 08:05 [[gitlab:dragoniez|@dragoniez]] was approved. === 2024-07-23 === * 06:54 [[gitlab:mirji|@mirji]] was approved. === 2024-07-16 === * 10:00 [[gitlab:lakejason0|@lakejason0]] was approved. === 2024-07-12 === * 11:33 [[gitlab:cn|@cn]] was approved. * 08:12 [[gitlab:unchampignon|@unchampignon]] was approved. === 2024-07-07 === * 17:12 [[gitlab:agamyasamuel|@agamyasamuel]] was approved. * 05:24 [[gitlab:kuldeepburjbhalaike|@kuldeepburjbhalaike]] was approved. === 2024-07-06 === * 11:18 [[gitlab:dibya|@dibya]] was approved. * 04:54 [[gitlab:sarthakparashar|@sarthakparashar]] was approved. === 2024-07-05 === * 18:15 [[gitlab:vanshikarathi|@vanshikarathi]] was approved. === 2024-07-02 === * 19:00 [[gitlab:ebrahim|@ebrahim]] was approved. === 2024-07-01 === * 20:12 [[gitlab:rockingpenny4|@rockingpenny4]] was approved. * 18:15 [[gitlab:balajijagadesh|@balajijagadesh]] was approved. === 2024-06-30 === * 18:24 [[gitlab:hrideshmg|@hrideshmg]] was approved. 
* 07:18 [[gitlab:chanakyakumardas|@chanakyakumardas]] was approved. * 06:30 [[gitlab:rihaan180|@rihaan180]] was approved. === 2024-06-27 === * 17:36 [[gitlab:driedmueller|@driedmueller]] was approved. === 2024-06-19 === * 12:57 [[gitlab:audreypenven|@audreypenven]] was approved. === 2024-06-16 === * 01:18 [[gitlab:roysmith|@roysmith]] was approved. === 2024-06-08 === * 02:45 [[gitlab:jleedev|@jleedev]] was approved. === 2024-06-03 === * 13:57 [[gitlab:afeder|@afeder]] was approved. === 2024-06-01 === * 10:54 [[gitlab:florianschmitt|@florianschmitt]] was approved. === 2024-05-30 === * 16:42 [[gitlab:krlsca|@krlsca]] was approved. === 2024-05-28 === * 11:24 [[gitlab:rickijay|@rickijay]] was approved. === 2024-05-26 === * 11:18 [[gitlab:ranjithsiji|@ranjithsiji]] was approved. === 2024-05-25 === * 07:24 [[gitlab:jony|@jony]] was approved. === 2024-05-23 === * 08:45 [[gitlab:lepticed7|@lepticed7]] was approved. === 2024-05-22 === * 20:42 [[gitlab:echecs|@echecs]] was approved. === 2024-05-21 === * 13:33 [[gitlab:mbs|@mbs]] was approved. === 2024-05-19 === * 18:06 [[gitlab:ionenlaser|@ionenlaser]] was approved. === 2024-05-18 === * 23:36 [[gitlab:mdaniels5757|@mdaniels5757]] was approved. === 2024-05-17 === * 08:54 [[gitlab:grapedog|@grapedog]] was approved. === 2024-05-08 === * 19:42 [[gitlab:kelhurd|@kelhurd]] was approved. * 19:06 [[gitlab:khurd|@khurd]] was approved. === 2024-05-06 === * 19:48 [[gitlab:j3j5|@j3j5]] was approved. * 12:06 [[gitlab:tk-999|@tk-999]] was approved. === 2024-05-05 === * 22:09 [[gitlab:pppery|@pppery]] was approved. * 20:33 [[gitlab:sakretsu|@sakretsu]] was approved. * 12:12 [[gitlab:waterquark|@waterquark]] was approved. === 2024-05-04 === * 09:03 [[gitlab:multichill|@multichill]] was approved. * 07:42 [[gitlab:abaris|@abaris]] was approved. === 2024-05-03 === * 14:57 [[gitlab:maurusian|@maurusian]] was approved. === 2024-04-24 === * 05:48 [[gitlab:wolfinux|@wolfinux]] was approved. 
=== 2024-04-23 === * 15:48 [[gitlab:dreamrimmer|@dreamrimmer]] was approved. === 2024-04-21 === * 06:51 [[gitlab:alon|@alon]] was approved. === 2024-04-17 === * 23:33 [[gitlab:derenrich|@derenrich]] was approved. === 2024-04-16 === * 17:18 [[gitlab:valcio|@valcio]] was approved. === 2024-04-14 === * 16:51 [[gitlab:wikilucas00|@wikilucas00]] was approved. === 2024-04-06 === * 12:48 [[gitlab:theprotonade|@theprotonade]] was approved. === 2024-04-02 === * 07:30 [[gitlab:bohuizhang|@bohuizhang]] was approved. === 2024-03-30 === * 13:36 [[gitlab:lpintscher|@lpintscher]] was approved. === 2024-03-26 === * 17:09 [[gitlab:eenabulele|@eenabulele]] was approved. === 2024-03-25 === * 14:27 [[gitlab:tuukka|@tuukka]] was approved. === 2024-03-24 === * 12:24 [[gitlab:firefly|@firefly]] was approved. === 2024-03-21 === * 19:33 [[gitlab:universal-omega|@universal-omega]] was approved. === 2024-03-17 === * 10:36 [[gitlab:bisel91|@bisel91]] was approved. === 2024-03-16 === * 10:09 [[gitlab:delord|@delord]] was approved. * 00:42 [[gitlab:athulvis1|@athulvis1]] was approved. === 2024-03-15 === * 19:06 [[gitlab:ignaciorodrguez|@ignaciorodrguez]] was approved. * 08:30 [[gitlab:peachey88|@peachey88]] was approved. * 06:51 [[gitlab:derick|@derick]] was approved. === 2024-03-12 === * 15:06 [[gitlab:xiaoxiao|@xiaoxiao]] was approved. === 2024-03-06 === * 13:21 [[gitlab:desianabae1|@desianabae1]] was approved. === 2024-03-05 === * 19:21 [[gitlab:ep1c|@ep1c]] was approved. * 16:33 [[gitlab:jasmine|@jasmine]] was approved. === 2024-03-02 === * 06:42 [[gitlab:potsdamlamb|@potsdamlamb]] was approved. === 2024-02-29 === * 23:18 [[gitlab:arandomname123|@arandomname123]] was approved. * 18:03 [[gitlab:baba|@baba]] was approved. * 17:48 [[gitlab:yfdyh000|@yfdyh000]] was approved. * 03:09 [[gitlab:sds|@sds]] was approved. === 2024-02-27 === * 23:33 [[gitlab:lofhi|@lofhi]] was approved. === 2024-02-15 === * 19:45 [[gitlab:gergesshamon|@gergesshamon]] was approved. 
=== 2024-02-14 === * 14:33 [[gitlab:philipnelson99|@philipnelson99]] was approved. === 2024-02-13 === * 13:06 [[gitlab:dringsim|@dringsim]] was approved. === 2024-02-12 === * 17:36 [[gitlab:haak|@haak]] was approved. === 2024-02-05 === * 17:33 [[gitlab:qwerfjkl|@qwerfjkl]] was approved. * 17:14 [[gitlab:ahecht|@ahecht]] was approved. === 2024-02-01 === * 09:27 [[gitlab:arinaigum|@arinaigum]] was approved. * 00:15 [[gitlab:jas42|@jas42]] was approved. * 00:15 [[gitlab:edhu|@edhu]] was approved. * 00:15 [[gitlab:marnanel|@marnanel]] was approved. * 00:15 [[gitlab:ibrahemqasim|@ibrahemqasim]] was approved. * 00:15 [[gitlab:amasotti|@amasotti]] was approved. * 00:15 [[gitlab:deni|@deni]] was approved. * 00:15 [[gitlab:cyber|@cyber]] was approved. * 00:15 [[gitlab:saroj|@saroj]] was approved. === 2024-01-29 === * 21:42 [[gitlab:rgupta|@rgupta]] was approved. === 2024-01-07 === * 09:48 [[gitlab:lutrome|@lutrome]] was approved. === 2024-01-05 === * 20:48 [[gitlab:jinoytommanjaly|@jinoytommanjaly]] was approved. * 02:51 [[gitlab:braunobruno|@braunobruno]] was approved. * 01:08 [[gitlab:amorymeltzer|@amorymeltzer]] was approved. * 01:08 [[gitlab:phi22ipus|@phi22ipus]] was approved. === 2024-01-03 === * 14:45 [[gitlab:gabina|@gabina]] was approved. === 2024-01-02 === * 13:18 [[gitlab:arthurtaylor|@arthurtaylor]] was approved. === 2023-12-23 === * 00:33 [[gitlab:aram|@aram]] was approved. === 2023-12-22 === * 16:24 [[gitlab:elpitareio|@elpitareio]] was approved. === 2023-12-21 === * 00:43 [[gitlab:bsadowski1|@bsadowski1]] was approved. * 00:43 [[gitlab:ederporto|@ederporto]] was approved. * 00:43 [[gitlab:sadraiiali|@sadraiiali]] was approved. * 00:43 [[gitlab:wasp-outis|@wasp-outis]] was approved. * 00:43 [[gitlab:bodhisattwa|@bodhisattwa]] was approved. * 00:43 [[gitlab:air7538|@air7538]] was approved. * 00:43 [[gitlab:anzx|@anzx]] was approved. * 00:43 [[gitlab:tekask1903|@tekask1903]] was approved. * 00:42 [[gitlab:kiwi-0x010c|@kiwi-0x010c]] was approved. 
* 00:42 [[gitlab:mpaa|@mpaa]] was approved. * 00:42 [[gitlab:kutay|@kutay]] was approved. * 00:42 [[gitlab:wattmto|@wattmto]] was approved. jtsc10vmmdc5rjgodr1jyg2i637h84b Grid deprecation 0 453980 2249722 2141578 2024-12-01T08:30:16Z MABot 10804 Bot: Fixing double redirect from [[News/Toolforge Grid Engine deprecation]] to [[News/2024 Toolforge Grid Engine deprecation]] 2249722 wikitext text/x-wiki #REDIRECT [[News/2024 Toolforge Grid Engine deprecation]] dfozcae3fl0j3ryftgaibl7af46ccr1 Nova Resource:Tofuinfratest-d99a59de-405d-4af7-8607-71d04a8d1af5 498 456243 2249710 2024-11-30T12:00:16Z Labslogbot 55 Auto update of instance info. 2249710 wikitext text/x-wiki <!-- autostatus begin --> {{Nova Resource |Resource Type=project |Project ID=a4d4d0a65c1e4c8da39d4767eb730183 |Project Name=tofuinfratest-d99a59de-405d-4af7-8607-71d04a8d1af5}} <!-- autostatus end --> h2ftahcahrkog9g4b8gro4fauzbm1ju Server Admin Log/Archive 87 0 456244 2249714 2024-12-01T00:28:03Z JrandWP 37706 archive November 2024 2249714 wikitext text/x-wiki == 2024-11-30 == * 11:59 joal@deploy2002: Finished deploy [airflow-dags/analytics@fe37cfe]: Hotfix airflow analytics deploy [airflow-dags/analytics@fe37cfec] (duration: 01m 21s) * 11:58 joal@deploy2002: Started deploy [airflow-dags/analytics@fe37cfe]: Hotfix airflow analytics deploy [airflow-dags/analytics@fe37cfec] == 2024-11-29 == * 16:55 jayme: puppet ca destroy mwmaint.discovery.wmnet - [[phab:T341859|T341859]] * 16:22 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2006.codfw.wmnet * 16:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1006.eqiad.wmnet * 16:15 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2006.codfw.wmnet * 16:15 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1006.eqiad.wmnet * 15:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2005.codfw.wmnet * 15:40 
jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1005.eqiad.wmnet * 15:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1005.eqiad.wmnet * 15:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2005.codfw.wmnet * 15:16 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet * 15:16 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1004.eqiad.wmnet * 15:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71448 and previous config saved to /var/cache/conftool/dbconfig/20241129-151101-ladsgroup.json * 15:10 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet * 15:09 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet * 14:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P71447 and previous config saved to /var/cache/conftool/dbconfig/20241129-145554-ladsgroup.json * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P71446 and previous config saved to /var/cache/conftool/dbconfig/20241129-144047-ladsgroup.json * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71445 and previous config saved to /var/cache/conftool/dbconfig/20241129-142540-ladsgroup.json * 14:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1231 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71444 and previous config saved to /var/cache/conftool/dbconfig/20241129-141931-ladsgroup.json * 14:19 ladsgroup@cumin1002: END (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance * 14:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance * 14:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 14:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 14:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71443 and previous config saved to /var/cache/conftool/dbconfig/20241129-141409-ladsgroup.json * 13:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P71442 and previous config saved to /var/cache/conftool/dbconfig/20241129-135902-ladsgroup.json * 13:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P71441 and previous config saved to /var/cache/conftool/dbconfig/20241129-134355-ladsgroup.json * 13:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71440 and previous config saved to /var/cache/conftool/dbconfig/20241129-132848-ladsgroup.json * 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1021.eqiad.wmnet * 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 13:21 ladsgroup@cumin1002: dbctl commit 
(dc=all): 'Depooling db1187 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71439 and previous config saved to /var/cache/conftool/dbconfig/20241129-132136-ladsgroup.json * 13:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance * 13:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance * 13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71438 and previous config saved to /var/cache/conftool/dbconfig/20241129-132111-ladsgroup.json * 13:17 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 13:13 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P71437 and previous config saved to /var/cache/conftool/dbconfig/20241129-130604-ladsgroup.json * 13:06 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1021.eqiad.wmnet * 12:57 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P71436 and previous config saved to /var/cache/conftool/dbconfig/20241129-125057-ladsgroup.json * 12:42 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 12:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71434 and previous config saved to /var/cache/conftool/dbconfig/20241129-123549-ladsgroup.json * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1015.eqiad.wmnet * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 12:31 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 12:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1180 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71433 and previous config saved to /var/cache/conftool/dbconfig/20241129-122735-ladsgroup.json * 12:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance * 12:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance * 12:27 jmm@cumin2002: START - Cookbook sre.dns.netbox * 12:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1015.eqiad.wmnet * 12:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71432 and previous config saved to /var/cache/conftool/dbconfig/20241129-121010-ladsgroup.json * 12:06 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with 
reason: Upgrade GitLab to new version * 12:04 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P71431 and previous config saved to /var/cache/conftool/dbconfig/20241129-115501-ladsgroup.json * 11:44 moritzm: imported mapnik_4.0.3+ds2~wmf12u1 to component/maps [[phab:T216826|T216826]] * 11:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 11:40 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P71430 and previous config saved to /var/cache/conftool/dbconfig/20241129-113954-ladsgroup.json * 11:31 Dreamy_Jazz: Started MediaModeration scanning scripts to scan all wikis * 11:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2084.codfw.wmnet with OS bullseye * 11:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71429 and previous config saved to /var/cache/conftool/dbconfig/20241129-112447-ladsgroup.json * 11:19 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:18 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71428 and previous config saved to 
/var/cache/conftool/dbconfig/20241129-111554-ladsgroup.json * 11:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 11:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 11:01 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2084.codfw.wmnet with reason: host reimage * 11:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance * 11:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance * 10:57 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2084.codfw.wmnet with reason: host reimage * 10:45 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 10:36 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 10:10 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 09:21 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 09:18 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 09:05 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye 
* 09:02 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version * 08:54 moritzm: imported mapbox-polylabel 2.0.1-1~wmf12u1 to component/maps [[phab:T216826|T216826]] * 08:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:16 moritzm: imported mapbox-geometry_2.0.3-1~wmf12u1 to component/maps [[phab:T216826|T216826]] * 07:19 marostegui@cumin2002: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Repooling after corruption', diff saved to https://phabricator.wikimedia.org/P71427 and previous config saved to /var/cache/conftool/dbconfig/20241129-071905-root.json * 07:10 aqu@deploy2002: Finished deploy [airflow-dags/analytics@656d6df]: Generate canary events faster in Airflow (duration: 03m 15s) * 07:06 aqu@deploy2002: Started deploy [airflow-dags/analytics@656d6df]: Generate canary events faster in Airflow * 07:03 marostegui@cumin2002: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Repooling after corruption', diff saved to https://phabricator.wikimedia.org/P71426 and previous config saved to /var/cache/conftool/dbconfig/20241129-070333-root.json * 06:48 marostegui@cumin2002: dbctl commit (dc=all): 'db1223 (re)pooling @ 50%: Repooling after corruption', diff saved to https://phabricator.wikimedia.org/P71425 and previous config saved to /var/cache/conftool/dbconfig/20241129-064801-root.json * 06:28 marostegui@cumin2002: dbctl commit (dc=all): 'Repool', diff saved to https://phabricator.wikimedia.org/P71424 and previous config saved to /var/cache/conftool/dbconfig/20241129-062833-marostegui.json * 06:27 marostegui@cumin2002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1223 quickly with 2 steps - Fixed corruption * 06:26 marostegui@cumin2002: START - Cookbook sre.mysql.pool db1223 quickly with 2 steps - 
Fixed corruption * 05:52 taavi@cumin1002: dbctl commit (dc=all): 'depool db1223, replication broken', diff saved to https://phabricator.wikimedia.org/P71423 and previous config saved to /var/cache/conftool/dbconfig/20241129-055245-taavi.json * 04:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71422 and previous config saved to /var/cache/conftool/dbconfig/20241129-045409-ladsgroup.json * 04:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P71421 and previous config saved to /var/cache/conftool/dbconfig/20241129-043902-ladsgroup.json * 04:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P71420 and previous config saved to /var/cache/conftool/dbconfig/20241129-042355-ladsgroup.json * {{safesubst:SAL entry|1=04:20 tstarling@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098915{{!}}addWiki.php tweaks]], [[gerrit:1098916{{!}}Run dumpInterwiki.php locally with no changes]], [[gerrit:1098917{{!}}Prepare id.wikivoyage.org for installation (T380726 T352113)]], [[gerrit:1099065{{!}}dumpInterwiki: read from preinstall.dblist (T352113)]], [[gerrit:1099066{{!}}addWiki: Move DB_ADMIN to core]], [[gerrit:1099064{{!}}addWiki: Add UpdateSearchIndexCon}} * 04:12 tstarling@deploy2002: tstarling: Continuing with sync * {{safesubst:SAL entry|1=04:12 tstarling@deploy2002: tstarling: Backport for [[gerrit:1098915{{!}}addWiki.php tweaks]], [[gerrit:1098916{{!}}Run dumpInterwiki.php locally with no changes]], [[gerrit:1098917{{!}}Prepare id.wikivoyage.org for installation (T380726 T352113)]], [[gerrit:1099065{{!}}dumpInterwiki: read from preinstall.dblist (T352113)]], [[gerrit:1099066{{!}}addWiki: Move DB_ADMIN to core]], [[gerrit:1099064{{!}}addWiki: Add UpdateSearchIndexConfig]], [[gerrit}} * 
04:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71419 and previous config saved to /var/cache/conftool/dbconfig/20241129-040846-ladsgroup.json * 04:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71418 and previous config saved to /var/cache/conftool/dbconfig/20241129-040547-ladsgroup.json * 04:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 04:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 04:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71417 and previous config saved to /var/cache/conftool/dbconfig/20241129-040523-ladsgroup.json * {{safesubst:SAL entry|1=04:01 tstarling@deploy2002: Started scap sync-world: Backport for [[gerrit:1098915{{!}}addWiki.php tweaks]], [[gerrit:1098916{{!}}Run dumpInterwiki.php locally with no changes]], [[gerrit:1098917{{!}}Prepare id.wikivoyage.org for installation (T380726 T352113)]], [[gerrit:1099065{{!}}dumpInterwiki: read from preinstall.dblist (T352113)]], [[gerrit:1099066{{!}}addWiki: Move DB_ADMIN to core]], [[gerrit:1099064{{!}}addWiki: Add UpdateSearchIndexConf}} * 03:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P71416 and previous config saved to /var/cache/conftool/dbconfig/20241129-035016-ladsgroup.json * 03:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P71415 and previous config saved to /var/cache/conftool/dbconfig/20241129-033509-ladsgroup.json * 03:20 
ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71414 and previous config saved to /var/cache/conftool/dbconfig/20241129-032002-ladsgroup.json * 03:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2224 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71413 and previous config saved to /var/cache/conftool/dbconfig/20241129-031705-ladsgroup.json * 03:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2224.codfw.wmnet with reason: Maintenance * 03:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2224.codfw.wmnet with reason: Maintenance * 03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71412 and previous config saved to /var/cache/conftool/dbconfig/20241129-031642-ladsgroup.json * 03:04 tstarling@deploy2002: scap failed: <KeyError> '1 dbs from /srv/mediawiki-staging/wikiversions.json are missing from /srv/mediawiki-staging/dblists/all.dblist: idwikivoyage' (scap version: 4.129.0) (duration: 00m 00s) * 03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P71411 and previous config saved to /var/cache/conftool/dbconfig/20241129-030133-ladsgroup.json * 02:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P71410 and previous config saved to /var/cache/conftool/dbconfig/20241129-024625-ladsgroup.json * 02:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71409 and previous config saved to /var/cache/conftool/dbconfig/20241129-023118-ladsgroup.json * 02:28 
ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2217 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71408 and previous config saved to /var/cache/conftool/dbconfig/20241129-022822-ladsgroup.json * 02:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance * 02:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance * 02:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance * 02:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance * 02:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71407 and previous config saved to /var/cache/conftool/dbconfig/20241129-022645-ladsgroup.json * 02:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P71406 and previous config saved to /var/cache/conftool/dbconfig/20241129-021138-ladsgroup.json * 01:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P71405 and previous config saved to /var/cache/conftool/dbconfig/20241129-015631-ladsgroup.json * 01:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71404 and previous config saved to /var/cache/conftool/dbconfig/20241129-014124-ladsgroup.json * 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2193 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71403 and previous config saved to 
/var/cache/conftool/dbconfig/20241129-013912-ladsgroup.json * 01:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance * 01:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance * 01:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71402 and previous config saved to /var/cache/conftool/dbconfig/20241129-013850-ladsgroup.json * 01:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P71401 and previous config saved to /var/cache/conftool/dbconfig/20241129-012343-ladsgroup.json * 01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P71400 and previous config saved to /var/cache/conftool/dbconfig/20241129-010835-ladsgroup.json * 00:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71399 and previous config saved to /var/cache/conftool/dbconfig/20241129-005328-ladsgroup.json * 00:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2180 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71398 and previous config saved to /var/cache/conftool/dbconfig/20241129-005117-ladsgroup.json * 00:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance * 00:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance * 00:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 ([[phab:T328817|T328817]])', diff saved to 
https://phabricator.wikimedia.org/P71397 and previous config saved to /var/cache/conftool/dbconfig/20241129-005054-ladsgroup.json
* 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P71396 and previous config saved to /var/cache/conftool/dbconfig/20241129-003547-ladsgroup.json
* 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P71395 and previous config saved to /var/cache/conftool/dbconfig/20241129-002040-ladsgroup.json
* 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71394 and previous config saved to /var/cache/conftool/dbconfig/20241129-000533-ladsgroup.json
* 00:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2169 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71393 and previous config saved to /var/cache/conftool/dbconfig/20241129-000234-ladsgroup.json
* 00:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 00:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
* 00:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71392 and previous config saved to /var/cache/conftool/dbconfig/20241129-000211-ladsgroup.json
== 2024-11-28 ==
* 23:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P71391 and previous config saved to /var/cache/conftool/dbconfig/20241128-234704-ladsgroup.json
* 23:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T376905|T376905]])',
diff saved to https://phabricator.wikimedia.org/P71390 and previous config saved to /var/cache/conftool/dbconfig/20241128-233426-ladsgroup.json * 23:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P71389 and previous config saved to /var/cache/conftool/dbconfig/20241128-233157-ladsgroup.json * 23:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P71388 and previous config saved to /var/cache/conftool/dbconfig/20241128-231919-ladsgroup.json * 23:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71387 and previous config saved to /var/cache/conftool/dbconfig/20241128-231650-ladsgroup.json * 23:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2158 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71386 and previous config saved to /var/cache/conftool/dbconfig/20241128-231350-ladsgroup.json * 23:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance * 23:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance * 23:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance * 23:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance * 23:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71385 and previous config saved to /var/cache/conftool/dbconfig/20241128-231312-ladsgroup.json * 23:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after 
maintenance db1165', diff saved to https://phabricator.wikimedia.org/P71384 and previous config saved to /var/cache/conftool/dbconfig/20241128-230412-ladsgroup.json * 22:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P71383 and previous config saved to /var/cache/conftool/dbconfig/20241128-225805-ladsgroup.json * 22:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1178 gradually with 4 steps - Maint over ([[phab:T361627|T361627]]) * 22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71381 and previous config saved to /var/cache/conftool/dbconfig/20241128-224905-ladsgroup.json * 22:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P71380 and previous config saved to /var/cache/conftool/dbconfig/20241128-224258-ladsgroup.json * 22:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1165 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71379 and previous config saved to /var/cache/conftool/dbconfig/20241128-223959-ladsgroup.json * 22:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 22:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 22:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance * 22:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance * 22:27 
ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71377 and previous config saved to /var/cache/conftool/dbconfig/20241128-222751-ladsgroup.json * 22:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2151 ([[phab:T328817|T328817]])', diff saved to https://phabricator.wikimedia.org/P71376 and previous config saved to /var/cache/conftool/dbconfig/20241128-222250-ladsgroup.json * 22:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance * 22:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance * away: UTC late deploys done * {{safesubst:SAL entry|1=22:17 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098990{{!}}Localisation updates (November 26) (T372175)]], [[gerrit:1098956{{!}}extend account creation lookup service to cover forced creations by others (T378401)]], [[gerrit:1098965{{!}}extend account creation backfill script to forced account creations by others (T378401)]], [[gerrit:1098929{{!}}ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot depl}} * 22:07 tgr@deploy2002: tgr, ariel, matmarex, mszabo: Continuing with sync * 22:05 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db1178 gradually with 4 steps - Maint over ([[phab:T361627|T361627]]) * {{safesubst:SAL entry|1=21:53 tgr@deploy2002: tgr, ariel, matmarex, mszabo: Backport for [[gerrit:1098990{{!}}Localisation updates (November 26) (T372175)]], [[gerrit:1098956{{!}}extend account creation lookup service to cover forced creations by others (T378401)]], [[gerrit:1098965{{!}}extend account creation backfill script to forced account creations by others (T378401)]], [[gerrit:1098929{{!}}ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot}} * 21:51 ladsgroup@cumin1002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: Schema change ([[phab:T361627|T361627]]) * 21:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1178.eqiad.wmnet with reason: Schema change ([[phab:T361627|T361627]]) * 21:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1178 depool ([[phab:T361627|T361627]])', diff saved to https://phabricator.wikimedia.org/P71373 and previous config saved to /var/cache/conftool/dbconfig/20241128-215026-ladsgroup.json * {{safesubst:SAL entry|1=21:39 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1098990{{!}}Localisation updates (November 26) (T372175)]], [[gerrit:1098956{{!}}extend account creation lookup service to cover forced creations by others (T378401)]], [[gerrit:1098965{{!}}extend account creation backfill script to forced account creations by others (T378401)]], [[gerrit:1098929{{!}}ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deplo}} * 21:25 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098617{{!}}Reader Survey: Undeploy on enwiki (T378660)]], [[gerrit:1098627{{!}}Reader Survey: Deploy on multiple wikis (T378660)]] (duration: 14m 43s) * 21:18 tgr@deploy2002: tgr, dani: Continuing with sync * 21:17 aqu@deploy2002: Finished deploy [airflow-dags/analytics@6d38940]: Generate canary events faster in Airflow (duration: 01m 39s) * 21:16 tgr@deploy2002: tgr, dani: Backport for [[gerrit:1098617{{!}}Reader Survey: Undeploy on enwiki (T378660)]], [[gerrit:1098627{{!}}Reader Survey: Deploy on multiple wikis (T378660)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:15 aqu@deploy2002: Started deploy [airflow-dags/analytics@6d38940]: Generate canary events faster in Airflow * 21:10 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1098617{{!}}Reader Survey: Undeploy on enwiki (T378660)]], [[gerrit:1098627{{!}}Reader Survey: Deploy on multiple wikis (T378660)]] * 20:30 
kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098929{{!}}ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deploy (T380277)]] (duration: 13m 08s) * 20:23 kharlan@deploy2002: kharlan, mszabo: Continuing with sync * 20:23 kharlan@deploy2002: kharlan, mszabo: Backport for [[gerrit:1098929{{!}}ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deploy (T380277)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:16 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1098929{{!}}ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deploy (T380277)]] * 19:50 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.roll-reimage-nodes (exit_code=0) rolling reimage on P<nowiki>{</nowiki>wikikube-worker[1276-1277].eqiad.wmnet<nowiki>}</nowiki> and (A:wikikube-staging-master-codfw or A:wikikube-staging-worker-codfw or A:wikikube-staging-master-eqiad or A:wikikube-staging-worker-eqiad or A:wikikube-master-codfw or A:wikikube-worker-codfw or A:wikikube-master-eqiad or A:wikikube-worker-eqiad or A:ml-serve-master-eqiad or * 19:50 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1277.eqiad.wmnet with OS bookworm * 19:31 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1277.eqiad.wmnet with reason: host reimage * 19:27 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1277.eqiad.wmnet with reason: host reimage * 19:08 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1277.eqiad.wmnet with OS bookworm * 18:28 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1276.eqiad.wmnet with OS bookworm * 18:09 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1276.eqiad.wmnet with reason: host reimage * 18:06 kamila@cumin1002: START - 
Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1276.eqiad.wmnet with reason: host reimage * 17:47 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1276.eqiad.wmnet with OS bookworm * 17:45 kamila@cumin1002: START - Cookbook sre.k8s.roll-reimage-nodes rolling reimage on P<nowiki>{</nowiki>wikikube-worker[1276-1277].eqiad.wmnet<nowiki>}</nowiki> and (A:wikikube-staging-master-codfw or A:wikikube-staging-worker-codfw or A:wikikube-staging-master-eqiad or A:wikikube-staging-worker-eqiad or A:wikikube-master-codfw or A:wikikube-worker-codfw or A:wikikube-master-eqiad or A:wikikube-worker-eqiad or A:ml-serve-master-eqiad or A:ml-serve-worker- * 17:06 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 16:51 Emperor: depool/restart swift/repool ms-fe2014 * 16:51 Emperor: depool/restart swift/repool ms-fe2009 * 16:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 16:41 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 16:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: Maintenance * 16:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: Maintenance * 16:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 16:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 16:28 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 16:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2227.codfw.wmnet with reason: Maintenance * 16:27 
ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2227.codfw.wmnet with reason: Maintenance * 16:24 gmodena@deploy2002: Finished deploy [airflow-dags/analytics@d7c0f58]: webrequest_frontend post deployment fixes (duration: 02m 22s) * 16:24 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2088.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2088.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:23 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:23 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:23 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:22 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:22 gmodena@deploy2002: Started deploy [airflow-dags/analytics@d7c0f58]: webrequest_frontend post deployment fixes * 16:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:22 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:21 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:21 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with 
chassis set policy GRACEFUL_RESTART * 16:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:19 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 16:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2205.codfw.wmnet with reason: Maintenance * 16:17 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2205.codfw.wmnet with reason: Maintenance * 16:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: Maintenance * 16:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2194.codfw.wmnet with reason: Maintenance * 16:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 16:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 15:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2190.codfw.wmnet with reason: Maintenance * 15:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2190.codfw.wmnet with reason: Maintenance * 15:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2177.codfw.wmnet with reason: Maintenance * 15:48 
ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2177.codfw.wmnet with reason: Maintenance * 15:46 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host idp-test2004.wikimedia.org * 15:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance * 15:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance * 15:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2156.codfw.wmnet with reason: Maintenance * 15:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2156.codfw.wmnet with reason: Maintenance * 15:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2004.wikimedia.org * 15:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2005.wikimedia.org * 15:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2005.wikimedia.org * 15:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71371 and previous config saved to /var/cache/conftool/dbconfig/20241128-153202-ladsgroup.json * 15:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: Maintenance * 15:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: Maintenance * 15:27 gmodena@deploy2002: Finished deploy [analytics/refinery@ac87303] (hadoop-test): Gobblin config changes [analytics/refinery@ac873037] (duration: 00m 26s) * 15:26 gmodena@deploy2002: Started deploy [analytics/refinery@ac87303] (hadoop-test): Gobblin config changes [analytics/refinery@ac873037] * 15:25 gmodena@deploy2002: Finished deploy [analytics/refinery@ac87303] (thin): 
Gobblin config changes THIN [analytics/refinery@ac873037] (duration: 00m 30s) * 15:25 gmodena@deploy2002: Started deploy [analytics/refinery@ac87303] (thin): Gobblin config changes THIN [analytics/refinery@ac873037] * 15:21 moritzm: removing ganeti1018 from active Ganeti nodes [[phab:T378921|T378921]] * 15:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2139.codfw.wmnet with reason: Maintenance * 15:20 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync * 15:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2139.codfw.wmnet with reason: Maintenance * 15:19 gmodena@deploy2002: Finished deploy [analytics/refinery@ac87303]: Gobblin config changes [analytics/refinery@ac873037] (duration: 03m 05s) * 15:19 elukey@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync * 15:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 15:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 15:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P71370 and previous config saved to /var/cache/conftool/dbconfig/20241128-151655-ladsgroup.json * 15:16 gmodena@deploy2002: Started deploy [analytics/refinery@ac87303]: Gobblin config changes [analytics/refinery@ac873037] * 15:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet * 15:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1240.eqiad.wmnet with reason: Maintenance * 15:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1240.eqiad.wmnet with reason: Maintenance * 15:13 ladsgroup@cumin1002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on db1223.eqiad.wmnet with reason: Maintenance * 15:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1223.eqiad.wmnet with reason: Maintenance * 15:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 15:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 15:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: Maintenance * 15:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: Maintenance * 15:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1198.eqiad.wmnet with reason: Maintenance * 15:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1198.eqiad.wmnet with reason: Maintenance * 15:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: Maintenance * 15:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: Maintenance * 15:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1166.eqiad.wmnet with reason: Maintenance * 15:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1166.eqiad.wmnet with reason: Maintenance * 15:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: Maintenance * 15:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: Maintenance * 15:01 ladsgroup@cumin1002: dbctl commit (dc=all): 
'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P71369 and previous config saved to /var/cache/conftool/dbconfig/20241128-150148-ladsgroup.json * 15:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance * 15:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance * 15:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2214.codfw.wmnet with reason: Maintenance * 14:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2214.codfw.wmnet with reason: Maintenance * 14:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1201.eqiad.wmnet with reason: Maintenance * 14:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1201.eqiad.wmnet with reason: Maintenance * 14:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2224.codfw.wmnet with reason: Maintenance * 14:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2224.codfw.wmnet with reason: Maintenance * 14:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2217.codfw.wmnet with reason: Maintenance * 14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2217.codfw.wmnet with reason: Maintenance * 14:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2197.codfw.wmnet with reason: Maintenance * 14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 
2:00:00 on db2197.codfw.wmnet with reason: Maintenance * 14:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2193.codfw.wmnet with reason: Maintenance * 14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2193.codfw.wmnet with reason: Maintenance * 14:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2180.codfw.wmnet with reason: Maintenance * 14:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2180.codfw.wmnet with reason: Maintenance * 14:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2169.codfw.wmnet with reason: Maintenance * 14:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2169.codfw.wmnet with reason: Maintenance * 14:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance * 14:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance * 14:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2158.codfw.wmnet with reason: Maintenance * 14:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2158.codfw.wmnet with reason: Maintenance * 14:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2151.codfw.wmnet with reason: Maintenance * 14:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2151.codfw.wmnet with reason: Maintenance * 14:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 14:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 14:54 ladsgroup@cumin1002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: Maintenance * {{safesubst:SAL entry|1=14:54 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098623{{!}}Use `useformat` query param for device detection or mobile domain (m.) (T380646 T375788)]], [[gerrit:1098913{{!}}ReportIncident: Enable instrumentation on labs (T372823)]], [[gerrit:1098509{{!}}Enable message group subscription feature for some wikis (T372386)]], [[gerrit:1098622{{!}}Use `useformat` query param for device detection or mobile domain (m.)}} * 14:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: Maintenance * 14:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 14:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 14:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1187.eqiad.wmnet with reason: Maintenance * 14:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1187.eqiad.wmnet with reason: Maintenance * 14:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1180.eqiad.wmnet with reason: Maintenance * 14:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1180.eqiad.wmnet with reason: Maintenance * 14:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 14:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 14:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1168.eqiad.wmnet with reason: Maintenance * 14:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1168.eqiad.wmnet 
with reason: Maintenance * 14:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 14:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 14:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1165.eqiad.wmnet with reason: Maintenance * 14:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1165.eqiad.wmnet with reason: Maintenance * 14:47 urbanecm@deploy2002: urbanecm, tgr, abi, mszabo: Continuing with sync * 14:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71352 and previous config saved to /var/cache/conftool/dbconfig/20241128-144641-ladsgroup.json * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71351 and previous config saved to /var/cache/conftool/dbconfig/20241128-144039-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71350 and previous config saved to /var/cache/conftool/dbconfig/20241128-144012-ladsgroup.json * 14:39 urbanecm: [urbanecm@deploy2002 ~]$ while read wiki; do echo "== $wiki"; mwscript-k8s extensions/Flow/maintenance/FlowMoveBoardsToSubpages.php -- --wiki=$wiki; done < wikis.txt # 
wikis.txt is at P71349 # [[phab:T378827|T378827]] * 14:36 urbanecm: [urbanecm@deploy2002 ~]$ mwscript-k8s -f extensions/Flow/maintenance/FlowMoveBoardsToSubpages.php -- --wiki=bswiki # [[phab:T378827|T378827]] * 14:33 moritzm: installing node-es-module-lexer updates from Bookworm point release * {{safesubst:SAL entry|1=14:28 urbanecm@deploy2002: urbanecm, tgr, abi, mszabo: Backport for [[gerrit:1098623{{!}}Use `useformat` query param for device detection or mobile domain (m.) (T380646 T375788)]], [[gerrit:1098913{{!}}ReportIncident: Enable instrumentation on labs (T372823)]], [[gerrit:1098509{{!}}Enable message group subscription feature for some wikis (T372386)]], [[gerrit:1098622{{!}}Use `useformat` query param for device detection or mobile domain (m.}} * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P71347 and previous config saved to /var/cache/conftool/dbconfig/20241128-142505-ladsgroup.json * 14:25 Dreamy_Jazz: Started MediaModeration scanning scripts to run again over all wikis * {{safesubst:SAL entry|1=14:23 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1098623{{!}}Use `useformat` query param for device detection or mobile domain (m.) (T380646 T375788)]], [[gerrit:1098913{{!}}ReportIncident: Enable instrumentation on labs (T372823)]], [[gerrit:1098509{{!}}Enable message group subscription feature for some wikis (T372386)]], [[gerrit:1098622{{!}}Use `useformat` query param for device detection or mobile domain (m.) 
(}} * 14:22 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098561{{!}}Allow IRS to record server-side interaction events (T380599)]], [[gerrit:1098939{{!}}Revert^2 "Add contact form for U4C"]] (duration: 14m 07s) * 14:22 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration * 14:15 urbanecm@deploy2002: nmw03, mszabo, urbanecm: Continuing with sync * 14:14 moritzm: installing apr security updates * 14:14 urbanecm@deploy2002: nmw03, mszabo, urbanecm: Backport for [[gerrit:1098561{{!}}Allow IRS to record server-side interaction events (T380599)]], [[gerrit:1098939{{!}}Revert^2 "Add contact form for U4C"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P71346 and previous config saved to /var/cache/conftool/dbconfig/20241128-140958-ladsgroup.json * 14:08 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1098561{{!}}Allow IRS to record server-side interaction events (T380599)]], [[gerrit:1098939{{!}}Revert^2 "Add contact form for U4C"]] * 14:06 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71345 and previous config saved to /var/cache/conftool/dbconfig/20241128-135451-ladsgroup.json
* 13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71344 and previous config saved to /var/cache/conftool/dbconfig/20241128-134859-ladsgroup.json
* 13:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
* 13:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
* 12:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71343 and previous config saved to /var/cache/conftool/dbconfig/20241128-124957-ladsgroup.json
* 12:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031', diff saved to https://phabricator.wikimedia.org/P71342 and previous config saved to /var/cache/conftool/dbconfig/20241128-123451-ladsgroup.json
* 12:23 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
* 12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031', diff saved to https://phabricator.wikimedia.org/P71340 and previous config saved to /var/cache/conftool/dbconfig/20241128-121943-ladsgroup.json * 12:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71339 and previous config saved to /var/cache/conftool/dbconfig/20241128-120437-ladsgroup.json * 12:04 ladsgroup@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098914{{!}}Bump ratio of new parsercache key spec to 2 (T373037)]] (duration: 12m 37s) * 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71338 and previous config saved to /var/cache/conftool/dbconfig/20241128-120031-ladsgroup.json * 11:57 ladsgroup@deploy2002: ladsgroup: Continuing with sync * 11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71337 and previous config saved to /var/cache/conftool/dbconfig/20241128-115741-ladsgroup.json * 11:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2031.codfw.wmnet with reason: Maintenance * 11:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2031.codfw.wmnet with reason: Maintenance * 11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71336 and previous config saved to /var/cache/conftool/dbconfig/20241128-115715-ladsgroup.json * 11:57 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:1098914{{!}}Bump ratio of new parsercache key spec to 2 (T373037)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 11:51 
ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1098914{{!}}Bump ratio of new parsercache key spec to 2 (T373037)]] * 11:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2237 gradually with 4 steps - Maint over ([[phab:T379813|T379813]]) * 11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P71334 and previous config saved to /var/cache/conftool/dbconfig/20241128-114524-ladsgroup.json * 11:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P71333 and previous config saved to /var/cache/conftool/dbconfig/20241128-114208-ladsgroup.json * 11:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet * 11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P71330 and previous config saved to /var/cache/conftool/dbconfig/20241128-113017-ladsgroup.json * 11:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P71329 and previous config saved to /var/cache/conftool/dbconfig/20241128-112701-ladsgroup.json * 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet * 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71327 and previous config saved to /var/cache/conftool/dbconfig/20241128-111510-ladsgroup.json * 11:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet * 11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71326 and previous 
config saved to /var/cache/conftool/dbconfig/20241128-111300-ladsgroup.json * 11:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1236.eqiad.wmnet with reason: Maintenance * 11:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1236.eqiad.wmnet with reason: Maintenance * 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71325 and previous config saved to /var/cache/conftool/dbconfig/20241128-111154-ladsgroup.json * 11:11 moritzm: removing ganeti1022 from active Ganeti nodes [[phab:T378921|T378921]] * 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet * 11:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1201.eqiad.wmnet with reason: Maintenance * 11:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1201.eqiad.wmnet with reason: Maintenance * 11:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2204.codfw.wmnet with reason: Maintenance * 11:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2204.codfw.wmnet with reason: Maintenance * 11:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71324 and previous config saved to /var/cache/conftool/dbconfig/20241128-110457-ladsgroup.json * 11:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance * 11:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance * 11:03 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2237 gradually with 4 steps - Maint over 
([[phab:T379813|T379813]]) * 10:51 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix commit bug - oblivian@cumin1002" * 10:51 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix commit bug - oblivian@cumin1002 * 10:51 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix commit bug - oblivian@cumin1002 * 10:51 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix commit bug - oblivian@cumin1002" * 10:32 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 10:27 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 09:36 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases1003.eqiad.wmnet (duration: 01m 22s) * 09:35 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases1003.eqiad.wmnet * 09:31 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases2003.codfw.wmnet (duration: 01m 27s) * 09:30 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases2003.codfw.wmnet * 09:23 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 09:22 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.5 refs [[phab:T375664|T375664]]
* 09:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2200.codfw.wmnet with reason: Maintenance
* 09:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2200.codfw.wmnet with reason: Maintenance
* 09:09 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
* 09:06 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
* 09:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2198.codfw.wmnet with reason: Maintenance
* 09:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2198.codfw.wmnet with reason: Maintenance
* 09:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71319 and previous config saved to /var/cache/conftool/dbconfig/20241128-090035-ladsgroup.json
* 08:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P71318 and previous config saved to /var/cache/conftool/dbconfig/20241128-084528-ladsgroup.json
* 08:43 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
* 08:41 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
* 08:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P71317 and previous config saved to /var/cache/conftool/dbconfig/20241128-083021-ladsgroup.json * 08:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71316 and previous config saved to /var/cache/conftool/dbconfig/20241128-081514-ladsgroup.json * 08:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71315 and previous config saved to /var/cache/conftool/dbconfig/20241128-080244-ladsgroup.json * 08:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2195.codfw.wmnet with reason: Maintenance * 08:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2195.codfw.wmnet with reason: Maintenance * 08:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71314 and previous config saved to /var/cache/conftool/dbconfig/20241128-080221-ladsgroup.json * 07:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet * 07:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P71313 and previous config saved to /var/cache/conftool/dbconfig/20241128-074714-ladsgroup.json * 07:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P71312 and previous config saved to /var/cache/conftool/dbconfig/20241128-073207-ladsgroup.json * 07:23 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "CSRF 
token support - oblivian@cumin1002" * 07:23 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: CSRF token support - oblivian@cumin1002 * 07:23 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: CSRF token support - oblivian@cumin1002 * 07:22 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "CSRF token support - oblivian@cumin1002" * 07:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71310 and previous config saved to /var/cache/conftool/dbconfig/20241128-071700-ladsgroup.json * 07:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71309 and previous config saved to /var/cache/conftool/dbconfig/20241128-070231-ladsgroup.json * 07:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2181.codfw.wmnet with reason: Maintenance * 07:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2181.codfw.wmnet with reason: Maintenance * 07:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71308 and previous config saved to /var/cache/conftool/dbconfig/20241128-070209-ladsgroup.json * 07:02 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 06:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P71307 and previous config saved to /var/cache/conftool/dbconfig/20241128-064702-ladsgroup.json * 06:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P71306 and previous config saved to /var/cache/conftool/dbconfig/20241128-063155-ladsgroup.json * 06:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71305 and previous config saved to /var/cache/conftool/dbconfig/20241128-061647-ladsgroup.json * 06:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71304 and previous config saved to /var/cache/conftool/dbconfig/20241128-060418-ladsgroup.json * 06:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance * 06:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance * 06:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71303 and previous config saved to /var/cache/conftool/dbconfig/20241128-060355-ladsgroup.json * 05:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P71302 and previous config saved to /var/cache/conftool/dbconfig/20241128-054847-ladsgroup.json * 05:48 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 05:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P71301 and previous config saved to /var/cache/conftool/dbconfig/20241128-053340-ladsgroup.json * 05:29 tstarling@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098652{{!}}Add frwiki on labs for new addWiki.php test]] (duration: 13m 41s) * 05:23 tstarling@deploy2002: tstarling: Continuing with sync * 05:22 tstarling@deploy2002: tstarling: Backport for [[gerrit:1098652{{!}}Add frwiki on labs for new addWiki.php test]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 05:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71300 and previous config saved to /var/cache/conftool/dbconfig/20241128-051833-ladsgroup.json * 05:16 tstarling@deploy2002: Started scap sync-world: Backport for [[gerrit:1098652{{!}}Add frwiki on labs for new addWiki.php test]] * 05:06 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 05:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71299 and previous config saved to /var/cache/conftool/dbconfig/20241128-050352-ladsgroup.json * 05:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2166.codfw.wmnet with reason: Maintenance * 05:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2166.codfw.wmnet with reason: Maintenance * 05:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71298 and previous config saved to /var/cache/conftool/dbconfig/20241128-050329-ladsgroup.json * 04:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P71297 and previous config saved to /var/cache/conftool/dbconfig/20241128-044822-ladsgroup.json * 04:41 eileen: civicrm upgraded from {{Gerrit|ed67a1b2}} to {{Gerrit|be7e5d33}} * 04:36 eileen: * civicrm upgraded from {{Gerrit|40f4f1a3}} to {{Gerrit|ed67a1b2}} * 04:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P71296 and previous config saved to /var/cache/conftool/dbconfig/20241128-043314-ladsgroup.json * 04:26 eileen: * civicrm upgraded from {{Gerrit|7ade5fd7}} to {{Gerrit|40f4f1a3}} * 04:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71294 and previous config saved to /var/cache/conftool/dbconfig/20241128-041807-ladsgroup.json * 04:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71292 and previous config saved to /var/cache/conftool/dbconfig/20241128-040326-ladsgroup.json * 04:03 
ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance * 04:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance * 04:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2164.codfw.wmnet with reason: Maintenance * 04:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2164.codfw.wmnet with reason: Maintenance * 04:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71291 and previous config saved to /var/cache/conftool/dbconfig/20241128-040248-ladsgroup.json * 03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P71290 and previous config saved to /var/cache/conftool/dbconfig/20241128-034741-ladsgroup.json * 03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P71289 and previous config saved to /var/cache/conftool/dbconfig/20241128-033234-ladsgroup.json * 03:22 eileen: config revision changed from {{Gerrit|f284fd46}} to {{Gerrit|a3175f86}} (like for real this time) * 03:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71288 and previous config saved to /var/cache/conftool/dbconfig/20241128-031726-ladsgroup.json * 03:14 eileen: config revision changed from {{Gerrit|f284fd46}} to {{Gerrit|a3175f86}} * 03:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71287 and previous config saved to /var/cache/conftool/dbconfig/20241128-030213-ladsgroup.json * 03:02 ladsgroup@cumin1002: END
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2163.codfw.wmnet with reason: Maintenance * 03:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2163.codfw.wmnet with reason: Maintenance * 03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71286 and previous config saved to /var/cache/conftool/dbconfig/20241128-030151-ladsgroup.json * 02:53 eileen: civicrm upgraded from {{Gerrit|c8c461b9}} to {{Gerrit|7ade5fd7}} * 02:46 eileen: * civicrm upgraded from {{Gerrit|80f03357}} to {{Gerrit|c8c461b9}} * 02:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P71285 and previous config saved to /var/cache/conftool/dbconfig/20241128-024644-ladsgroup.json * 02:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P71284 and previous config saved to /var/cache/conftool/dbconfig/20241128-023136-ladsgroup.json * 02:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71283 and previous config saved to /var/cache/conftool/dbconfig/20241128-021629-ladsgroup.json * 02:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2162 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71282 and previous config saved to /var/cache/conftool/dbconfig/20241128-020143-ladsgroup.json * 02:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2162.codfw.wmnet with reason: Maintenance * 02:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2162.codfw.wmnet with reason: Maintenance * 02:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 
([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71281 and previous config saved to /var/cache/conftool/dbconfig/20241128-020120-ladsgroup.json * 01:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P71280 and previous config saved to /var/cache/conftool/dbconfig/20241128-014613-ladsgroup.json * 01:38 eileen: civicrm upgraded from {{Gerrit|3b1ed162}} to {{Gerrit|80f03357}} * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P71279 and previous config saved to /var/cache/conftool/dbconfig/20241128-013106-ladsgroup.json * 01:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71278 and previous config saved to /var/cache/conftool/dbconfig/20241128-011559-ladsgroup.json * 01:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71277 and previous config saved to /var/cache/conftool/dbconfig/20241128-010112-ladsgroup.json * 01:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2161.codfw.wmnet with reason: Maintenance * 01:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2161.codfw.wmnet with reason: Maintenance * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71276 and previous config saved to /var/cache/conftool/dbconfig/20241128-010049-ladsgroup.json * 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P71275 and previous config saved to /var/cache/conftool/dbconfig/20241128-004542-ladsgroup.json * 00:30 
ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P71274 and previous config saved to /var/cache/conftool/dbconfig/20241128-003035-ladsgroup.json * 00:16 tstarling@deploy2002: Finished scap sync-world: Backport for [[gerrit:1094126{{!}}Move default main page text for new wikis to config (T352113)]], [[gerrit:1096839{{!}}Introduce preinstall.dblist for wikis that haven't been installed yet (T352113)]] (duration: 14m 42s) * 00:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71273 and previous config saved to /var/cache/conftool/dbconfig/20241128-001528-ladsgroup.json * 00:09 tstarling@deploy2002: tstarling: Continuing with sync * 00:07 tstarling@deploy2002: tstarling: Backport for [[gerrit:1094126{{!}}Move default main page text for new wikis to config (T352113)]], [[gerrit:1096839{{!}}Introduce preinstall.dblist for wikis that haven't been installed yet (T352113)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 00:01 tstarling@deploy2002: Started scap sync-world: Backport for [[gerrit:1094126{{!}}Move default main page text for new wikis to config (T352113)]], [[gerrit:1096839{{!}}Introduce preinstall.dblist for wikis that haven't been installed yet (T352113)]] * 00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71272 and previous config saved to /var/cache/conftool/dbconfig/20241128-000046-ladsgroup.json * 00:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2154.codfw.wmnet with reason: Maintenance * 00:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2154.codfw.wmnet with reason: Maintenance * 00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 
([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71271 and previous config saved to /var/cache/conftool/dbconfig/20241128-000023-ladsgroup.json
== 2024-11-27 ==
* 23:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P71270 and previous config saved to /var/cache/conftool/dbconfig/20241127-234518-ladsgroup.json * 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P71269 and previous config saved to /var/cache/conftool/dbconfig/20241127-233011-ladsgroup.json * 23:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71267 and previous config saved to /var/cache/conftool/dbconfig/20241127-231504-ladsgroup.json * 23:09 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098633{{!}}Fix mobile domain logic for login.wikimedia.org (T380646)]] (duration: 18m 07s) * 23:02 tgr@deploy2002: tgr: Continuing with sync * 23:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71264 and previous config saved to /var/cache/conftool/dbconfig/20241127-230159-ladsgroup.json * 23:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2152.codfw.wmnet with reason: Maintenance * 23:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2152.codfw.wmnet with reason: Maintenance * 22:56 tgr@deploy2002: tgr: Backport for [[gerrit:1098633{{!}}Fix mobile domain logic for login.wikimedia.org (T380646)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 22:52
ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 22:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71263 and previous config saved to /var/cache/conftool/dbconfig/20241127-225159-ladsgroup.json * 22:51 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1098633{{!}}Fix mobile domain logic for login.wikimedia.org (T380646)]] * 22:46 cjming: end of UTC late backport window * 22:44 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098572{{!}}Turn on Parsoid Read views on jawikivoyage (T380769)]] (duration: 15m 22s) * 22:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P71262 and previous config saved to /var/cache/conftool/dbconfig/20241127-223652-ladsgroup.json * 22:35 cjming@deploy2002: cscott, cjming: Continuing with sync * 22:35 cjming@deploy2002: cscott, cjming: Backport for [[gerrit:1098572{{!}}Turn on Parsoid Read views on jawikivoyage (T380769)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:29 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1098572{{!}}Turn on Parsoid Read views on jawikivoyage (T380769)]] * 22:27 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098581{{!}}Bump wikimedia/parsoid to 0.21.0-a9 (T373035 T380664)]], [[gerrit:1098583{{!}}Bump wikimedia/parsoid to 0.21.0-a9 (T380664)]] (duration: 42m 38s) * 22:26 bking@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin1002" * 22:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P71261 and previous config saved to 
/var/cache/conftool/dbconfig/20241127-222145-ladsgroup.json * 22:13 cjming@deploy2002: arlolra, cjming: Continuing with sync * 22:11 bking@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:09 cjming@deploy2002: arlolra, cjming: Backport for [[gerrit:1098581{{!}}Bump wikimedia/parsoid to 0.21.0-a9 (T373035 T380664)]], [[gerrit:1098583{{!}}Bump wikimedia/parsoid to 0.21.0-a9 (T380664)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:07 bking@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71260 and previous config saved to /var/cache/conftool/dbconfig/20241127-220638-ladsgroup.json * 21:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71259 and previous config saved to /var/cache/conftool/dbconfig/20241127-215407-ladsgroup.json * 21:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:45 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1098581{{!}}Bump wikimedia/parsoid to 0.21.0-a9 (T373035 T380664)]], [[gerrit:1098583{{!}}Bump wikimedia/parsoid to 0.21.0-a9 (T380664)]] * 21:43 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098567{{!}}Revert "Normalize ref html before comparison" (T380977)]] (duration: 12m 49s) * 21:40 bking@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime 
(exit_code=0) for 8:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71258 and previous config saved to /var/cache/conftool/dbconfig/20241127-213759-ladsgroup.json * 21:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1026.eqiad.wmnet with OS bullseye * 21:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002" * 21:37 cjming@deploy2002: cjming, cscott: Continuing with sync * 21:37 cjming@deploy2002: cjming, cscott: Backport for [[gerrit:1098567{{!}}Revert "Normalize ref html before comparison" (T380977)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:31 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1098567{{!}}Revert "Normalize ref html before comparison" (T380977)]] * 21:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P71257 and previous config saved to /var/cache/conftool/dbconfig/20241127-212252-ladsgroup.json * 21:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71256 and previous config saved to /var/cache/conftool/dbconfig/20241127-211704-ladsgroup.json * 21:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P71255 and previous config saved to /var/cache/conftool/dbconfig/20241127-210745-ladsgroup.json * 21:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after 
maintenance es2029', diff saved to https://phabricator.wikimedia.org/P71254 and previous config saved to /var/cache/conftool/dbconfig/20241127-210157-ladsgroup.json * 20:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71253 and previous config saved to /var/cache/conftool/dbconfig/20241127-205238-ladsgroup.json * 20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2029', diff saved to https://phabricator.wikimedia.org/P71252 and previous config saved to /var/cache/conftool/dbconfig/20241127-204650-ladsgroup.json * 20:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Optimize ([[phab:T379813|T379813]]) * 20:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Optimize ([[phab:T379813|T379813]]) * 20:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2237 depool ([[phab:T379813|T379813]])', diff saved to https://phabricator.wikimedia.org/P71251 and previous config saved to /var/cache/conftool/dbconfig/20241127-204450-ladsgroup.json * 20:38 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002" * 20:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71250 and previous config saved to /var/cache/conftool/dbconfig/20241127-203724-ladsgroup.json * 20:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance 
db1211 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71249 and previous config saved to /var/cache/conftool/dbconfig/20241127-203650-ladsgroup.json * 20:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71248 and previous config saved to /var/cache/conftool/dbconfig/20241127-203143-ladsgroup.json * 20:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71247 and previous config saved to /var/cache/conftool/dbconfig/20241127-202446-ladsgroup.json * 20:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance * 20:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance * 20:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71246 and previous config saved to /var/cache/conftool/dbconfig/20241127-202420-ladsgroup.json * 20:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P71245 and previous config saved to /var/cache/conftool/dbconfig/20241127-202143-ladsgroup.json * 20:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1026.eqiad.wmnet with reason: host reimage * 20:18 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1026.eqiad.wmnet with reason: host reimage * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P71244 and previous config saved to /var/cache/conftool/dbconfig/20241127-200913-ladsgroup.json * 20:06 ladsgroup@cumin1002: dbctl 
commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P71243 and previous config saved to /var/cache/conftool/dbconfig/20241127-200636-ladsgroup.json * 19:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P71242 and previous config saved to /var/cache/conftool/dbconfig/20241127-195406-ladsgroup.json * 19:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71241 and previous config saved to /var/cache/conftool/dbconfig/20241127-195129-ladsgroup.json * 19:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye * 19:50 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wdqs1025.eqiad.wmnet with OS bullseye * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71240 and previous config saved to /var/cache/conftool/dbconfig/20241127-193858-ladsgroup.json * 19:36 moritzm: imported jenkins 2.479.2 to thirdparty/ci for bullseye-wikimedia * 19:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71239 and previous config saved to /var/cache/conftool/dbconfig/20241127-193529-ladsgroup.json * 19:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71238 and previous config saved to 
/var/cache/conftool/dbconfig/20241127-193507-ladsgroup.json * 19:34 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye * 19:32 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wdqs1025.eqiad.wmnet with OS bullseye * 19:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2034 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71237 and previous config saved to /var/cache/conftool/dbconfig/20241127-193202-ladsgroup.json * 19:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance * 19:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance * 19:25 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye * 19:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1025.eqiad.wmnet with OS bullseye * 19:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P71236 and previous config saved to /var/cache/conftool/dbconfig/20241127-192000-ladsgroup.json * 19:18 brett@cumin2002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site magru [reason: repool magru, [[phab:T376737|T376737]]] * 19:18 brett@cumin2002: START - Cookbook sre.dns.admin DNS admin: pool site magru [reason: repool magru, [[phab:T376737|T376737]]] * 19:17 mforns@deploy2002: Finished deploy [airflow-dags/analytics@99032bf]: regular weekly train (duration: 03m 10s) * 19:14 mforns@deploy2002: Started deploy [airflow-dags/analytics@99032bf]: regular weekly train * 19:13 mutante: disabled puppet on R:scap::target (180 hosts) for a short time - 
deploying gerrit:1092841 * 19:09 brett@puppetserver1001: conftool action : set/pooled=yes; selector: dc=magru,service=cdn * 19:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P71235 and previous config saved to /var/cache/conftool/dbconfig/20241127-190453-ladsgroup.json * 19:02 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye * 18:56 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1025.eqiad.wmnet with OS bullseye * 18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71233 and previous config saved to /var/cache/conftool/dbconfig/20241127-184946-ladsgroup.json * 18:47 fabfur@cumin1002: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=magru * 18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 16 hosts * 18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for 16 hosts * 18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7003.magru.wmnet * 18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7003.magru.wmnet * 18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7002.magru.wmnet * 18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7002.magru.wmnet * 18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7001.magru.wmnet * 18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7001.magru.wmnet * 18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns7002.wikimedia.org * 18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns7002.wikimedia.org * 18:37 fabfur@cumin1002: END (PASS) - 
Cookbook sre.hosts.remove-downtime (exit_code=0) for dns7001.wikimedia.org * 18:37 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns7001.wikimedia.org * 18:37 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dns7001.wikimedia.org with reason: [[phab:T380307|T380307]] * 18:37 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dns7001.wikimedia.org with reason: [[phab:T380307|T380307]] * 18:36 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye * 18:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71232 and previous config saved to /var/cache/conftool/dbconfig/20241127-183455-ladsgroup.json * 18:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71231 and previous config saved to /var/cache/conftool/dbconfig/20241127-183432-ladsgroup.json * 18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P71230 and previous config saved to /var/cache/conftool/dbconfig/20241127-181925-ladsgroup.json * 18:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P71229 and previous config saved to /var/cache/conftool/dbconfig/20241127-180418-ladsgroup.json * 17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance 
db1203 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71228 and previous config saved to /var/cache/conftool/dbconfig/20241127-174911-ladsgroup.json * 17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71227 and previous config saved to /var/cache/conftool/dbconfig/20241127-173426-ladsgroup.json * 17:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71226 and previous config saved to /var/cache/conftool/dbconfig/20241127-173403-ladsgroup.json * 17:33 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 17:33 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply * 17:33 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply * 17:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply * 17:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 17:32 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply * 17:31 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply * 17:31 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:31 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply * 17:28 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:27 jiji@deploy2002: helmfile 
[eqiad] START helmfile.d/services/changeprop: apply * 17:27 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:27 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:25 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:24 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:24 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:23 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:20 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply * 17:19 jiji@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply * 17:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P71225 and previous config saved to /var/cache/conftool/dbconfig/20241127-171857-ladsgroup.json * 17:17 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7010.magru.wmnet with OS bullseye * 17:16 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply * 17:16 jiji@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply * 17:14 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kafka-main1007.eqiad.wmnet * 17:14 jiji@cumin1002: START - Cookbook sre.hosts.remove-downtime for kafka-main1007.eqiad.wmnet * 17:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P71224 and previous config saved to /var/cache/conftool/dbconfig/20241127-170350-ladsgroup.json * 16:56 jiji@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 16:55 jiji@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 16:55 jiji@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 16:55 jiji@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 16:55 jiji@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 16:54 jiji@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. * 16:54 jiji@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 16:54 jiji@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 16:54 jiji@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 16:53 jiji@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 16:53 jiji@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:53 jiji@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:53 jiji@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:52 jiji@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:52 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:52 jiji@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:52 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:52 jiji@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 16:51 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7010.magru.wmnet with reason: host reimage * 16:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71222 and previous config saved to /var/cache/conftool/dbconfig/20241127-164843-ladsgroup.json * 16:47 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7010.magru.wmnet with reason: host reimage * 16:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71221 and previous config saved to /var/cache/conftool/dbconfig/20241127-163407-ladsgroup.json * 16:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71220 and previous config saved to /var/cache/conftool/dbconfig/20241127-163344-ladsgroup.json * 16:27 jiji@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 16:26 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7010.magru.wmnet with OS bullseye * 16:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P71218 and previous config saved to /var/cache/conftool/dbconfig/20241127-161837-ladsgroup.json * 16:16 jiji@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 16:12 effie: roll restarting kafka-main brokers - [[phab:T363214|T363214]] * 16:11 moritzm: 
installing distro-info-data updates from bookworm point release * 16:11 fabfur@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:11 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp70101 - fabfur@cumin1002" * 16:11 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp70101 - fabfur@cumin1002" * 16:05 fabfur@cumin1002: START - Cookbook sre.dns.netbox * 16:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P71217 and previous config saved to /var/cache/conftool/dbconfig/20241127-160330-ladsgroup.json * 15:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71216 and previous config saved to /var/cache/conftool/dbconfig/20241127-154823-ladsgroup.json * 15:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye * 15:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1026.eqiad.wmnet with OS bullseye * 15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T370903|T370903]])', diff saved to https://phabricator.wikimedia.org/P71215 and previous config saved to /var/cache/conftool/dbconfig/20241127-153316-ladsgroup.json * 15:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:32 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 15:31 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 
15:30 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 15:30 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 15:28 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 15:27 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 15:22 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 15:22 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 15:21 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 15:20 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 15:09 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 15:08 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 15:08 Krinkle: krinkle@webperf2003: `sudo apt-get install kafkacat` (matching webperf1003, for ad-hoc debugging) * 15:05 kart_: Updated recommendation-api to 2024-11-27-142924-production ([[phab:T380838|T380838]], [[phab:T379036|T379036]], [[phab:T380699|T380699]]) * 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1003.eqiad.wmnet to plain * 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1003.eqiad.wmnet to plain * 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet * 15:02 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet * 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1003.eqiad.wmnet to drbd * 14:59 kartik@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 14:58 kartik@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 14:51 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1003.eqiad.wmnet to drbd * 14:48 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 14:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet * 14:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet * 14:35 moritzm: rebalance magru01 following switch of VMs back to DRBD [[phab:T376737|T376737]] * 14:33 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on doh[7001-7002].wikimedia.org with reason: site is depooled, maintenance * 14:33 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on doh[7001-7002].wikimedia.org with reason: site is depooled, maintenance * 14:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097309{{!}}[GrowthExperiments] Undefine wgGEDatabaseCluster (T354939)]] (duration: 12m 21s) * 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to drbd * 14:26 urbanecm@deploy2002: urbanecm: Continuing with sync * 14:26 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1097309{{!}}[GrowthExperiments] Undefine wgGEDatabaseCluster (T354939)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:25 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1061.eqiad.wmnet with reason: cloudvirt1061 needs maintenance [[phab:T380673|T380673]] * 14:25 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1061.eqiad.wmnet with reason: cloudvirt1061 needs maintenance [[phab:T380673|T380673]] * 14:24 urbanecm: 
Purge https://en.wikipedia.org/static/images/mobile/copyright/wikiquote-wordmark-az.svg ([[phab:T380974|T380974]]) * 14:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 14:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye * 14:21 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1097309{{!}}[GrowthExperiments] Undefine wgGEDatabaseCluster (T354939)]] * 14:20 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098076{{!}}Enable ParserMigration compact indicator on all wikis (T363484)]], [[gerrit:1093405{{!}}Deploy Parsoid Read Views to de/ru wikivoyage and dagwiki (T375394 T380401)]], [[gerrit:1098019{{!}}Updated wordmark for Azerbaijani Wikiquote (T380974)]] (duration: 17m 20s) * 14:20 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to drbd * 14:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to drbd * 14:13 urbanecm@deploy2002: urbanecm, cscott, nmw03: Continuing with sync * 14:09 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to drbd * 14:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to drbd * 14:08 urbanecm@deploy2002: urbanecm, cscott, nmw03: Backport for [[gerrit:1098076{{!}}Enable ParserMigration compact indicator on all wikis (T363484)]], [[gerrit:1093405{{!}}Deploy Parsoid Read Views to de/ru wikivoyage and dagwiki (T375394 T380401)]], [[gerrit:1098019{{!}}Updated wordmark for Azerbaijani Wikiquote (T380974)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:03 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1098076{{!}}Enable ParserMigration compact indicator on all wikis (T363484)]], 
[[gerrit:1093405{{!}}Deploy Parsoid Read Views to de/ru wikivoyage and dagwiki (T375394 T380401)]], [[gerrit:1098019{{!}}Updated wordmark for Azerbaijani Wikiquote (T380974)]] * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to drbd * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to drbd * 13:45 moritzm: installing php8.2 security updates * 13:40 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to drbd * 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to drbd * 13:38 mszabo@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098506{{!}}private: Add stub for wgReportIncidentZendeskSubjectLine (T380868)]], [[gerrit:1098480{{!}}Configure IRS Zendesk integration (T380908)]], [[gerrit:1093389{{!}}Configure instrument for the Incident Reporting System (T372823)]] (duration: 13m 53s) * 13:31 mszabo@deploy2002: mszabo: Continuing with sync * 13:30 mszabo@deploy2002: mszabo: Backport for [[gerrit:1098506{{!}}private: Add stub for wgReportIncidentZendeskSubjectLine (T380868)]], [[gerrit:1098480{{!}}Configure IRS Zendesk integration (T380908)]], [[gerrit:1093389{{!}}Configure instrument for the Incident Reporting System (T372823)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 13:28 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to drbd * 13:27 moritzm: rebalance magru02 following switch of VMs back to DRBD [[phab:T376737|T376737]] * 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to drbd * 13:24 mszabo@deploy2002: Started scap sync-world: Backport for [[gerrit:1098506{{!}}private: Add stub for 
wgReportIncidentZendeskSubjectLine (T380868)]], [[gerrit:1098480{{!}}Configure IRS Zendesk integration (T380908)]], [[gerrit:1093389{{!}}Configure instrument for the Incident Reporting System (T372823)]] * 13:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye * 13:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye * 13:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 13:16 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to drbd * 13:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 13:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to drbd * 13:05 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to drbd * 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to drbd * 12:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kafka-main[1002,1007].eqiad.wmnet with reason: Hardware refresh * 12:56 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kafka-main[1002,1007].eqiad.wmnet with reason: Hardware refresh * 12:50 moritzm: installing ghostscript security updates * 12:39 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 12:38 effie: start replacing kafka-main1002 with kafka-main1007 - [[phab:T363214|T363214]] * 12:24 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:24 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 12:24 kart_: Updated cxserver to 2024-11-20-121713-production ([[phab:T377966|T377966]], [[phab:T357950|T357950]]) * 12:22 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:22 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:20 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:20 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:18 moritzm: installing python-cryptography security updates * 12:14 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:13 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:12 moritzm: installing openssl security updates * 12:08 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply * 12:07 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply * 12:06 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply * 12:06 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to drbd * 12:06 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply * 12:05 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply * 12:05 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply * 12:05 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:05 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ganeti2042.codfw.wmnet with reason: broken CPU * 12:03 jmm@cumin2002: START - Cookbook 
sre.hosts.downtime for 7 days, 0:00:00 on ganeti2042.codfw.wmnet with reason: broken CPU * 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to drbd * 11:45 ladsgroup@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098484{{!}}Bump ratio of new parsercache key spec to 3 (T373037)]] (duration: 12m 51s) * 11:38 ladsgroup@deploy2002: ladsgroup: Continuing with sync * 11:38 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:1098484{{!}}Bump ratio of new parsercache key spec to 3 (T373037)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 11:34 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7001.wikimedia.org to drbd * 11:32 ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1098484{{!}}Bump ratio of new parsercache key spec to 3 (T373037)]] * 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to drbd * 11:21 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dns7002.wikimedia.org with reason: [[phab:T376737|T376737]] * 11:21 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dns7002.wikimedia.org with reason: [[phab:T376737|T376737]] * 11:21 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dns7001.wikimedia.org with reason: [[phab:T376737|T376737]] * 11:20 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dns7001.wikimedia.org with reason: [[phab:T376737|T376737]] * 11:19 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs[7001-7003].magru.wmnet with reason: [[phab:T376737|T376737]] * 11:19 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs[7001-7003].magru.wmnet with reason: [[phab:T376737|T376737]] * 11:19 fabfur@cumin1002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 4:00:00 on 16 hosts with reason: [[phab:T376737|T376737]] * 11:19 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on 16 hosts with reason: [[phab:T376737|T376737]] * 11:18 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to drbd * 11:16 xSavitar: [[phab:T380875|T380875]] Ran mwscript-k8s --comment="[[phab:T380875|T380875]]" -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'EMBakeryEquipment' 'Janapanna' * 11:15 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7002.magru.wmnet to cluster magru02 and group B4 * 11:13 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7002.magru.wmnet to cluster magru02 and group B4 * 11:13 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 11:13 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 11:04 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' . 
* 11:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7001.magru.wmnet to cluster magru01 and group B3 * 11:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7001.magru.wmnet to cluster magru01 and group B3 * 10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet * 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet * 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet * 10:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7008.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet * 10:01 fabfur@cumin1002: START - Cookbook sre.hosts.provision for host cp7008.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7002 * 09:59 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7002 * 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7001 * 09:58 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7001 * 09:55 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7006.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "readded ganeti nodes in magru - jmm@cumin2002 - [[phab:T376737|T376737]]" * 09:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"readded ganeti nodes in magru - jmm@cumin2002 - [[phab:T376737|T376737]]" * 09:46 fabfur@cumin1002: START - Cookbook sre.hosts.provision for host cp7006.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:45 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.5 refs [[phab:T375664|T375664]] * 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7001.magru.wmnet with OS bookworm * 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 09:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 09:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage * 09:06 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098413{{!}}ext.uls.inputsettings: Use arrow functions (T380431)]] (duration: 16m 06s) * 09:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage * 08:59 kartik@deploy2002: abi, kartik: Continuing with sync * 08:55 kartik@deploy2002: abi, kartik: Backport for [[gerrit:1098413{{!}}ext.uls.inputsettings: Use arrow functions (T380431)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:50 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1098413{{!}}ext.uls.inputsettings: Use arrow functions (T380431)]] * 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7001.magru.wmnet with OS bookworm * 08:38 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098415{{!}}Fix illegal access of typed property. 
(T380724)]] (duration: 21m 02s) * 08:31 kartik@deploy2002: kartik, abi: Continuing with sync * 08:24 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1098415{{!}}Fix illegal access of typed property. (T380724)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7002.magru.wmnet with OS bookworm * 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 08:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 08:17 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1098415{{!}}Fix illegal access of typed property. (T380724)]] * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7002.magru.wmnet with reason: host reimage * 07:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7002.magru.wmnet with reason: host reimage * 07:34 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7002.magru.wmnet with OS bookworm * 07:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
== 2024-11-26 ==
* 23:29 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns7002.wikimedia.org with OS bookworm
* 23:29 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 23:28 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 23:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs7001.magru.wmnet with OS bullseye
* 23:28 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 23:23 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 23:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7010.magru.wmnet with OS bullseye
* 23:13 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 23:12 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 23:03 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs7001.magru.wmnet with reason: host reimage
* 23:00 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs7001.magru.wmnet with reason: host reimage
* 22:54 reedy@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098161{{!}}Add CodeMirror to BetaFeaturesAllowList (T376735)]] (duration: 31m 35s)
* 22:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7002.wikimedia.org with reason: host reimage
* 22:48 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7002.wikimedia.org with reason: host reimage
* 22:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7010.magru.wmnet with reason: host reimage
* 22:45 reedy@deploy2002: musikanimal, reedy: Continuing with sync
* 22:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7010.magru.wmnet with reason: host reimage
* 22:40 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7001.magru.wmnet with OS bullseye
* 22:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7002.magru.wmnet with OS bullseye
* 22:37 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 22:32 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 22:28 reedy@deploy2002: musikanimal, reedy: Backport for [[gerrit:1098161{{!}}Add CodeMirror to BetaFeaturesAllowList (T376735)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 22:26 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 4800
* 22:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7010.magru.wmnet with OS bullseye
* 22:24 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns7002.wikimedia.org with OS bookworm
* 22:24 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns7002.wikimedia.org with OS bullseye
* 22:24 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 4800
* 22:22 reedy@deploy2002: Started scap sync-world: Backport for [[gerrit:1098161{{!}}Add CodeMirror to BetaFeaturesAllowList (T376735)]]
* 22:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7002.wikimedia.org with reason: host reimage
* 22:21 reedy@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097484{{!}}Nov 26 2024: Vector 2022 Deployments (T379799)]] (duration: 19m 52s)
* 22:18 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7002.wikimedia.org with reason: host reimage
* 22:15 cmooney@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 262979
* 22:14 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 262979
* 22:11 reedy@deploy2002: jdlrobson, reedy: Continuing with sync
* 22:08 reedy@deploy2002: jdlrobson, reedy: Backport for [[gerrit:1097484{{!}}Nov 26 2024: Vector 2022 Deployments (T379799)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 22:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7002.magru.wmnet with reason: host reimage
* 22:01 reedy@deploy2002: Started scap sync-world: Backport for [[gerrit:1097484{{!}}Nov 26 2024: Vector 2022 Deployments (T379799)]]
* 22:00 reedy@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097591{{!}}Add BetaFeature for CodeMirror 6 (T376735)]] (duration: 40m 05s)
* 21:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7004.magru.wmnet with OS bullseye
* 21:58 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 21:58 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7002.magru.wmnet with reason: host reimage
* 21:57 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 21:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns7002.wikimedia.org with OS bullseye
* 21:46 reedy@deploy2002: musikanimal, reedy: Continuing with sync
* 21:44 reedy@deploy2002: musikanimal, reedy: Backport for [[gerrit:1097591{{!}}Add BetaFeature for CodeMirror 6 (T376735)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 21:38 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7002.magru.wmnet with OS bullseye
* 21:35 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7002.magru.wmnet with OS bullseye
* 21:35 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns7002.wikimedia.org with OS bookworm
* 21:35 brett@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cp7002.magru.wmnet dns7002.magru.wmnet on all recursors
* 21:35 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7002.magru.wmnet dns7002.magru.wmnet on all recursors
* 21:35 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:34 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7010.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:32 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:32 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7010.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:32 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:32 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
* 21:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7004.magru.wmnet with reason: host reimage
* 21:32 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
* 21:30 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7001
* 21:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7003.magru.wmnet with OS bullseye
* 21:30 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 21:30 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7001
* 21:30 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7010
* 21:30 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7010
* 21:29 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7004.magru.wmnet with reason: host reimage
* 21:28 robh@cumin2002: START - Cookbook sre.dns.netbox
* 21:28 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:26 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
* 21:25 robh@cumin2002: START - Cookbook sre.dns.netbox
* 21:24 damilare: civicrm upgraded from {{Gerrit|59d340cd}} to {{Gerrit|3b1ed162}}
* 21:23 damilare: SmashPig upgraded from {{Gerrit|131e92a5}} to {{Gerrit|79b463b4}}
* 21:22 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts lvs7001.magru.wmnet
* 21:22 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns7002.wikimedia.org with OS bookworm
* 21:20 reedy@deploy2002: Started scap sync-world: Backport for [[gerrit:1097591{{!}}Add BetaFeature for CodeMirror 6 (T376735)]]
* 21:20 robh@cumin2002: START - Cookbook sre.dns.netbox
* 21:20 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7010.magru.wmnet
* 21:20 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:20 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7010.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
* 21:19 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7010.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
* 21:17 reedy@deploy2002: Synchronized wmf-config/core-Permissions.php: [[phab:T380753|T380753]] (duration: 11m 23s)
* 21:16 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7002.magru.wmnet with OS bullseye
* 21:15 robh@cumin2002: START - Cookbook sre.dns.netbox
* 21:10 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7004.magru.wmnet with OS bullseye
* 21:08 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7004.magru.wmnet with OS bullseye
* 21:08 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7010.magru.wmnet
* 21:08 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs7001.magru.wmnet
* 21:04 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:04 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:02 brett@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cp7003.magru.wmnet cp7004.magru.wmnet on all recursors
* 21:02 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7003.magru.wmnet cp7004.magru.wmnet on all recursors
* 21:02 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:02 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:01 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:01 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
* 21:01 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns7002
* 21:01 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
* 21:01 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns7002
* 21:01 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7002
* 21:01 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7002
* 20:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7003.magru.wmnet with reason: host reimage
* 20:54 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7003.magru.wmnet with reason: host reimage
* 20:54 robh@cumin2002: START - Cookbook sre.dns.netbox
* 20:50 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7002.magru.wmnet
* 20:50 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:47 robh@cumin2002: START - Cookbook sre.dns.netbox
* 20:47 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dns7002.wikimedia.org
* 20:47 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:47 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns7002.wikimedia.org decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
* 20:47 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns7002.wikimedia.org decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
* 20:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7004.magru.wmnet with OS bullseye
* 20:43 robh@cumin2002: START - Cookbook sre.dns.netbox
* 20:39 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release {{Gerrit|20241126}}
* 20:37 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7002.magru.wmnet
* 20:37 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns7002.wikimedia.org
* 20:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7003.magru.wmnet with OS bullseye
* 20:34 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7004.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye
* 20:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1026.eqiad.wmnet with OS bullseye
* 20:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1025.eqiad.wmnet with OS bullseye
* 20:32 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:31 swfrench@deploy2002: Finished scap sync-world: Backport for [[gerrit:1076848{{!}}debug.json: add support for mwdebug-next (T372605)]] (duration: 14m 21s)
* 20:26 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7004.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:26 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:26 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
* 20:26 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
* 20:25 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:25 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7002
* 20:25 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7002
* 20:24 swfrench@deploy2002: swfrench: Continuing with sync
* 20:23 robh@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti7002
* 20:23 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7002
* 20:23 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:23 swfrench@deploy2002: swfrench: Backport for [[gerrit:1076848{{!}}debug.json: add support for mwdebug-next (T372605)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 20:22 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:21 robh@cumin2002: START - Cookbook sre.dns.netbox
* 20:21 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release {{Gerrit|20241126}}
* 20:17 swfrench@deploy2002: Started scap sync-world: Backport for [[gerrit:1076848{{!}}debug.json: add support for mwdebug-next (T372605)]]
* 20:16 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7004.magru.wmnet
* 20:16 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:16 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Update
* 20:14 robh@cumin2002: START - Cookbook sre.dns.netbox
* 20:13 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti7002.magru.wmnet
* 20:13 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:13 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7002.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
* 20:13 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7002.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
* 20:11 hashar@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098120{{!}}Avoid exception on mTemplateIds/mTemplate array discrepancy (T380862)]] (duration: 15m 23s)
* 20:09 robh@cumin2002: START - Cookbook sre.dns.netbox
* 20:08 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Update
* 20:07 aokoth@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Security Update
* 20:07 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Update
* 20:04 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7004.magru.wmnet
* 20:04 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7002.magru.wmnet
* 20:02 hashar@deploy2002: hashar: Continuing with sync
* 20:02 hashar@deploy2002: hashar: Backport for [[gerrit:1098120{{!}}Avoid exception on mTemplateIds/mTemplate array discrepancy (T380862)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 20:00 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7003.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:59 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:55 hashar@deploy2002: Started scap sync-world: Backport for [[gerrit:1098120{{!}}Avoid exception on mTemplateIds/mTemplate array discrepancy (T380862)]]
* 19:52 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7003.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:52 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:51 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:51 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
* 19:50 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
* 19:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7001
* 19:49 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7001
* 19:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7003
* 19:49 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7003
* 19:46 robh@cumin2002: START - Cookbook sre.dns.netbox
* 19:43 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7003.magru.wmnet
* 19:43 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:43 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
* 19:42 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
* 19:33 robh@cumin2002: START - Cookbook sre.dns.netbox
* 19:27 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7003.magru.wmnet
* 19:27 urbanecm: [urbanecm@mwmaint2002 ~]$ foreachwiki userOptions.php --delete-defaults growthexperiments-homepage-variant # [[phab:T379146|T379146]], logging to /home/urbanecm/T379146.log
* 19:26 urbanecm: mwscript-k8s -f userOptions.php -- --wiki=enwiki --old=oldimpact --delete 'growthexperiments-homepage-variant' # [[phab:T379146|T379146]]
* 19:23 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti7001.magru.wmnet
* 19:23 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:23 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
* 19:23 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
* 19:22 eileen: civicrm upgraded from {{Gerrit|eec961a3}} to {{Gerrit|59d340cd}}
* 19:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P71191 and previous config saved to /var/cache/conftool/dbconfig/20241126-192112-ladsgroup.json
* 19:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye
* 19:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye
* 19:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
* 19:11 robh@cumin2002: START - Cookbook sre.dns.netbox
* 19:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P71190 and previous config saved to /var/cache/conftool/dbconfig/20241126-190607-ladsgroup.json
* 18:55 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7001.magru.wmnet
* 18:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P71189 and previous config saved to /var/cache/conftool/dbconfig/20241126-185101-ladsgroup.json
* 18:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P71188 and previous config saved to /var/cache/conftool/dbconfig/20241126-183556-ladsgroup.json
* 18:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 repool', diff saved to https://phabricator.wikimedia.org/P71187 and previous config saved to /var/cache/conftool/dbconfig/20241126-183547-ladsgroup.json
* 18:34 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db2215 gradually with 4 steps - Maint over
* 18:33 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2215 gradually with 4 steps - Maint over
* 18:25 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D<nowiki>{</nowiki>wikikube-ctrl200[1-3].codfw.wmnet<nowiki>}</nowiki> and (A:wikikube-worker-codfw or A:wikikube-master-codfw)
* 18:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance
* 18:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance
* 17:58 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D<nowiki>{</nowiki>wikikube-ctrl200[1-3].codfw.wmnet<nowiki>}</nowiki> and (A:wikikube-worker-codfw or A:wikikube-master-codfw)
* 17:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1313-1327].eqiad.wmnet
* 17:47 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1313-1327].eqiad.wmnet
* 17:35 claime: homer 'cr*eqiad*' commit '[[phab:T380350|T380350]]'
* 17:35 claime: homer 'lsw1-e7-eqiad*' commit '[[phab:T380350|T380350]]'
* 17:34 claime: homer 'lsw1-f6-eqiad*' commit '[[phab:T380350|T380350]]'
* 17:34 claime: homer 'lsw1-f5-eqiad*' commit '[[phab:T380350|T380350]]'
* 17:33 claime: homer 'lsw1-e5-eqiad*' commit '[[phab:T380350|T380350]]'
* 17:32 claime: homer 'lsw1-e6-eqiad*' commit '[[phab:T380350|T380350]]'
* 17:31 claime: homer 'lsw1-f7-eqiad*' commit '[[phab:T380350|T380350]]'
* 17:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1321.eqiad.wmnet with OS bookworm
* 17:25 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D<nowiki>{</nowiki>wikikube-ctrl100[1-3].eqiad.wmnet<nowiki>}</nowiki> and (A:wikikube-worker-eqiad or A:wikikube-master-eqiad)
* 17:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1325.eqiad.wmnet with OS bookworm
* 17:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1326.eqiad.wmnet with OS bookworm
* 17:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1324.eqiad.wmnet with OS bookworm
* 17:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1327.eqiad.wmnet with OS bookworm
* 17:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1322.eqiad.wmnet with OS bookworm
* 17:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage
* 17:06 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage
* 17:05 ladsgroup@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s8
* 17:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage
* 17:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1323.eqiad.wmnet with OS bookworm
* 17:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage
* 16:59 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D<nowiki>{</nowiki>wikikube-ctrl100[1-3].eqiad.wmnet<nowiki>}</nowiki> and (A:wikikube-worker-eqiad or A:wikikube-master-eqiad)
* 16:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage
* 16:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage
* 16:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage
* 16:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage
* 16:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage
* 16:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage
* 16:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage
* 16:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage
* 16:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm
* 16:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage
* 16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:42 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1321.eqiad.wmnet with OS bookworm
* 16:42 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:41 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage
* 16:40 urbanecm: `mwscript-k8s -f userOptions.php -- --wiki=enwiki --old=control --delete 'growthexperiments-homepage-variant'` # [[phab:T379146|T379146]], [[phab:T377631|T377631]]
* 16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1327.eqiad.wmnet with OS bookworm
* 16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1326.eqiad.wmnet with OS bookworm
* 16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1325.eqiad.wmnet with OS bookworm
* 16:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1324.eqiad.wmnet with OS bookworm
* 16:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1322.eqiad.wmnet with OS bookworm
* 16:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm
* 16:28 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1321.eqiad.wmnet with OS bookworm
* 16:28 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1327.eqiad.wmnet with OS bookworm
* 16:27 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1326.eqiad.wmnet with OS bookworm
* 16:27 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1325.eqiad.wmnet with OS bookworm
* 16:27 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1324.eqiad.wmnet with OS bookworm
* 16:26 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1322.eqiad.wmnet with OS bookworm
* 16:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1323.eqiad.wmnet with OS bookworm
* 16:20 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1323.eqiad.wmnet with OS bookworm
* 15:52 moritzm: installing intel-microcode security updates
* 15:49 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1327.eqiad.wmnet with OS bookworm
* 15:49 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1326.eqiad.wmnet with OS bookworm
* 15:48 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1325.eqiad.wmnet with OS bookworm
* 15:48 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1324.eqiad.wmnet with OS bookworm
* 15:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1323.eqiad.wmnet with OS bookworm
* 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1322.eqiad.wmnet with OS bookworm
* 15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm
* 15:42 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns7001.wikimedia.org with OS bookworm
* 15:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2042.codfw.wmnet
* 15:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye
* 15:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1025.eqiad.wmnet with OS bullseye
* 15:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1026.eqiad.wmnet with OS bullseye
* 15:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2042.codfw.wmnet
* 15:34 moritzm: installing wireshark security updates
* 15:33 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2215 gradually with 4 steps - Maint over
* 15:33 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2215 gradually with 4 steps - Maint over
* 15:27 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
* 15:25 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
* 15:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
* 15:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
* 15:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
* 15:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
* 15:16 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
* 15:16 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
* 15:11 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7001.wikimedia.org with reason: host reimage
* away: UTC afternoon deploys done
* 15:08 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
* 15:08 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
* 15:07 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7001.wikimedia.org with reason: host reimage
* 15:05 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1095082{{!}}Allow simulating the SUL3 shared domain settings via env var (T380575)]] (duration: 26m 23s)
* 14:58 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance
* 14:58 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance
* 14:58 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance
* 14:58 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance
* 14:56 tgr@deploy2002: tgr: Continuing with sync
* 14:44 tgr@deploy2002: tgr: Backport for [[gerrit:1095082{{!}}Allow simulating the SUL3 shared domain settings via env var (T380575)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 14:43 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bookworm
* 14:43 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns7001.wikimedia.org with OS bullseye
* 14:43 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
* 14:40 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
* 14:39 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1095082{{!}}Allow simulating the SUL3 shared domain settings via env var (T380575)]]
* 14:31 mlitn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097983{{!}}Fix incorrect 'this']] (duration: 12m 36s)
* 14:25 mlitn@deploy2002: mlitn: Continuing with sync
* 14:25 mlitn@deploy2002: mlitn: Backport for [[gerrit:1097983{{!}}Fix incorrect 'this']] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 14:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye
* 14:19 mlitn@deploy2002: Started scap sync-world: Backport for [[gerrit:1097983{{!}}Fix incorrect 'this']]
* 14:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
* 14:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye
* 14:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
* 14:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
* 14:14 oblivian@cumin1002: END (PASS) - Cookbook 
sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add grid view - oblivian@cumin1002" * 14:14 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add grid view - oblivian@cumin1002 * 14:14 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add grid view - oblivian@cumin1002 * 14:13 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add grid view - oblivian@cumin1002" * 14:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply * 14:09 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7001.wikimedia.org with reason: host reimage * 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply * 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply * 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply * 14:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply * 14:05 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7001.wikimedia.org with reason: host reimage * 14:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply * 14:01 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7015.magru.wmnet with OS bullseye * 14:01 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002" * 14:01 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002" * 13:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Reclone ([[phab:T379724|T379724]]) * 13:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Reclone ([[phab:T379724|T379724]]) * 13:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Reclone ([[phab:T379724|T379724]]) * 13:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Reclone ([[phab:T379724|T379724]]) * 13:49 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs7003.magru.wmnet with OS bullseye * 13:49 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002" * 13:49 ladsgroup@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s8 * 13:46 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002" * 13:43 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bullseye * 13:40 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host dns7001.wikimedia.org * 13:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7004.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:38 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7004.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:35 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
cp7015.magru.wmnet with reason: host reimage * 13:34 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7010.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:34 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7010.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:34 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:34 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:32 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7015.magru.wmnet with reason: host reimage * 13:30 Emperor: swift delete wikipedia-commons-local-public.bf b/bf/Schuur_-_Nieuwerbrug_-_20164513_-_RCE.jpg [[phab:T380738|T380738]] * 13:29 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:28 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:28 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7002.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:28 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7002.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:28 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7002.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:28 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp7002.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:27 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7008.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:27 fabfur@cumin1002: START - Cookbook 
sre.hosts.downtime for 4:00:00 on cp7008.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:27 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7006.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:26 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp7006.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:26 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:26 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:21 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host dns7001.wikimedia.org * 13:20 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage * 13:18 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage * 13:15 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1004.eqiad.wmnet with OS bullseye * 13:11 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye * 13:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: repool', diff saved to https://phabricator.wikimedia.org/P71185 and previous config saved to /var/cache/conftool/dbconfig/20241126-131120-arnaudb.json * 13:07 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns7001.wikimedia.org with OS bookworm * 13:03 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bookworm * 12:58 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye * 12:57 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage * 12:56 
arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: repool', diff saved to https://phabricator.wikimedia.org/P71183 and previous config saved to /var/cache/conftool/dbconfig/20241126-125614-arnaudb.json * 12:53 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_esams and A:cp for 9.2.6-1wm2 * 12:53 dcaro@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage * 12:51 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_esams and A:cp for 9.2.6-1wm2 * 12:48 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7015.magru.wmnet with OS bullseye * 12:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: repool', diff saved to https://phabricator.wikimedia.org/P71182 and previous config saved to /var/cache/conftool/dbconfig/20241126-124109-arnaudb.json * 12:30 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye * 12:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: repool', diff saved to https://phabricator.wikimedia.org/P71181 and previous config saved to /var/cache/conftool/dbconfig/20241126-122622-arnaudb.json * 12:26 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.5 refs [[phab:T375664|T375664]] * 12:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: repool', diff saved to https://phabricator.wikimedia.org/P71180 and previous config saved to /var/cache/conftool/dbconfig/20241126-122603-arnaudb.json * 12:23 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:20 robh@cumin2002: START - Cookbook sre.dns.netbox * 12:20 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7003 * 
12:20 moritzm: failover Ganeti master in magru02 to ganeti7004 * 12:20 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7003 * 12:19 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7015 * 12:19 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7015 * 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to plain * 12:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to plain * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to plain * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to plain * 12:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: repool', diff saved to https://phabricator.wikimedia.org/P71179 and previous config saved to /var/cache/conftool/dbconfig/20241126-121117-arnaudb.json * 12:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 20%: repool', diff saved to https://phabricator.wikimedia.org/P71178 and previous config saved to /var/cache/conftool/dbconfig/20241126-121058-arnaudb.json * 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to plain * 12:10 ladsgroup@deploy2002: Finished scap sync-world: Backport for [[gerrit:1098006{{!}}Bump ratio of new parsercache key spec to 4 (T373037)]] (duration: 15m 21s) * 12:09 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to plain * 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to plain * 12:07 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk 
type of bast7001.wikimedia.org to plain * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to plain * 12:05 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to plain * 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to drbd * 12:02 ladsgroup@deploy2002: ladsgroup: Continuing with sync * 12:01 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:1098006{{!}}Bump ratio of new parsercache key spec to 4 (T373037)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 11:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: repool', diff saved to https://phabricator.wikimedia.org/P71177 and previous config saved to /var/cache/conftool/dbconfig/20241126-115612-arnaudb.json * 11:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 15%: repool', diff saved to https://phabricator.wikimedia.org/P71176 and previous config saved to /var/cache/conftool/dbconfig/20241126-115552-arnaudb.json * 11:55 ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1098006{{!}}Bump ratio of new parsercache key spec to 4 (T373037)]] * 11:54 hashar@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.5 refs [[phab:T375664|T375664]] (duration: 25m 52s) * 11:53 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_esams and A:cp for 9.2.6-1wm2 * 11:53 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_esams and A:cp for 9.2.6-1wm2 * 11:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: repool', diff saved to https://phabricator.wikimedia.org/P71175 and previous config saved to /var/cache/conftool/dbconfig/20241126-114106-arnaudb.json * 11:40 
arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: repool', diff saved to https://phabricator.wikimedia.org/P71174 and previous config saved to /var/cache/conftool/dbconfig/20241126-114047-arnaudb.json * 11:38 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_eqiad and A:cp for 9.2.6-1wm2 * 11:38 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_eqiad and A:cp for 9.2.6-1wm2 * 11:31 moritzm: remove ganeti7001 from active Ganeti nodes in magru01 * 11:28 dcaro@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye * 11:28 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.5 refs [[phab:T375664|T375664]] * 11:28 moritzm: failover Ganeti master in magru01 to ganeti7003 * 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to plain * 11:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 15%: repool', diff saved to https://phabricator.wikimedia.org/P71173 and previous config saved to /var/cache/conftool/dbconfig/20241126-112601-arnaudb.json * 11:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: repool', diff saved to https://phabricator.wikimedia.org/P71172 and previous config saved to /var/cache/conftool/dbconfig/20241126-112542-arnaudb.json * 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to plain * 11:25 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.5 refs [[phab:T375664|T375664]] * 11:25 dcaro@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye * 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing 
disk type of doh7001.wikimedia.org to plain * 11:23 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to plain * 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to plain * 11:20 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to plain * 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to plain * 11:12 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to plain * 11:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: repool', diff saved to https://phabricator.wikimedia.org/P71171 and previous config saved to /var/cache/conftool/dbconfig/20241126-111056-arnaudb.json * 11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on lvs7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 11:10 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on lvs7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 11:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 2%: repool', diff saved to https://phabricator.wikimedia.org/P71170 and previous config saved to /var/cache/conftool/dbconfig/20241126-111036-arnaudb.json * 11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on lvs7002.magru.wmnet with reason: [[phab:T376737|T376737]] * 11:10 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on lvs7002.magru.wmnet with reason: [[phab:T376737|T376737]] * 11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on lvs7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 11:10 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on lvs7001.magru.wmnet with reason: 
[[phab:T376737|T376737]] * 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to plain * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to plain * 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to drbd * 11:05 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:03 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to drbd * 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to drbd * 10:56 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to drbd * 10:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: repool', diff saved to https://phabricator.wikimedia.org/P71169 and previous config saved to /var/cache/conftool/dbconfig/20241126-105550-arnaudb.json * 10:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: repool', diff saved to https://phabricator.wikimedia.org/P71168 and previous config saved to /var/cache/conftool/dbconfig/20241126-105531-arnaudb.json * 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to drbd * 10:47 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.5 refs [[phab:T375664|T375664]] * 10:46 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7001.wikimedia.org to drbd * 10:42 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to drbd * 10:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling 
upgrade/restart of Apache Traffic Server on A:cp-upload_eqiad and A:cp for 9.2.6-1wm2 * 10:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_eqiad and A:cp for 9.2.6-1wm2 * 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to drbd * 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to drbd * 10:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1233.eqiad.wmnet onto db1246.eqiad.wmnet * 10:28 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to drbd * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to drbd * 10:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to drbd * 10:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to drbd * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to drbd * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to drbd * 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to drbd * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to drbd * 10:02 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to drbd * 09:57 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to drbd * 09:56 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new 
host ganeti7004.magru.wmnet to cluster magru02 and group B4 * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7004.magru.wmnet to cluster magru02 and group B4 * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7003.magru.wmnet to cluster magru01 and group B3 * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7003.magru.wmnet to cluster magru01 and group B3 * 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7004.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7003.magru.wmnet to cluster magru01 and group B3 * 09:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7003.magru.wmnet to cluster magru01 and group B3 * 09:23 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7004.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:21 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet * 09:21 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:21 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin2002" * 09:21 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin2002" * 
09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7003.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:11 jayme@cumin2002: START - Cookbook sre.dns.netbox * 09:11 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7003.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:03 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1246.eqiad.wmnet * 08:52 jayme@cumin2002: START - Cookbook sre.hosts.decommission for hosts kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet * 08:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7004.magru.wmnet to cluster magru02 and group B4 * 08:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7004.magru.wmnet to cluster magru02 and group B4 * 08:49 dcausse@deploy2002: Finished deploy [airflow-dags/search@f969d75]: search: swift_upload.py moved to refinery/bin/ (duration: 00m 27s) * 08:49 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1005-1006,1015-1016].eqiad.wmnet * 08:48 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1005-1006,1015-1016].eqiad.wmnet * 08:48 dcausse@deploy2002: Started deploy [airflow-dags/search@f969d75]: search: swift_upload.py moved to refinery/bin/ * 08:47 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[2005-2006,2015-2016].codfw.wmnet * 08:46 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[2005-2006,2015-2016].codfw.wmnet * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7004.magru.wmnet * 08:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti7003.magru.wmnet to cluster magru01 and group B3 * 08:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7003.magru.wmnet to cluster magru01 and group B3 * 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7004.magru.wmnet * 08:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet * 08:06 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7004 * 08:06 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7004 * 08:06 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7003 * 08:05 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7003 * 07:55 arnaudb@cumin1002: dbctl commit (dc=all): 'manual depool commit', diff saved to https://phabricator.wikimedia.org/P71164 and previous config saved to /var/cache/conftool/dbconfig/20241126-075433-arnaudb.json * 07:55 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) db1233 - clone on db1246 * 07:54 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db1233 - clone on db1246 * 07:36 joal@deploy2002: Finished deploy [analytics/refinery@f48b8de] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f48b8de2] (duration: 00m 29s) * 07:35 joal@deploy2002: Started deploy [analytics/refinery@f48b8de] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f48b8de2] * 07:35 joal@deploy2002: Finished deploy [analytics/refinery@f48b8de] (thin): Regular analytics weekly train THIN [analytics/refinery@f48b8de2] (duration: 00m 35s) * 07:34 joal@deploy2002: Started deploy [analytics/refinery@f48b8de] (thin): Regular analytics weekly train THIN [analytics/refinery@f48b8de2] * 07:34 joal@deploy2002: Finished 
deploy [analytics/refinery@f48b8de]: Regular analytics weekly train [analytics/refinery@f48b8de2] (duration: 02m 03s) * 07:33 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "UI bugfixes - oblivian@cumin1002" * 07:33 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: UI bugfixes - oblivian@cumin1002 * 07:33 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: UI bugfixes - oblivian@cumin1002 * 07:33 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "UI bugfixes - oblivian@cumin1002" * 07:32 joal@deploy2002: Started deploy [analytics/refinery@f48b8de]: Regular analytics weekly train [analytics/refinery@f48b8de2] * 03:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2215 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71163 and previous config saved to /var/cache/conftool/dbconfig/20241126-034040-ladsgroup.json * 03:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance * 03:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance * 03:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 20:00:00 on wdqs[2018-2020,2026-2027].codfw.wmnet with reason: [[phab:T376150|T376150]] non-prod hosts * 03:12 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 20:00:00 on wdqs[2018-2020,2026-2027].codfw.wmnet with reason: [[phab:T376150|T376150]] non-prod hosts * 03:11 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from 
wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling neither afterwards * 03:10 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling neither afterwards * 03:09 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling neither afterwards * 03:07 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling neither afterwards * 02:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2201.codfw.wmnet with reason: Maintenance * 02:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2201.codfw.wmnet with reason: Maintenance * 02:34 brett: Import libvmod-netmapper 1.9.1-1 into varnish-staging apt component * 02:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: [[phab:T376150|T376150]] * 02:30 bking@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: [[phab:T376150|T376150]] * 02:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2020.codfw.wmnet, repooling source-only afterwards * 02:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2020.codfw.wmnet, repooling source-only afterwards * 01:47 bking@cumin2002: END (PASS) - Cookbook 
sre.wdqs.data-transfer (exit_code=0) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2027.codfw.wmnet, repooling source-only afterwards * 01:29 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bookworm * 01:08 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling source-only afterwards * 01:06 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on lvs[7001-7003].magru.wmnet with reason: site is depooled, maintenance * 01:06 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on lvs[7001-7003].magru.wmnet with reason: site is depooled, maintenance * 01:04 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2027.codfw.wmnet, repooling source-only afterwards * 01:04 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7015.magru.wmnet with OS bullseye * 01:03 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling source-only afterwards * 01:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: [[phab:T376150|T376150]] * 01:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: [[phab:T376150|T376150]] * 00:55 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye * 00:28 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) 
([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
* 00:21 eileen: civicrm upgraded from {{Gerrit|190ea417}} to {{Gerrit|eec961a3}}
* 00:16 tzatziki: removing 6 files for legal compliance
* 00:00 tzatziki: removing 1 file for legal compliance
== 2024-11-25 ==
* 23:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
* 23:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
* 23:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71162 and previous config saved to /var/cache/conftool/dbconfig/20241125-235547-ladsgroup.json
* 23:54 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2018.codfw.wmnet, repooling source-only afterwards
* 23:53 tzatziki: removing 1 file for legal compliance
* 23:49 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2018.codfw.wmnet, repooling source-only afterwards
* 23:44 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
* 23:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
* 23:42 bking@cumin2002: START - Cookbook
sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P71161 and previous config saved to /var/cache/conftool/dbconfig/20241125-234040-ladsgroup.json * 23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P71160 and previous config saved to /var/cache/conftool/dbconfig/20241125-232533-ladsgroup.json * 23:23 tzatziki: removing 2 files for legal compliance * 23:16 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:16 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:16 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:16 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:14 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:14 
bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71159 and previous config saved to /var/cache/conftool/dbconfig/20241125-231026-ladsgroup.json * 23:10 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:10 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:09 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:09 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 23:09 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7015.magru.wmnet with OS bullseye * 23:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye * 23:01 bking@cumin1002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) * 23:01 bking@cumin1002: START - Cookbook sre.wdqs.data-transfer * 23:01 brett@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) cp7015.magru.wmnet 
lvs7003.magru.wmnet cp7015.mgmt.magru.wmnet lvs7003.mgmt.magru.wmnet on all recursors * 23:00 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7015.magru.wmnet lvs7003.magru.wmnet cp7015.mgmt.magru.wmnet lvs7003.mgmt.magru.wmnet on all recursors * 23:00 brett@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) cp7015.magru.wmnet lvs7003.magru.wmnet on all recursors * 23:00 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7015.magru.wmnet lvs7003.magru.wmnet on all recursors * 22:56 brett: Import varnish-modules 0.20.0-2~deb11u1 into varnish-staging apt component * 22:56 bking@cumin1002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:56 bking@cumin1002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:53 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:53 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2191 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71158 and previous config saved to /var/cache/conftool/dbconfig/20241125-224949-ladsgroup.json * 22:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2191.codfw.wmnet with reason: Maintenance * 22:49 ladsgroup@cumin1002: START - Cookbook 
sre.hosts.downtime for 6:00:00 on db2191.codfw.wmnet with reason: Maintenance * 22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71157 and previous config saved to /var/cache/conftool/dbconfig/20241125-224927-ladsgroup.json * 22:48 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:48 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:46 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7015.magru.wmnet with OS bullseye * 22:43 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:43 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:38 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:38 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:37 bking@cumin2002: END 
(FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:37 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:37 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * 22:37 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T376150|T376150]], initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards * away: UTC late deploys done * 22:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P71156 and previous config saved to /var/cache/conftool/dbconfig/20241125-223420-ladsgroup.json * 22:34 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097327{{!}}SUL3: Sort overrides (T373737)]], [[gerrit:1097328{{!}}More authentication domain overrides (T373737)]], [[gerrit:1097322{{!}}Update private/readme.php to match production]] (duration: 12m 49s) * 22:32 eileen: civicrm upgraded from {{Gerrit|b7bd670f}} to {{Gerrit|190ea417}} * 22:31 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs7003.magru.wmnet with OS bullseye * 22:27 tgr@deploy2002: tgr: Continuing with sync * 22:25 tgr@deploy2002: tgr: Backport for [[gerrit:1097327{{!}}SUL3: Sort overrides (T373737)]], [[gerrit:1097328{{!}}More authentication domain overrides (T373737)]], [[gerrit:1097322{{!}}Update private/readme.php to match 
production]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:21 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1097327{{!}}SUL3: Sort overrides (T373737)]], [[gerrit:1097328{{!}}More authentication domain overrides (T373737)]], [[gerrit:1097322{{!}}Update private/readme.php to match production]] * 22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P71155 and previous config saved to /var/cache/conftool/dbconfig/20241125-221913-ladsgroup.json * 22:19 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097518{{!}}Reader Survey: Increase coverage (T378660)]] (duration: 14m 08s) * 22:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage * 22:12 tgr@deploy2002: tgr, dani: Continuing with sync * 22:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage * 22:09 tgr@deploy2002: tgr, dani: Backport for [[gerrit:1097518{{!}}Reader Survey: Increase coverage (T378660)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:09 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye * 22:08 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7015.magru.wmnet with OS bullseye * 22:04 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1097518{{!}}Reader Survey: Increase coverage (T378660)]] * 22:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71154 and previous config saved to /var/cache/conftool/dbconfig/20241125-220406-ladsgroup.json * 22:02 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097457{{!}}LoginCompleteHookHandler: onTempUserCreatedRedirect() should use 
getPrimaryInstance() (T380042)]] (duration: 12m 41s) * 21:56 tgr@deploy2002: tgr, matmarex: Continuing with sync * 21:54 tgr@deploy2002: tgr, matmarex: Backport for [[gerrit:1097457{{!}}LoginCompleteHookHandler: onTempUserCreatedRedirect() should use getPrimaryInstance() (T380042)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:50 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1097457{{!}}LoginCompleteHookHandler: onTempUserCreatedRedirect() should use getPrimaryInstance() (T380042)]] * 21:49 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1094054{{!}}Reader Survey: Increase coverage on enwiki (T378660)]] (duration: 16m 06s) * 21:46 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye * 21:45 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7015.magru.wmnet with OS bullseye * 21:42 tgr@deploy2002: tgr, dani: Continuing with sync * 21:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2131 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71153 and previous config saved to /var/cache/conftool/dbconfig/20241125-213904-ladsgroup.json * 21:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance * 21:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance * 21:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71152 and previous config saved to /var/cache/conftool/dbconfig/20241125-213841-ladsgroup.json * 21:37 tgr@deploy2002: tgr, dani: Backport for [[gerrit:1094054{{!}}Reader Survey: Increase coverage on enwiki (T378660)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:33 tgr@deploy2002: Started scap sync-world: 
Backport for [[gerrit:1094054{{!}}Reader Survey: Increase coverage on enwiki (T378660)]] * 21:31 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye * 21:30 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs7003.magru.wmnet with OS bullseye * 21:29 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097417{{!}}Reader Survey: Fix yes/no messages (T378660)]] (duration: 16m 02s) * 21:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P71151 and previous config saved to /var/cache/conftool/dbconfig/20241125-212334-ladsgroup.json * 21:22 tgr@deploy2002: dani, tgr: Continuing with sync * 21:18 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testing - sukhe@cumin1002" * 21:18 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testing - sukhe@cumin1002" * 21:17 tgr@deploy2002: dani, tgr: Backport for [[gerrit:1097417{{!}}Reader Survey: Fix yes/no messages (T378660)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:13 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1097417{{!}}Reader Survey: Fix yes/no messages (T378660)]] * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P71150 and previous config saved to /var/cache/conftool/dbconfig/20241125-210827-ladsgroup.json * 21:04 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Host reimage - brett@cumin2002 - brett@cumin2002" * 21:04 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Host reimage - brett@cumin2002 - brett@cumin2002" * 21:03 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for 
hosts restbase[2021-2023].codfw.wmnet * 21:03 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[2021-2023].codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 21:03 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[2021-2023].codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:59 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7008.magru.wmnet with OS bullseye * 20:57 brett@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002" * 20:56 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002" * 20:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71149 and previous config saved to /var/cache/conftool/dbconfig/20241125-205320-ladsgroup.json * 20:51 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts restbase[2021-2023].codfw.wmnet * 20:51 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:51 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 20:50 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 20:45 
robh@cumin2002: START - Cookbook sre.dns.netbox * 20:45 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns7001 * 20:45 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns7001 * 20:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye * 20:43 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs7003.magru.wmnet with OS bullseye * 20:40 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002" * 20:34 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7004.magru.wmnet with reason: host reimage * 20:31 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7004.magru.wmnet with reason: host reimage * 20:26 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye * 20:24 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye * 20:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7008.magru.wmnet with reason: host reimage * 20:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7008.magru.wmnet with reason: host reimage * 20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2115 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71147 and previous config saved to /var/cache/conftool/dbconfig/20241125-200031-ladsgroup.json * 20:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance * 20:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance * 20:00 robh@cumin2002: START - Cookbook sre.hosts.reimage for host 
ganeti7004.magru.wmnet with OS bookworm * 19:58 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7003.magru.wmnet with OS bookworm * 19:58 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002" * 19:56 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002" * 19:50 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=codfw,name=restbase2038.codfw.wmnet * 19:50 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=codfw,name=restbase2037.codfw.wmnet * 19:50 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=codfw,name=restbase2036.codfw.wmnet * 19:43 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye * 19:43 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7008.magru.wmnet with OS bullseye * 19:37 ejegg: fundraising civicrm upgraded from {{Gerrit|3311520a}} to {{Gerrit|b7bd670f}} * 19:36 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1095126{{!}}[Growth] enwiki: Deploy Add Link to 2% of new users (T377631)]] (duration: 11m 59s) * 19:35 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7003.magru.wmnet with reason: host reimage * 19:31 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7003.magru.wmnet with reason: host reimage * 19:29 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:28 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1095126{{!}}[Growth] enwiki: Deploy Add Link to 2% of new users (T377631)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:24 urbanecm@deploy2002: Started scap sync-world: Backport 
for [[gerrit:1095126{{!}}[Growth] enwiki: Deploy Add Link to 2% of new users (T377631)]] * 19:18 swfrench@deploy2002: Finished scap sync-world: Deployment to pick up new php 8.1 base images (duration: 09m 37s) * 19:14 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:14 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 19:14 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 19:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 19:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 19:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71144 and previous config saved to /var/cache/conftool/dbconfig/20241125-191124-ladsgroup.json * 19:10 robh@cumin2002: START - Cookbook sre.dns.netbox * 19:10 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7003 * 19:10 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7003 * 19:08 swfrench@deploy2002: Started scap sync-world: Deployment to pick up new php 8.1 base images * 19:06 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye * 19:06 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7008.magru.wmnet with OS bullseye * 19:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7006.magru.wmnet with OS bullseye * 19:02 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002" * 18:59 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7003.magru.wmnet with OS bookworm * 18:59 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002" * 18:59 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:59 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 18:59 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 18:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P71143 and previous config saved to /var/cache/conftool/dbconfig/20241125-185617-ladsgroup.json * 18:53 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7003 * 18:53 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7003 * 18:53 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye * 18:52 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7008.magru.wmnet with OS bullseye * 18:49 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D<nowiki>{</nowiki>wikikube-worker[2128-2170].codfw.wmnet<nowiki>}</nowiki> and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-se * 18:48 
krinkle@deploy2002: Finished deploy [statsv/statsv@6678d4b]: {{Gerrit|I7a8d831817}}: remove unused statsvr.py (duration: 00m 09s) * 18:48 krinkle@deploy2002: Started deploy [statsv/statsv@6678d4b]: {{Gerrit|I7a8d831817}}: remove unused statsvr.py * 18:45 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7015 * 18:45 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7015 * 18:45 robh@cumin2002: START - Cookbook sre.dns.netbox * 18:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P71142 and previous config saved to /var/cache/conftool/dbconfig/20241125-184110-ladsgroup.json * 18:34 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye * 18:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7006.magru.wmnet with reason: host reimage * 18:31 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:29 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7006.magru.wmnet with reason: host reimage * 18:28 robh@cumin2002: START - Cookbook sre.dns.netbox * 18:28 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7015.magru.wmnet * 18:28 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:27 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7015.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 18:27 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7015.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 18:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance 
db1237 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71141 and previous config saved to /var/cache/conftool/dbconfig/20241125-182603-ladsgroup.json * 18:24 robh@cumin2002: START - Cookbook sre.dns.netbox * 18:18 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7015.magru.wmnet * 18:17 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts lvs7003.magru.wmnet * 18:17 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:17 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 18:16 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 18:13 robh@cumin2002: START - Cookbook sre.dns.netbox * 18:08 swfrench-wmf: rebuilt php8.1 production images to pick up 8.1.31 * 18:08 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097310{{!}}Migrate to virtual domains (T354939)]], [[gerrit:1097369{{!}}createExtensionTables: Use virtual domains for GrowthExperiments (T354939)]] (duration: 13m 18s) * 18:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7006.magru.wmnet with OS bullseye * 18:03 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7006.magru.wmnet with OS bullseye * 18:03 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs7003.magru.wmnet * 18:02 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:02 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 18:02 robh@cumin2002: 
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 18:01 urbanecm@deploy2002: urbanecm: Continuing with sync * 17:59 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1097310{{!}}Migrate to virtual domains (T354939)]], [[gerrit:1097369{{!}}createExtensionTables: Use virtual domains for GrowthExperiments (T354939)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 17:58 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7004 * 17:58 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7004 * 17:57 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7008 * 17:57 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7008 * 17:56 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@9927a5a] (wcqs): Deploy 0.3.150 to WCQS (duration: 02m 53s) * 17:55 robh@cumin2002: START - Cookbook sre.dns.netbox * 17:54 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1097310{{!}}Migrate to virtual domains (T354939)]], [[gerrit:1097369{{!}}createExtensionTables: Use virtual domains for GrowthExperiments (T354939)]] * 17:53 ryankemper@deploy2002: Started deploy [wdqs/wdqs@9927a5a] (wcqs): Deploy 0.3.150 to WCQS * 17:49 ryankemper: [[phab:T378260|T378260]] `snapshot1016.eqiad.wmnet` => manually deleted `cirrussearch-dump-s11.[timer,service]` * 17:49 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7001.magru.wmnet with OS bullseye * 17:49 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002" * 17:46 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002" * 17:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7006.magru.wmnet with OS bullseye * 17:41 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp7008.magru.wmnet * 17:41 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:39 robh@cumin2002: START - Cookbook sre.dns.netbox * 17:39 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti7004.magru.wmnet * 17:39 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:39 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7004.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 17:39 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7004.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 17:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1237 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71140 and previous config saved to /var/cache/conftool/dbconfig/20241125-173511-ladsgroup.json * 17:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1237.eqiad.wmnet with reason: Maintenance * 17:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1237.eqiad.wmnet with reason: Maintenance * 17:34 robh@cumin2002: START - Cookbook sre.dns.netbox * 17:29 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7008.magru.wmnet * 17:29 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7004.magru.wmnet * 17:23 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:23 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 17:22 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002" * 17:19 robh@cumin2002: START - Cookbook sre.dns.netbox * 17:17 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7001.magru.wmnet with reason: host reimage * 17:14 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7001.magru.wmnet with reason: host reimage * 17:10 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7006.magru.wmnet * 17:10 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:10 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7006.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 17:10 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7006.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 17:06 robh@cumin2002: START - Cookbook sre.dns.netbox * 16:59 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7006.magru.wmnet * 16:59 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti7003.magru.wmnet * 16:59 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:58 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 16:58 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: ganeti7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002" * 16:55 robh@cumin2002: START - Cookbook sre.dns.netbox * 16:49 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7003.magru.wmnet * 16:47 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7001.magru.wmnet with OS bullseye * 16:45 hashar@deploy2002: Pruned MediaWiki: 1.44.0-wmf.2 (duration: 03m 05s) * 16:44 dcaro@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye * 16:42 hashar@deploy2002: Installation of scap version "4.129.0" completed for 211 hosts * 16:42 swfrench-wmf: uploaded php8.1 8.1.31-1+wmf11u1 to apt.w.o (16:25 UTC) * 16:38 hashar@deploy2002: Installing scap version "4.129.0" for 211 hosts * 16:27 hashar@deploy2002: Installation of scap version "4.128.0" completed for 211 hosts * 16:27 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration * 16:23 hashar@deploy2002: Installing scap version "4.128.0" for 211 hosts * 16:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 16:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 16:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71138 and previous config saved to /var/cache/conftool/dbconfig/20241125-161915-ladsgroup.json * 16:05 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:05 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D<nowiki>{</nowiki>wikikube-worker[1305-1312].eqiad.wmnet<nowiki>}</nowiki> 
and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-se * 16:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 16:04 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_drmrs and A:cp for 9.2.6-1wm2 * 16:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P71134 and previous config saved to /var/cache/conftool/dbconfig/20241125-160408-ladsgroup.json * 16:02 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_drmrs and A:cp for 9.2.6-1wm2 * 15:58 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:52 Lucas_WMDE: UTC afternoon backport+config window done (apologies for the temporary flood of “Use of QuickSurveys survey” deprecation warnings – should be fixed again) * 15:52 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:49 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097410{{!}}Reader Survey: Fix question (T378660)]] (duration: 13m 02s) * 15:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P71133 and previous config saved to /var/cache/conftool/dbconfig/20241125-154901-ladsgroup.json * 15:48 
robh@cumin2002: START - Cookbook sre.hosts.provision for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:47 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:47 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru swaps - robh@cumin2002" * 15:46 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru swaps - robh@cumin2002" * 15:46 claime: homer cr*eqiad* commit '[[phab:T380027|T380027]]' * 15:42 robh@cumin2002: START - Cookbook sre.dns.netbox * 15:41 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dani: Continuing with sync * 15:41 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kubernetes[1009-1014].eqiad.wmnet * 15:41 cgoubert@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 15:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 15:40 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dani: Backport for [[gerrit:1097410{{!}}Reader Survey: Fix question (T378660)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:39 cgoubert@cumin1002: START - Cookbook sre.dns.netbox * 15:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bookworm * 15:37 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup2011.codfw.wmnet with reason: Reboot * 15:37 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup2011.codfw.wmnet with reason: Reboot * 15:37 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 
1:00:00 on backup2010.codfw.wmnet with reason: Reboot * 15:37 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup2010.codfw.wmnet with reason: Reboot * 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1097410{{!}}Reader Survey: Fix question (T378660)]] * 15:36 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71132 and previous config saved to /var/cache/conftool/dbconfig/20241125-153354-ladsgroup.json * 15:31 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:23 lucaswerkmeister-wmde@deploy2002: dani, lucaswerkmeister-wmde: Continuing with sync * 15:21 lucaswerkmeister-wmde@deploy2002: dani, lucaswerkmeister-wmde: Backport for [[gerrit:1093987{{!}}Reader Survey: Deploy on enwiki (T378660)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:19 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubernetes[1009-1014].eqiad.wmnet * 15:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 15:18 robh@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:17 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1093987{{!}}Reader Survey: Deploy on enwiki (T378660)]] * 15:15 robh@cumin1002: START - Cookbook sre.dns.netbox * 15:15 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 15:15 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: 
Backport for [[gerrit:1094511{{!}}New stream config for Android Rabbit Holes feature. (T380107)]] (duration: 15m 45s) * 15:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1224 ([[phab:T380449|T380449]])', diff saved to https://phabricator.wikimedia.org/P71131 and previous config saved to /var/cache/conftool/dbconfig/20241125-151103-ladsgroup.json * 15:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1224.eqiad.wmnet with reason: Maintenance * 15:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1224.eqiad.wmnet with reason: Maintenance * 15:08 lucaswerkmeister-wmde@deploy2002: dbrant, lucaswerkmeister-wmde: Continuing with sync * 15:03 lucaswerkmeister-wmde@deploy2002: dbrant, lucaswerkmeister-wmde: Backport for [[gerrit:1094511{{!}}New stream config for Android Rabbit Holes feature. (T380107)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_drmrs and A:cp for 9.2.6-1wm2 * 15:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_drmrs and A:cp for 9.2.6-1wm2 * 14:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1009-1014].eqiad.wmnet * 14:59 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1094511{{!}}New stream config for Android Rabbit Holes feature. 
(T380107)]] * 14:57 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1097381{{!}}Pass context to 'revreview-pending-basic' on history page (T380519)]], [[gerrit:1097382{{!}}Use Contexts for Message objects in review dialog (tooltip) (T380519)]] (duration: 15m 35s) * 14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove vlan1107 IPv6 entries - cmooney@cumin1002" * 14:56 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1009-1014].eqiad.wmnet * 14:54 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove vlan1107 IPv6 entries - cmooney@cumin1002" * 14:54 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bookworm * 14:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1309.eqiad.wmnet * 14:52 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1309.eqiad.wmnet * 14:52 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 14:50 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, matmarex: Continuing with sync * 14:49 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_codfw and A:cp for 9.2.6-1wm2 * 14:48 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, matmarex: Backport for [[gerrit:1097381{{!}}Pass context to 'revreview-pending-basic' on history page (T380519)]], [[gerrit:1097382{{!}}Use Contexts for Message objects in review dialog (tooltip) (T380519)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:47 cgoubert@cumin1002: END (PASS) - 
Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1310-1312].eqiad.wmnet * 14:47 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1310-1312].eqiad.wmnet * 14:47 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_codfw and A:cp for 9.2.6-1wm2 * 14:44 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 14:44 cmooney@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverse IPv6 includes to dns repo for vlan1107 - cmooney@cumin1002" * 14:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverse IPv6 includes to dns repo for vlan1107 - cmooney@cumin1002" * 14:41 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1097381{{!}}Pass context to 'revreview-pending-basic' on history page (T380519)]], [[gerrit:1097382{{!}}Use Contexts for Message objects in review dialog (tooltip) (T380519)]] * 14:39 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 14:26 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add tooltips - oblivian@cumin1002" * 14:26 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add tooltips - oblivian@cumin1002 * 14:26 moritzm: prune unneeded kernels from grafana2001 * 14:26 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add tooltips - oblivian@cumin1002 * 14:26 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add tooltips - oblivian@cumin1002" * 
14:20 claime: Manually deleting wikikube-worker13[13-20].eqiad.wmnet for ip exhaustion [[phab:T375845|T375845]] * 14:19 claime: disable puppet and kubelet on wikikube-worker13[13-28].eqiad.wmnet for ip exhaustion [[phab:T375845|T375845]] * 14:12 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2045.codfw.wmnet * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2046.codfw.wmnet * 14:02 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 aborrero@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002" * 14:01 aborrero@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002" * 13:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2046.codfw.wmnet * 13:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2045.codfw.wmnet * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2044.codfw.wmnet * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2043.codfw.wmnet * 13:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_codfw and A:cp for 9.2.6-1wm2 * 13:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_codfw and A:cp for 9.2.6-1wm2 * 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2044.codfw.wmnet * 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2043.codfw.wmnet * 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) 
for host es2042.codfw.wmnet * 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2041.codfw.wmnet * 13:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 13:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 13:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1318.eqiad.wmnet with OS bookworm * 13:46 jayme: deployed sessionstore to non-dedicated nodes - [[phab:T379599|T379599]] * 13:44 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 13:44 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 13:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1320.eqiad.wmnet with OS bookworm * 13:43 jayme: cordoned kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet - [[phab:T379599|T379599]] * 13:42 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 13:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2042.codfw.wmnet * 13:42 aborrero@cumin1002: START - Cookbook sre.dns.netbox * 13:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2041.codfw.wmnet * 13:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1314.eqiad.wmnet with OS bookworm * 13:40 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd and (A:cephosd) * 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host db1246.eqiad.wmnet * 13:38 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 13:38 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 
13:38 jayme@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply * 13:37 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@006515b]: Testing the new k8s deployment (duration: 02m 34s) * 13:37 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@006515b]: Testing the new k8s deployment * 13:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1319.eqiad.wmnet with OS bookworm * 13:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1315.eqiad.wmnet with OS bookworm * 13:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host db1246.eqiad.wmnet * 13:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 13:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1317.eqiad.wmnet with OS bookworm * 13:28 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D<nowiki>{</nowiki>wikikube-worker[1305-1312].eqiad.wmnet<nowiki>}</nowiki> and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-master-eqiad or * 13:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage * 13:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1316.eqiad.wmnet with OS bookworm * 13:27 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D<nowiki>{</nowiki>wikikube-worker[2128-2170].codfw.wmnet<nowiki>}</nowiki> and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or 
A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-master-eqiad or * 13:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage * 13:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1313.eqiad.wmnet with OS bookworm * 13:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage * 13:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1179 gradually with 4 steps - Maint over * 13:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: [[phab:T373579|T373579]], host is WIP * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: [[phab:T373579|T373579]], host is WIP * 13:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage * 13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1317.eqiad.wmnet with reason: host reimage * 13:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1316.eqiad.wmnet with 
reason: host reimage * 13:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:06 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage * 13:05 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7015.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:05 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7015.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:05 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7008.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:05 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7008.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:05 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage * 13:05 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7006.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage * 13:04 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7006.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:04 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:04 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:04 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:04 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 
0:00:00 on lvs7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:04 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage * 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd2005.codfw.wmnet * 13:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1317.eqiad.wmnet with reason: host reimage * 13:03 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti7004.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:03 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti7004.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:03 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1316.eqiad.wmnet with reason: host reimage * 13:03 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti7003.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:02 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dns7001.wikimedia.org with reason: [[phab:T376737|T376737]] * 13:02 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 13:02 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dns7001.wikimedia.org with reason: [[phab:T376737|T376737]] * 13:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage * 13:02 fabfur@cumin1002: END 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:02 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: [[phab:T376737|T376737]] * 13:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply * 13:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage * 13:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd2004.codfw.wmnet * 13:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply * 13:01 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 13:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply * 13:00 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage * 13:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply * 13:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd2005.codfw.wmnet * 12:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply * 12:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd2003.codfw.wmnet * 12:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply * 12:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply * 12:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply * 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host 
aux-k8s-ctrl2003.codfw.wmnet * 12:58 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D<nowiki>{</nowiki>kubestage100[5-6].eqiad.wmnet<nowiki>}</nowiki> and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-maste * 12:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd2004.codfw.wmnet * 12:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:56 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd2003.codfw.wmnet * 12:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl2003.codfw.wmnet * 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-ctrl2002.codfw.wmnet * 12:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2005.codfw.wmnet * 12:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:47 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup1011.eqiad.wmnet with reason: Reboot * 12:47 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup1011.eqiad.wmnet with reason: Reboot * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl2002.codfw.wmnet * 12:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2005.codfw.wmnet * 12:44 cgoubert@cumin1002: 
START - Cookbook sre.hosts.reimage for host wikikube-worker1320.eqiad.wmnet with OS bookworm * 12:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1319.eqiad.wmnet with OS bookworm * 12:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1318.eqiad.wmnet with OS bookworm * 12:43 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D<nowiki>{</nowiki>kubestage100[5-6].eqiad.wmnet<nowiki>}</nowiki> and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-master-eqiad or A:ml-ser * 12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1317.eqiad.wmnet with OS bookworm * 12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1316.eqiad.wmnet with OS bookworm * 12:42 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup1010.eqiad.wmnet with reason: Reboot * 12:41 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup1010.eqiad.wmnet with reason: Reboot * 12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1315.eqiad.wmnet with OS bookworm * 12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1314.eqiad.wmnet with OS bookworm * 12:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1313.eqiad.wmnet with OS bookworm * 12:33 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-eqsin and not (P<nowiki>{</nowiki>cp5018.*<nowiki>}</nowiki> or P<nowiki>{</nowiki>cp5026.*<nowiki>}</nowiki>) and A:cp for 9.2.6-1wm2 * 12:32 ladsgroup@cumin1002: START - Cookbook 
sre.mysql.pool db1179 gradually with 4 steps - Maint over * 12:28 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd and (A:cephosd) * 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2004.codfw.wmnet * 12:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2003.codfw.wmnet * 12:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2004.codfw.wmnet * 12:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2003.codfw.wmnet * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2002.codfw.wmnet * 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2002.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2002.codfw.wmnet * 12:06 hashar@deploy2002: Pruned MediaWiki: 1.39.0-wmf.1 (duration: 00m 40s) * 12:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2002.codfw.wmnet * 12:03 hashar@deploy2002: Pruned MediaWiki: 1.39.0-wmf.1 (duration: 00m 37s) * 11:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1256.eqiad.wmnet * 11:56 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1256.eqiad.wmnet * 11:51 hashar@deploy2002: Installation of scap version "4.128.0" completed for 211 hosts * 11:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-worker1290.eqiad.wmnet * 11:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker1290.eqiad.wmnet * 11:47 hashar@deploy2002: Installing scap version "4.128.0" for 211 hosts * 11:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1179 ([[phab:T380449|T380449]])', 
diff saved to https://phabricator.wikimedia.org/P71125 and previous config saved to /var/cache/conftool/dbconfig/20241125-114651-ladsgroup.json * 11:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance * 11:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance * 11:41 claime: homer 'cr*eqiad*' commit '[[phab:T379454|T379454]]' * 11:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1256.eqiad.wmnet with OS bookworm * 11:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002" * 11:39 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002" * 11:34 hashar@deploy2002: Installing scap version "4.128.0" for 211 hosts * 11:24 moritzm: installing Linux 6.1.119 on Bookworm nodes * 11:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage * 11:18 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage * 11:03 fabfur@cumin1002: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=magru * 11:02 fabfur: depooling dnsboxes @ magru for hardware swap ([[phab:T376737|T376737]]) * 11:02 fabfur@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site magru [reason: depool magru for hw swap, [[phab:T376737|T376737]]] * 11:01 fabfur@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site magru [reason: depool magru for hw swap, [[phab:T376737|T376737]]] * 11:01 fabfur: depooling magru for hardware swap ([[phab:T376737|T376737]]) * 
10:40 hashar@deploy2002: Finished deploy [integration/docroot@d585f2b]: build: Updating cross-spawn to 7.0.6 (duration: 00m 10s) * 10:40 hashar@deploy2002: Started deploy [integration/docroot@d585f2b]: build: Updating cross-spawn to 7.0.6 * 10:38 _joe_: deleted pyall component from reprepro * 10:35 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-eqsin and not (P<nowiki>{</nowiki>cp5018.*<nowiki>}</nowiki> or P<nowiki>{</nowiki>cp5026.*<nowiki>}</nowiki>) and A:cp for 9.2.6-1wm2 * 10:25 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not (P<nowiki>{</nowiki>cp4043.*<nowiki>}</nowiki> or P<nowiki>{</nowiki>cp4051.*<nowiki>}</nowiki>) and A:cp for 9.2.6-1wm2 * 10:17 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1005.eqiad.wmnet * 10:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 10:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 10:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 10:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 10:07 jynus: extending backup1009 free filesystem * 10:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1005.eqiad.wmnet * 09:58 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2005.codfw.wmnet * 09:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2005.codfw.wmnet * 09:45 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2005.codfw.wmnet * 09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/admin 'apply'. * 09:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 09:39 moritzm: remove ganeti7003 from active Ganeti nodes in magru01 [[phab:T376737|T376737]] * 09:34 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2005.codfw.wmnet * 09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:25 ladsgroup@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093956{{!}}Bump ratio of new parsercache key spec to 6 (T373037)]] (duration: 11m 05s) * 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to plain * 09:18 ladsgroup@deploy2002: ladsgroup: Continuing with sync * 09:18 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:1093956{{!}}Bump ratio of new parsercache key spec to 6 (T373037)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to plain * 09:13 ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1093956{{!}}Bump ratio of new parsercache key spec to 6 (T373037)]] * 09:13 dcausse: restarting blazegraph on wdqs1012 (BlazegraphFreeAllocatorsDecreasingRapidly) * 09:04 kostajh: UTC morning deploys done * 09:01 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1053230{{!}}IPReputation: Enable everywhere (T360067)]] (duration: 15m 48s) * 08:53 kharlan@deploy2002: kharlan: Continuing with sync * 08:50 kharlan@deploy2002: kharlan: Backport for [[gerrit:1053230{{!}}IPReputation: Enable everywhere (T360067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) 
for changing disk type of durum7001.magru.wmnet to plain * 08:47 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to plain * 08:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2179.codfw.wmnet with reason: Maintenance * 08:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2179.codfw.wmnet with reason: Maintenance * 08:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance * 08:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance * 08:46 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1053230{{!}}IPReputation: Enable everywhere (T360067)]] * 08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71123 and previous config saved to /var/cache/conftool/dbconfig/20241125-084531-arnaudb.json * 08:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to plain * 08:39 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to plain * 08:39 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1094071{{!}}Disable more extensions when using the shared login domain (T373737)]] (duration: 30m 35s) * 08:37 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not (P<nowiki>{</nowiki>cp4043.*<nowiki>}</nowiki> or P<nowiki>{</nowiki>cp4051.*<nowiki>}</nowiki>) and A:cp for 9.2.6-1wm2 * 08:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P71122 and previous config saved to 
/var/cache/conftool/dbconfig/20241125-083024-arnaudb.json * 08:30 tgr@deploy2002: tgr: Continuing with sync * 08:25 tgr@deploy2002: tgr: Backport for [[gerrit:1094071{{!}}Disable more extensions when using the shared login domain (T373737)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:17 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply * 08:17 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply * 08:17 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply * 08:16 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply * 08:16 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply * 08:15 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply * 08:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P71121 and previous config saved to /var/cache/conftool/dbconfig/20241125-081517-arnaudb.json * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to plain * 08:10 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to plain * 08:08 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1094071{{!}}Disable more extensions when using the shared login domain (T373737)]] * 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to plain * 08:00 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to plain * 08:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71120 and previous config saved to 
/var/cache/conftool/dbconfig/20241125-080010-arnaudb.json
* 07:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71119 and previous config saved to /var/cache/conftool/dbconfig/20241125-075758-arnaudb.json
* 07:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2240.codfw.wmnet with reason: Maintenance
* 07:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2240.codfw.wmnet with reason: Maintenance
* 07:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled
* 07:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled
* 07:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled
* 07:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled
* 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet
* 07:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet
* 07:47 moritzm: remove ganeti7004 from active Ganeti nodes in magru02 [[phab:T376737|T376737]]
* 07:15 _joe_: upgrading vopsbot to 0.3.9
== 2024-11-23 ==
* 12:08 btullis@cumin1002: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop test cluster: Restart of jvm daemons.
* 12:05 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
* 02:15 urandom: decommissioning Cassandra/restbase2023-<nowiki>{</nowiki>a,b,c<nowiki>}</nowiki> - [[phab:T380236|T380236]]
== 2024-11-22 ==
* 21:51 bking@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=wdqs-internal-scholarly,name=eqiad
* 21:37 bking@cumin2002: conftool action : set/pooled=yes; selector: name=wdqs2026.codfw.wmnet
* 21:37 bking@cumin2002: conftool action : set/pooled=yes; selector: name=wdqs2018.codfw.wmnet
* 21:33 bking@cumin2002: conftool action : set/weight=1; selector: name=wdqs2026.codfw.wmnet
* 21:33 bking@cumin2002: conftool action : set/weight=1; selector: name=wdqs2018.codfw.wmnet
* 21:25 bking@cumin2002: conftool action : set/pooled=yes:weight=1; selector: cluster=wdqs-scholarly,service=wdqs-internal-scholarly
* 21:25 bking@cumin2002: conftool action : set/pooled=yes:weight=1; selector: cluster=wdqs-main,service=wdqs-internal-main
* 20:59 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2005.codfw.wmnet
* 20:59 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2005.codfw.wmnet with OS bookworm
* 20:41 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2005.codfw.wmnet with reason: host reimage
* 20:37 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2005.codfw.wmnet with reason: host reimage
* 20:20 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2005.codfw.wmnet with OS bookworm
* 20:17 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002"
* 20:17 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002"
* 20:17 
herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2005.codfw.wmnet on all recursors * 20:17 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2005.codfw.wmnet on all recursors * 20:17 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:17 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002" * 20:17 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002" * 20:07 herron@cumin1002: START - Cookbook sre.dns.netbox * 20:07 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker2005.codfw.wmnet * 19:47 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2004.codfw.wmnet * 19:47 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2004.codfw.wmnet with OS bookworm * 19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2045.codfw.wmnet with OS bookworm * 19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2046.codfw.wmnet with OS bookworm * 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 19:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2043.codfw.wmnet with OS bookworm * 19:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:31 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2004.codfw.wmnet with reason: host reimage * 19:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:27 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2004.codfw.wmnet with reason: host reimage * 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2044.codfw.wmnet with OS bookworm * 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2045.codfw.wmnet with reason: host reimage * 19:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2046.codfw.wmnet with reason: host reimage * 19:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2043.codfw.wmnet with reason: host reimage * 19:13 herron@cumin1002: START - Cookbook sre.hosts.reimage for host 
aux-k8s-worker2004.codfw.wmnet with OS bookworm * 19:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002" * 19:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002" * 19:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2004.codfw.wmnet on all recursors * 19:10 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2004.codfw.wmnet on all recursors * 19:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002" * 19:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002" * 19:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2044.codfw.wmnet with reason: host reimage * 19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2045.codfw.wmnet with reason: host reimage * 19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2046.codfw.wmnet with reason: host reimage * 19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2043.codfw.wmnet with reason: host reimage * 19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2044.codfw.wmnet with reason: host reimage * 18:58 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:58 herron@cumin1002: START - Cookbook sre.ganeti.makevm 
for new host aux-k8s-worker2004.codfw.wmnet * 18:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2042.codfw.wmnet with OS bookworm * 18:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2043.codfw.wmnet with OS bookworm * 18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2044.codfw.wmnet with OS bookworm * 18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2045.codfw.wmnet with OS bookworm * 18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2046.codfw.wmnet with OS bookworm * 18:45 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2003.codfw.wmnet * 18:45 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2003.codfw.wmnet with OS bookworm * 18:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2042.codfw.wmnet with reason: host reimage * 18:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2042.codfw.wmnet with reason: host reimage * 18:31 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2003.codfw.wmnet with reason: host reimage * 18:27 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2003.codfw.wmnet with reason: host reimage * 18:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2042.codfw.wmnet with OS bookworm * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host 
es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:11 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2003.codfw.wmnet with OS bookworm * 18:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002" * 18:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002" * 18:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2003.codfw.wmnet on all recursors * 18:10 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2003.codfw.wmnet on all recursors * 18:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002" * 18:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002" * 18:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:03 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2042 to codfw - jhancock@cumin2002" * 18:02 jhancock@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2042 to codfw - jhancock@cumin2002" * 18:02 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker2003.codfw.wmnet * 17:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 17:41 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2002.codfw.wmnet * 17:41 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2002.codfw.wmnet with OS bookworm * 17:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2042 * 17:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host es2042 * 17:25 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2002.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:23 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudsw1-d5-eqiad.mgmt,cloudsw1-e4-eqiad.mgmt with reason: replace optics on faulty WMCS link from D5 to E4 * 17:22 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudsw1-d5-eqiad.mgmt,cloudsw1-e4-eqiad.mgmt with reason: replace optics on faulty WMCS link from D5 to E4 * 17:22 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on 
aux-k8s-worker2002.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:11 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:08 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2002.codfw.wmnet with OS bookworm * 17:06 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002" * 17:06 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002" * 17:05 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2002.codfw.wmnet on all recursors * 17:05 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2002.codfw.wmnet on all recursors * 17:05 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:05 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002" * 17:05 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002" * 17:00 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:00 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker2002.codfw.wmnet * 16:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:54 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd2003.codfw.wmnet to plain * 16:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:53 herron@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd2003.codfw.wmnet to plain * 16:48 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd2004.codfw.wmnet to plain * 16:47 herron@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd2004.codfw.wmnet to plain * 16:43 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd2005.codfw.wmnet to plain * 16:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2041.codfw.wmnet with OS bookworm * 16:43 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 16:43 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 16:42 herron@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd2005.codfw.wmnet to plain * 16:40 jhancock@cumin2002: START - Cookbook 
sre.hosts.provision for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:27 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2041.codfw.wmnet with reason: host reimage * 16:24 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2041.codfw.wmnet with reason: host reimage * 16:12 claime: homer 'cr*codfw*' commit '[[phab:T380473|T380473]]' * 16:11 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts parse[2002-2020].codfw.wmnet * 16:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse[2002-2020].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002" * 16:10 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse[2002-2020].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002" * 16:09 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 16:08 bking@deploy2002: Finished deploy [wdqs/wdqs@9927a5a]: 0.3.150 (duration: 03m 00s) * 16:07 cgoubert@cumin1002: START - Cookbook sre.dns.netbox * 16:05 bking@deploy2002: Started deploy [wdqs/wdqs@9927a5a]: 0.3.150 * 16:00 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm * 15:31 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts parse[2002-2020].codfw.wmnet * 15:31 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts parse2001.codfw.wmnet * 15:29 cgoubert@cumin1002: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002" * 15:29 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002" * 15:29 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es2041.codfw.wmnet with OS bookworm * 15:25 cgoubert@cumin1002: START - Cookbook sre.dns.netbox * 15:22 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 15:20 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts parse2001.codfw.wmnet * 15:17 ihurbain@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply * 15:17 ihurbain@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply * 15:16 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply * 15:15 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply * 15:14 claime: kubectl delete node parse20<nowiki>{</nowiki>01..20<nowiki>}</nowiki>.codfw.wmnet - [[phab:T380473|T380473]] * 15:12 claime: parse[2001-2020].codfw.wmnet 'systemctl stop kubelet.service' - [[phab:T380473|T380473]] * 15:11 claime: parse[2001-2020].codfw.wmnet 'disable-puppet "decom"' - [[phab:T380473|T380473]] * 15:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host parse[2001-2020].codfw.wmnet * 15:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wdqs[2018-2020].codfw.wmnet with reason: [[phab:T379023|T379023]] * 15:02 bking@cumin2002: 
START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wdqs[2018-2020].codfw.wmnet with reason: [[phab:T379023|T379023]] * 15:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: [[phab:T379023|T379023]] * 15:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: [[phab:T379023|T379023]] * 14:54 urandom: decommissioning Cassandra/restbase2022-<nowiki>{</nowiki>a,b,c<nowiki>}</nowiki> — * 14:53 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 14:53 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 14:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host parse[2001-2020].codfw.wmnet * 14:37 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply * 14:27 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply * 14:23 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply * 14:22 vgutierrez: restoring haproxykafka on A:cp-ulsfo and A:cp-eqsin - [[phab:T380570|T380570]] * 14:13 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply * 14:12 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply * 14:12 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply * 11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2156-2170].codfw.wmnet * 11:26 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2156-2170].codfw.wmnet * 11:25 claime: homer 
'lsw1-d7-codfw*' commit '[[phab:T376966|T376966]]' * 11:24 claime: homer 'lsw1-d6-codfw*' commit '[[phab:T376966|T376966]]' * 11:24 claime: homer 'lsw1-d5-codfw*' commit '[[phab:T376966|T376966]]' * 11:23 claime: homer 'lsw1-d4-codfw*' commit '[[phab:T376966|T376966]]' * 11:22 claime: homer 'lsw1-d1-codfw*' commit '[[phab:T376966|T376966]]' * 11:21 claime: homer 'lsw1-c7-codfw*' commit '[[phab:T376966|T376966]]' * 11:20 claime: homer 'lsw1-c4-codfw*' commit '[[phab:T376966|T376966]]' * 11:19 claime: homer 'lsw1-c2-codfw*' commit '[[phab:T376966|T376966]]' * 11:19 claime: homer 'lsw1-b7-codfw*' commit '[[phab:T376966|T376966]]' * 11:18 claime: homer 'lsw1-b4-codfw*' commit '[[phab:T376966|T376966]]' * 11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2140.codfw.wmnet * 11:07 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2140.codfw.wmnet * 11:04 claime: homer 'lsw1-b7-codfw*' commit '[[phab:T377028|T377028]]' * 11:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 10:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1014.eqiad.wmnet * 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:37 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera 
data: "Triggered by cookbooks.sre.dns.netbox: ganeti1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1014.eqiad.wmnet * 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1011.eqiad.wmnet * 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:22 vgutierrez: manually stopping haproxykafka on A:cp-ulsfo and A:cp-eqsin - [[phab:T380570|T380570]] * 10:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 10:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1011.eqiad.wmnet * 08:08 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add sorting options to tree view - oblivian@cumin1002" * 08:08 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add sorting options to tree view - oblivian@cumin1002 * 08:07 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add sorting options to tree view - oblivian@cumin1002 * 08:07 oblivian@cumin1002: START - Cookbook 
sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add sorting options to tree view - oblivian@cumin1002" * 01:00 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd2005.codfw.wmnet * 01:00 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd2005.codfw.wmnet with OS bookworm * 00:46 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd2005.codfw.wmnet with reason: host reimage * 00:42 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd2005.codfw.wmnet with reason: host reimage * 00:27 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd2005.codfw.wmnet with OS bookworm * 00:20 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002" * 00:20 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002" * 00:20 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd2005.codfw.wmnet on all recursors * 00:20 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd2005.codfw.wmnet on all recursors * 00:20 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:20 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002" * 00:16 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002" * 00:11 herron@cumin1002: 
START - Cookbook sre.dns.netbox * 00:11 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd2005.codfw.wmnet * 00:11 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd2004.codfw.wmnet * 00:11 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd2004.codfw.wmnet with OS bookworm == 2024-11-21 == * 23:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd2004.codfw.wmnet with reason: host reimage * 23:52 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd2004.codfw.wmnet with reason: host reimage * 23:36 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd2004.codfw.wmnet with OS bookworm * 23:29 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002" * 23:29 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002" * 23:29 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd2004.codfw.wmnet on all recursors * 23:28 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd2004.codfw.wmnet on all recursors * 23:28 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:28 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002" * 23:24 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002" * 23:11 
herron@cumin1002: START - Cookbook sre.dns.netbox * 23:11 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd2004.codfw.wmnet * 23:09 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd2003.codfw.wmnet * 23:09 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd2003.codfw.wmnet with OS bookworm * 23:08 brennen: end of utc late backport & config window * 23:07 brennen@deploy2002: Finished scap sync-world: Backport for [[gerrit:1094005{{!}}Add statsv to charts impressions (T379833)]] (duration: 12m 08s) * 23:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm * 23:01 brennen@deploy2002: bvibber, brennen: Continuing with sync * 23:00 brennen@deploy2002: bvibber, brennen: Backport for [[gerrit:1094005{{!}}Add statsv to charts impressions (T379833)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:55 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1094005{{!}}Add statsv to charts impressions (T379833)]] * 22:55 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd2003.codfw.wmnet with reason: host reimage * 22:54 brennen@deploy2002: Finished scap sync-world: resuming sync for [[gerrit:1094000{{!}}Add tracking categories for {{#chart:}} usage (T369684)]] after messing up a keypress (duration: 12m 35s) * 22:52 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd2003.codfw.wmnet with reason: host reimage * 22:42 brennen@deploy2002: Started scap sync-world: resuming sync for [[gerrit:1094000{{!}}Add tracking categories for {{#chart:}} usage (T369684)]] after messing up a keypress * 22:40 brennen@deploy2002: Sync cancelled. 
* 22:40 brennen@deploy2002: bvibber, brennen: Backport for [[gerrit:1094000{{!}}Add tracking categories for {{#chart:}} usage (T369684)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:38 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd2003.codfw.wmnet with OS bookworm * 22:36 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002" * 22:36 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002" * 22:35 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd2003.codfw.wmnet on all recursors * 22:35 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd2003.codfw.wmnet on all recursors * 22:35 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:35 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002" * 22:35 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002" * 22:32 herron@cumin1002: START - Cookbook sre.dns.netbox * 22:32 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd2003.codfw.wmnet * 22:25 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1094000{{!}}Add tracking categories for {{#chart:}} usage (T369684)]] * 22:25 brennen@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092334{{!}}Disable various extensions when using the shared login domain (T373737)]] (duration: 18m 16s) * 
22:22 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 22:18 brennen@deploy2002: tgr, brennen: Continuing with sync * 22:10 brennen@deploy2002: tgr, brennen: Backport for [[gerrit:1092334{{!}}Disable various extensions when using the shared login domain (T373737)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:06 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1092334{{!}}Disable various extensions when using the shared login domain (T373737)]] * 22:05 brennen@deploy2002: Finished scap sync-world: Backport for [[gerrit:1094047{{!}}Revert "Reduce number of bucketsizes for MediaViewer (group0)" (T372165)]] (duration: 10m 34s) * 21:58 brennen@deploy2002: brennen: Continuing with sync * 21:58 brennen@deploy2002: brennen: Backport for [[gerrit:1094047{{!}}Revert "Reduce number of bucketsizes for MediaViewer (group0)" (T372165)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:54 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1094047{{!}}Revert "Reduce number of bucketsizes for MediaViewer (group0)" (T372165)]] * 21:51 brennen@deploy2002: Sync cancelled. 
* 21:42 brennen@deploy2002: brennen, tgr, simon04: Backport for [[gerrit:1079640{{!}}Reduce number of bucketsizes for MediaViewer (group0) (T372165)]], [[gerrit:1093961{{!}}Set 'remember' central session object field when recreating (T379254 T372702)]], [[gerrit:1093962{{!}}Use cookie to access central session when local session expired]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:39 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1079640{{!}}Reduce number of bucketsizes for MediaViewer (group0) (T372165)]], [[gerrit:1093961{{!}}Set 'remember' central session object field when recreating (T379254 T372702)]], [[gerrit:1093962{{!}}Use cookie to access central session when local session expired]] * 21:36 brennen@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093960{{!}}Enable Skin-Codex logging (T375287)]] (duration: 15m 53s) * 21:29 brennen@deploy2002: brennen, jdlrobson: Continuing with sync * 21:26 brennen@deploy2002: brennen, jdlrobson: Backport for [[gerrit:1093960{{!}}Enable Skin-Codex logging (T375287)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:20 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1093960{{!}}Enable Skin-Codex logging (T375287)]] * 21:19 brennen@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090968{{!}}Enable AutoModerator on afwiki (T376597)]] (duration: 13m 50s) * 21:12 brennen@deploy2002: kgraessle, brennen: Continuing with sync * 21:10 brennen@deploy2002: kgraessle, brennen: Backport for [[gerrit:1090968{{!}}Enable AutoModerator on afwiki (T376597)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:05 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1090968{{!}}Enable AutoModerator on afwiki (T376597)]] * 20:46 tgr * 20:24 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet [reason: DIMM replaced, [[phab:T308459|T308459]]] * 
20:20 sukhe: force agent on cp2038 * 19:31 gmodena@deploy2002: Finished deploy [analytics/refinery@199401a] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@199401a6] (duration: 03m 45s) * 19:27 gmodena@deploy2002: Started deploy [analytics/refinery@199401a] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@199401a6] * 19:07 gmodena@deploy2002: Finished deploy [analytics/refinery@199401a] (thin): Ad-hoc deployment THIN [analytics/refinery@199401a6] (duration: 05m 37s) * 19:01 gmodena@deploy2002: Started deploy [analytics/refinery@199401a] (thin): Ad-hoc deployment THIN [analytics/refinery@199401a6] * 18:57 gmodena@deploy2002: Finished deploy [analytics/refinery@199401a]: Ad-hoc deployment [analytics/refinery@199401a6] (duration: 14m 08s) * 18:57 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093983{{!}}Follow-up fix for Charts enable on commons/test2 (T379689)]] (duration: 11m 29s) * 18:49 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 18:49 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1093983{{!}}Follow-up fix for Charts enable on commons/test2 (T379689)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 18:45 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1093983{{!}}Follow-up fix for Charts enable on commons/test2 (T379689)]] * 18:43 gmodena@deploy2002: Started deploy [analytics/refinery@199401a]: Ad-hoc deployment [analytics/refinery@199401a6] * 18:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091328{{!}}Enabling Charts on commons+test2 (T379689)]] (duration: 14m 05s) * 18:16 jayme@cumin2002: conftool action : set/pooled=yes; selector: name=kubestage200[34].codfw.wmnet * 18:15 jayme@cumin2002: conftool action : set/weight=10; selector: name=kubestage200[34].codfw.wmnet * 18:13 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 18:12 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1091328{{!}}Enabling Charts on 
commons+test2 (T379689)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 18:10 sukhe: running puppet on A:cp to resolve failed puppet run * 18:10 sukhe: sudo cumin -b11 'A:cp' 'run-puppet-agent' * 18:09 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp2038.codfw.wmnet with reason: DIMM replacement in progress * 18:09 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp2038.codfw.wmnet with reason: DIMM replacement in progress * 18:07 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1091328{{!}}Enabling Charts on commons+test2 (T379689)]] * 17:58 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet [reason: DIMM failure [[phab:T308459|T308459]]] * 17:45 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) check for host kubestage2003.codfw.wmnet * 17:45 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node check for host kubestage2003.codfw.wmnet * 17:40 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts clouddb2002-dev.codfw.wmnet * 17:40 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002" * 17:39 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002" * 17:39 fabfur: adding acls to kafka-jumbo cluster ([[phab:T380373|T380373]]) * 17:36 andrew@cumin1002: START - Cookbook sre.dns.netbox * 17:31 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts clouddb2002-dev.codfw.wmnet * 17:02 
cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:54 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 16:54 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 16:54 sukhe: enable puppet on lvs2013 and start pybal * 16:48 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting * 16:47 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting * 16:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 16:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002" * 16:46 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2013.codfw.wmnet * 16:46 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002" * 16:43 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs2013.codfw.wmnet * 16:43 sukhe: rebooting drained lvs2013 * 16:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 16:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 16:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 16:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host 
reimage * 16:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:20 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:13 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cluster=dnsbox,dc=magru [reason: testing] * 16:08 dancy@deploy2002: Finished scap sync-world: testing (duration: 03m 01s) * 16:05 dancy@deploy2002: Started scap sync-world: testing * 16:04 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 16:03 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 16:00 dancy@deploy2002: Installing scap version "4.127.0" for 209 hosts * 15:39 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093927{{!}}Fix layout broken by display:flex on HorizontalLayout (T380471)]], [[gerrit:1093928{{!}}Revert "ExperimentUserDefaultsManager: use read latest when retrieving central id"]] (duration: 15m 51s) * 15:34 gmodena@deploy2002: Finished deploy [analytics/refinery@358ccf5] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@358ccf55] (duration: 03m 30s) * 15:33 kartik@deploy2002: abi, sgimeno, kartik: Continuing with sync * 15:31 gmodena@deploy2002: Started deploy [analytics/refinery@358ccf5] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@358ccf55] * 15:29 gmodena@deploy2002: Finished deploy [analytics/refinery@358ccf5] (thin): Ad-hoc deployment THIN [analytics/refinery@358ccf55] (duration: 05m 16s) * 15:29 ihurbain@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply * 15:29 kartik@deploy2002: abi, sgimeno, kartik: Backport for [[gerrit:1093927{{!}}Fix layout broken by display:flex on HorizontalLayout (T380471)]], [[gerrit:1093928{{!}}Revert "ExperimentUserDefaultsManager: use read latest when 
retrieving central id"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:28 ihurbain@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply * 15:28 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply * 15:27 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply * 15:26 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@6183645]: increase driver memory for mjolnir feature selection (duration: 00m 31s) * 15:26 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting * 15:25 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting * 15:25 ebernhardson@deploy2002: Started deploy [airflow-dags/search@6183645]: increase driver memory for mjolnir feature selection * 15:24 sukhe: stop pybal on lvs2013 to confirm changes in CR {{Gerrit|1091243}} * 15:24 gmodena@deploy2002: Started deploy [analytics/refinery@358ccf5] (thin): Ad-hoc deployment THIN [analytics/refinery@358ccf55] * 15:24 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1093927{{!}}Fix layout broken by display:flex on HorizontalLayout (T380471)]], [[gerrit:1093928{{!}}Revert "ExperimentUserDefaultsManager: use read latest when retrieving central id"]] * 15:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:23 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:16 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:11 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime 
(exit_code=0) for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 15:10 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 15:06 gmodena@deploy2002: Finished deploy [analytics/refinery@358ccf5]: Ad-hoc deployment [analytics/refinery@358ccf55] (duration: 11m 44s) * 14:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 14:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:54 gmodena@deploy2002: Started deploy [analytics/refinery@358ccf5]: Ad-hoc deployment [analytics/refinery@358ccf55] * 14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 14:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 14:50 sergi0: UTC afternoon deploys done * 14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 14:48 sgimeno@deploy2002: Sync cancelled. 
* 14:47 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 14:43 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on kafka-main1001.eqiad.wmnet with reason: Per claime's recommendation * 14:43 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on kafka-main1001.eqiad.wmnet with reason: Per claime's recommendation * 14:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 14:41 sgimeno@deploy2002: sgimeno: Backport for [[gerrit:1093889{{!}}ExperimentUserDefaultsManager: use read latest when retrieving central id (T379682)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 14:35 sgimeno@deploy2002: Started scap sync-world: Backport for [[gerrit:1093889{{!}}ExperimentUserDefaultsManager: use read latest when retrieving central id (T379682)]] * 14:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 14:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 14:25 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply * 14:25 cgoubert@cumin1002: 
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 14:25 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply * 14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 14:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 14:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 14:22 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 14:21 sgimeno@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092956{{!}}enwiki: Add abusefilter-access-protected-vars to EFH/EFM (T380332)]] (duration: 13m 50s) * 14:14 sgimeno@deploy2002: eggroll97, sgimeno: Continuing with sync * 14:11 sgimeno@deploy2002: eggroll97, sgimeno: Backport for [[gerrit:1092956{{!}}enwiki: Add abusefilter-access-protected-vars to EFH/EFM (T380332)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:11 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1006.eqiad.wmnet with OS bookworm * 14:07 sgimeno@deploy2002: Started scap sync-world: Backport for [[gerrit:1092956{{!}}enwiki: Add abusefilter-access-protected-vars to EFH/EFM (T380332)]] * 14:06 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1005.eqiad.wmnet with OS bookworm * 14:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 14:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2169.codfw.wmnet with OS bookworm * 14:04 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 14:04 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 14:03 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm * 13:54 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1006.eqiad.wmnet with reason: host reimage * 13:51 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1006.eqiad.wmnet with reason: host reimage * 13:47 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1005.eqiad.wmnet with reason: host reimage * 13:44 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1005.eqiad.wmnet with reason: host reimage * 13:34 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage1006.eqiad.wmnet with OS bookworm * 13:33 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1008 to kubestage1006 * 13:32 jayme@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubestage1006 * 13:31 jayme@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kubestage1006 * 13:31 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:31 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1008 to kubestage1006 - jayme@cumin2002" * 13:30 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1008 to kubestage1006 - jayme@cumin2002" * 13:27 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host 
kubestage1005.eqiad.wmnet with OS bookworm * 13:25 jayme@cumin2002: START - Cookbook sre.dns.netbox * 13:25 jayme@cumin2002: START - Cookbook sre.hosts.rename from kubernetes1008 to kubestage1006 * 13:24 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1007 to kubestage1005 * 13:24 jayme@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubestage1005 * 13:22 jayme@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kubestage1005 * 13:22 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1007 to kubestage1005 - jayme@cumin2002" * 13:21 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1007 to kubestage1005 - jayme@cumin2002" * 13:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 13:18 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P<nowiki>{</nowiki>cp5026*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm2 * 13:17 jayme@cumin2002: START - Cookbook sre.dns.netbox * 13:17 jayme@cumin2002: START - Cookbook sre.hosts.rename from kubernetes1007 to kubestage1005 * 13:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 13:14 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P<nowiki>{</nowiki>cp5026*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm2 * 13:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server 
on P<nowiki>{</nowiki>cp5018*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm2 * 13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 13:10 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P<nowiki>{</nowiki>cp5018*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm2 * 13:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 13:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 13:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 12:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 12:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 12:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 12:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 12:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 12:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 12:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage * 12:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2158.codfw.wmnet with reason: host reimage * 12:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 12:38 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 12:38 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 12:38 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage * 12:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 12:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 12:36 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 12:35 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 12:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 12:17 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 12:17 cgoubert@cumin1002: START - Cookbook 
sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 12:16 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 12:16 jmm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:13 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 12:13 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 12:09 jmm@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:09 jmm@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:02 jmm@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply * 11:56 jmm@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 11:56 jmm@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply * 11:00 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1005.eqiad.wmnet with OS bullseye * 11:00 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 10:59 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 10:41 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1007-1008].eqiad.wmnet * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1005.eqiad.wmnet with reason: host reimage * 10:40 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1007-1008].eqiad.wmnet * 10:39 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply * 10:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after 
maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71113 and previous config saved to /var/cache/conftool/dbconfig/20241121-103834-arnaudb.json * 10:38 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply * 10:38 urbanecm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply * 10:37 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1005.eqiad.wmnet with reason: host reimage * 10:36 urbanecm@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply * 10:34 urbanecm@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply * 10:33 urbanecm@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply * 10:25 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1005.eqiad.wmnet with OS bullseye * 10:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71112 and previous config saved to /var/cache/conftool/dbconfig/20241121-102328-arnaudb.json * 10:19 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 102 * 10:19 ayounsi@cumin1002: START - Cookbook sre.network.debug for Netbox circuit ID 102 * 10:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71111 and previous config saved to /var/cache/conftool/dbconfig/20241121-100821-arnaudb.json * 10:01 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 10:01 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:59 dcausse: restarting eventgate-main@codfw * 09:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71110 and previous 
config saved to /var/cache/conftool/dbconfig/20241121-095313-arnaudb.json * 09:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71109 and previous config saved to /var/cache/conftool/dbconfig/20241121-095102-arnaudb.json * 09:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 09:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 09:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 09:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 09:35 moritzm: installing nghttp2 security updates * 09:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1246.eqiad.wmnet with OS bookworm * 09:17 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.4 refs [[phab:T375663|T375663]] * 09:07 moritzm: installing exim4 security updates * 09:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage * 09:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage * 08:45 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1246.eqiad.wmnet with OS bookworm * 08:21 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093733{{!}}Enable the Contribute menu in 4th group of Wikis (T375303)]] (duration: 14m 05s) * 08:14 kartik@deploy2002: kartik: Continuing with sync * 08:10 kartik@deploy2002: kartik: Backport for [[gerrit:1093733{{!}}Enable the Contribute menu in 4th group of Wikis (T375303)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 
08:06 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1093733{{!}}Enable the Contribute menu in 4th group of Wikis (T375303)]] * 07:48 moritzm: removing ganeti1017 from active Ganeti nodes [[phab:T378921|T378921]] * 05:51 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' . * 02:30 brett: Import libvmod-re2_2.0.0-2~bpo11u1 into varnish-staging apt component * 00:45 urandom: decommissioning Cassandra/restbase2021-<nowiki>{</nowiki>a,b,c<nowiki>}</nowiki> — [[phab:T380236|T380236]] * 00:42 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2023.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 00:42 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2023.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 00:42 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 00:42 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 00:42 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 00:42 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — [[phab:T380236|T380236]] * 00:40 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2038.codfw.wmnet * 00:40 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase2038.codfw.wmnet * 00:40 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2037.codfw.wmnet * 00:40 eevans@cumin1002: START - 
Cookbook sre.hosts.remove-downtime for restbase2037.codfw.wmnet
* 00:40 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2036.codfw.wmnet
* 00:40 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase2036.codfw.wmnet
* 00:15 urbanecm: [urbanecm@deploy2002 ~]$ mwscript-k8s -- extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --all --verbose # [[phab:T380329|T380329]]
== 2024-11-20 ==
* 23:22 cjming: end of UTC late backport window
* 23:20 eileen: civicrm upgraded from {{Gerrit|7c940d6f}} to {{Gerrit|3311520a}}
* 23:17 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093408{{!}}Temporarily disable dark mode for anonymous users (T379765)]] (duration: 13m 06s)
* 23:10 cjming@deploy2002: jdlrobson, cjming: Continuing with sync
* 23:08 cjming@deploy2002: jdlrobson, cjming: Backport for [[gerrit:1093408{{!}}Temporarily disable dark mode for anonymous users (T379765)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 23:04 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1093408{{!}}Temporarily disable dark mode for anonymous users (T379765)]]
* 23:03 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093328{{!}}knwiki: update portal namespace (T380366)]] (duration: 12m 17s)
* 22:56 cjming@deploy2002: cjming, anzx: Continuing with sync
* 22:55 cjming@deploy2002: cjming, anzx: Backport for [[gerrit:1093328{{!}}knwiki: update portal namespace (T380366)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 22:52 brett: Import libvmod-querysort 0.4-3 into varnish-staging apt component
* 22:51 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1093328{{!}}knwiki: update portal namespace (T380366)]]
* 22:49 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093446{{!}}Revert "Add contact form for U4C"]] (duration: 14m 22s)
* 22:49 
jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2005.codfw.wmnet with OS bullseye * 22:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 22:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 22:40 cjming@deploy2002: trainbranchbot, cjming: Continuing with sync * 22:40 cjming@deploy2002: trainbranchbot, cjming: Backport for [[gerrit:1093446{{!}}Revert "Add contact form for U4C"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:39 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 22:39 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 22:34 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1093446{{!}}Revert "Add contact form for U4C"]] * 22:31 cjming@deploy2002: Sync cancelled. * 22:28 cjming@deploy2002: nmw03, cjming: Backport for [[gerrit:1091868{{!}}Add contact form for U4C (T379317)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:27 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage * 22:24 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage * 22:23 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 22:22 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1091868{{!}}Add contact form for U4C (T379317)]] * 22:21 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 22:20 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093358{{!}}Bump wikimedia/parsoid to 0.21.0-a7 (T373776 T380333)]], [[gerrit:1093359{{!}}Bump wikimedia/parsoid to 0.21.0-a7 (T380333)]] (duration: 17m 
11s) * 22:18 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 22:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 22:13 cjming@deploy2002: arlolra, cjming: Continuing with sync * 22:12 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 22:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2005.codfw.wmnet with OS bullseye * 22:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhathaway@cumin2002" * 22:09 jhathaway@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhathaway@cumin2002" * 22:08 cjming@deploy2002: arlolra, cjming: Backport for [[gerrit:1093358{{!}}Bump wikimedia/parsoid to 0.21.0-a7 (T373776 T380333)]], [[gerrit:1093359{{!}}Bump wikimedia/parsoid to 0.21.0-a7 (T380333)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:06 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 22:03 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1093358{{!}}Bump wikimedia/parsoid to 0.21.0-a7 (T373776 T380333)]], [[gerrit:1093359{{!}}Bump wikimedia/parsoid to 0.21.0-a7 (T380333)]] * 22:02 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 21:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 21:50 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 21:47 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage * 21:43 jhathaway@cumin2002: START 
- Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage * 21:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 21:32 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 21:31 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye * 21:28 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091810{{!}}[ptwiki] Enable the CampaignEvents extension (T380090)]] (duration: 15m 04s) * 21:23 eileen: * civicrm upgraded from {{Gerrit|e29243f0}} to {{Gerrit|7c940d6f}} * 21:20 cjming@deploy2002: cjming, albertoleoncio: Continuing with sync * 21:19 cjming@deploy2002: cjming, albertoleoncio: Backport for [[gerrit:1091810{{!}}[ptwiki] Enable the CampaignEvents extension (T380090)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:13 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1091810{{!}}[ptwiki] Enable the CampaignEvents extension (T380090)]] * 21:08 dancy@deploy2002: Installing scap version "4.124.0" for 209 hosts * 21:06 dancy@deploy2002: Installing scap version "4.124.0" for 209 hosts * 21:05 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-ctrl2003.codfw.wmnet * 21:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-ctrl2003.codfw.wmnet with OS bookworm * 21:03 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 21:00 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye * 20:51 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl2003.codfw.wmnet with reason: host reimage * 20:48 bking@deploy2002: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/blunderbuss: apply * 20:48 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 20:48 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-ctrl2003.codfw.wmnet with reason: host reimage * 20:48 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 20:47 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm * 20:44 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 20:40 dancy@deploy2002: Installation of scap version "4.126.0" completed for 1 hosts * 20:39 dancy@deploy2002: Installing scap version "4.126.0" for 1 hosts * 20:32 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-ctrl2003.codfw.wmnet with OS bookworm * 20:30 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 20:30 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye * 20:28 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002" * 20:28 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002" * 20:28 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl2003.codfw.wmnet on all recursors * 20:28 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl2003.codfw.wmnet on all recursors * 20:28 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:28 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox 
hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002" * 20:26 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002" * 20:13 herron@cumin1002: START - Cookbook sre.dns.netbox * 20:13 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl2003.codfw.wmnet * 20:10 dancy@deploy2002: Installing scap version "4.126.0" for 1 hosts * 20:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 20:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye * 20:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 19:52 hashar@deploy2002: Finished deploy [integration/docroot@1627206]: build: update mediawiki-codesniffer to 45.0.0 & prevent LibUp from removing a phpcs rule (duration: 00m 10s) * 19:52 hashar@deploy2002: Started deploy [integration/docroot@1627206]: build: update mediawiki-codesniffer to 45.0.0 & prevent LibUp from removing a phpcs rule * 19:51 dancy@deploy2002: Installing scap version "4.126.0" for 1 hosts * 19:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 19:42 dancy@deploy2002: Installing scap version "4.126.0" for 209 hosts * 19:35 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-ctrl2002.codfw.wmnet * 19:35 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-ctrl2002.codfw.wmnet with OS bookworm * 19:20 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl2002.codfw.wmnet with reason: host reimage * 19:17 herron@cumin1002: START - Cookbook sre.hosts.downtime 
for 2:00:00 on aux-k8s-ctrl2002.codfw.wmnet with reason: host reimage * 19:12 urandom: bootstrapping cassandra, restbase2038-<nowiki>{</nowiki>a,b,c<nowiki>}</nowiki> — [[phab:T380236|T380236]] * 19:08 inflatador: bking@krb1001 add kerberos keytab for blunderbuss https://phabricator.wikimedia.org/P71106 [[phab:T371994|T371994]] * 19:04 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-ctrl2002.codfw.wmnet with OS bookworm * 19:03 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002" * 19:03 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002" * 19:03 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl2002.codfw.wmnet on all recursors * 19:03 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl2002.codfw.wmnet on all recursors * 19:03 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:03 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002" * 19:03 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:58 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl2002.codfw.wmnet * 17:32 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4] (hadoop-test): Regular analytics weekly train BIS TEST [analytics/refinery@295d5a44] (duration: 03m 36s) * 17:28 joal@deploy2002: Started 
deploy [analytics/refinery@295d5a4] (hadoop-test): Regular analytics weekly train BIS TEST [analytics/refinery@295d5a44] * 17:28 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply * 17:27 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 17:22 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4] (thin): Regular analytics weekly train BIS THIN [analytics/refinery@295d5a44] (duration: 05m 02s) * 17:22 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply * 17:21 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 17:20 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply * 17:19 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 17:18 joal@deploy2002: Started deploy [analytics/refinery@295d5a4] (thin): Regular analytics weekly train BIS THIN [analytics/refinery@295d5a44] * 17:17 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 17:16 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4]: Regular analytics weekly train BIS [analytics/refinery@295d5a44] (duration: 03m 41s) * 17:12 joal@deploy2002: Started deploy [analytics/refinery@295d5a4]: Regular analytics weekly train BIS [analytics/refinery@295d5a44] * 17:05 sukhe: restart tomcat on idp2004 * 17:04 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply * 17:03 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 17:02 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply * 17:01 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 17:00 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply * 17:00 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 16:43 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 16:43 jiji@deploy2002: helmfile [eqiad] START 
helmfile.d/services/changeprop: apply * 16:43 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 16:43 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 16:43 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply * 16:42 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply * 16:40 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply * 16:39 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply * 16:38 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply * 16:37 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply * 16:36 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply * 16:35 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 16:35 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply * 16:34 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. * 16:28 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 16:26 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 16:25 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' . 
* 16:24 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 16:23 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 16:22 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 16:22 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply * 16:21 jiji@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply * 16:15 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' . * 16:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet * 15:51 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 15:50 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 15:50 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 15:49 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 15:48 dancy@deploy2002: Finished scap sync-world: no-op deployment for testing. (duration: 03m 21s) * 15:44 dancy@deploy2002: Started scap sync-world: no-op deployment for testing. 
* 15:44 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 15:44 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 15:37 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 15:37 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 15:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: host overworked by dumps - [[phab:T368098|T368098]] * 15:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: host overworked by dumps - [[phab:T368098|T368098]] * 15:31 jynus: starting resharding of commons backup files into new host backup2010 [[phab:T376892|T376892]] * 15:27 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 15:23 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 15:23 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 15:22 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 15:22 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 15:19 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 15:19 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 15:15 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 15:14 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 15:13 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 15:13 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 15:10 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 15:09 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: 
apply * 15:09 urandom: bootstrapping cassandra, restbase2037-<nowiki>{</nowiki>a,b,c<nowiki>}</nowiki> — [[phab:T380236|T380236]] * 15:04 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on P<nowiki>{</nowiki>cephosd100[2-4].eqiad.wmnet<nowiki>}</nowiki> and (A:cephosd) * 14:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:53 JennH: power cycling unresponsive mgmt switch in codfw: msw-c3-codfw * 14:50 btullis@cumin1002: END (FAIL) - Cookbook sre.hadoop.roll-restart-workers (exit_code=99) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade. * 14:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:29 cdanis: [[phab:T380226|T380226]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕤☕ mwscript sql.php --wiki=commonswiki --cluster=extension1 /srv/mediawiki/php-1.44.0-wmf.4/extensions/JsonConfig/sql/mysql/tables-generated.sql * 14:25 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7007.magru.wmnet [reason: host reimaged] * 14:24 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on P<nowiki>{</nowiki>cephosd100[2-4].eqiad.wmnet<nowiki>}</nowiki> and (A:cephosd) * 14:23 jynus: starting resharding of commons backup files into new host backup1010 [[phab:T376892|T376892]] * 14:23 sukhe: running homer on asw*magru* * 14:06 jiji@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 14:05 jiji@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:05 jiji@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 14:05 jiji@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:05 jiji@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. 
* 14:04 jiji@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. * 14:04 jiji@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 14:04 jiji@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 14:04 jiji@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 14:03 jiji@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 14:03 jiji@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 14:03 jiji@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 14:03 jiji@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 14:03 jiji@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 14:03 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 14:02 jiji@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'. * 14:02 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 14:02 jiji@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 13:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2136-2139,2141-2155].codfw.wmnet * 13:55 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2136-2139,2141-2155].codfw.wmnet * 13:53 claime: homer 'lsw1-d4-codfw*' commit '[[phab:T377028|T377028]]' * 13:52 claime: homer 'lsw1-b4-codfw*' commit '[[phab:T377028|T377028]]' * 13:52 claime: homer 'lsw1-d2-codfw*' commit '[[phab:T377028|T377028]]' * 13:51 claime: homer 'lsw1-c2-codfw*' commit '[[phab:T377028|T377028]]' * 13:50 claime: homer 'lsw1-d7-codfw*' commit '[[phab:T377028|T377028]]' * 13:50 claime: homer 'lsw1-c4-codfw*' commit '[[phab:T377028|T377028]]' * 13:49 claime: homer 'lsw1-d5-codfw*' commit '[[phab:T377028|T377028]]' * 13:48 claime: homer 'lsw1-b7-codfw*' commit '[[phab:T377028|T377028]]' * 13:47 claime: homer 'lsw1-c7-codfw*' commit '[[phab:T377028|T377028]]' * 13:46 claime: homer 'lsw1-d6-codfw*' commit '[[phab:T377028|T377028]]' * 13:45 claime: homer 'lsw1-b2-codfw*' commit '[[phab:T377028|T377028]]' * 13:44 claime: homer 'lsw1-d1-codfw*' commit '[[phab:T377028|T377028]]' * 13:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm * 13:38 effie: putting kafka-main1006.eqiad.wmnet in production * 13:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm * 13:36 jiji@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 13:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 13:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:28 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade. * 13:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:26 jiji@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 13:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 13:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 13:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 13:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7007.magru.wmnet with OS bullseye * 13:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 13:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 13:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 13:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 13:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 13:02 cgoubert@cumin1002: START - Cookbook 
sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 13:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet * 13:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 13:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 13:00 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 13:00 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet * 12:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7007.magru.wmnet with reason: host reimage * 12:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet * 12:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7007.magru.wmnet with reason: host reimage * 12:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 12:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm * 12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 12:41 cgoubert@cumin1002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 12:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 12:38 sukhe: re-enable puppet on cumin2002 * 12:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 12:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 12:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 12:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 12:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 12:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 12:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 12:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 12:19 sukhe@cumin2002: 
END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp7007.magru.wmnet * 12:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 12:16 sukhe@cumin2002: START - Cookbook sre.hosts.dhcp for host cp7007.magru.wmnet * 12:16 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp7007.magru.wmnet * 12:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 12:14 sukhe@cumin1002: START - Cookbook sre.hosts.dhcp for host cp7007.magru.wmnet * 12:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 12:08 sukhe: disable puppet on cumin2002 to test cumin alias for A:installserver * 12:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 12:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 11:59 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 11:59 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 11:58 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 11:57 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 11:57 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 
on wikikube-worker2145.codfw.wmnet with reason: host reimage * 11:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 11:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 11:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 11:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 11:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 11:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 11:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 11:37 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 11:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 11:30 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru * 11:24 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru * 11:22 akosiaris: decommission cxserver endpoints /api/rest_v1/transform/html/from, /api/rest_v1/transform/word/from from RESTBase [[phab:T375616|T375616]] * 10:43 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on P<nowiki>{</nowiki>cephosd1001.eqiad.wmnet<nowiki>}</nowiki> and (A:cephosd) * 10:38 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru * 10:38 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy 
rolling upgrade of HAProxy on A:cp-upload_magru * 10:37 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams * 10:34 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams * 10:33 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on P<nowiki>{</nowiki>cephosd1001.eqiad.wmnet<nowiki>}</nowiki> and (A:cephosd) * 10:33 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kafka-main[1001,1006].eqiad.wmnet with reason: Hardware refresh * 10:33 jayme: re-enabled puppet on all k8s control planes for rollout of [[phab:T380142|T380142]] * 10:33 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kafka-main[1001,1006].eqiad.wmnet with reason: Hardware refresh * 10:22 effie: removing leadership from kafka-main1001 - [[phab:T363214|T363214]] * 10:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 10:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:52 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.4 refs [[phab:T375663|T375663]] * 09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:41 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 09:38 akosiaris: decommission cxserver endpoints /api/rest_v1/list/(pair{{!}}tool{{!}}languagepairs) from RESTBase [[phab:T375616|T375616]] * 09:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:33 aklapper@deploy2002: Finished scap sync-world: Backport for [[gerrit:1093172{{!}}EditionLookup: Update EntityLookup calls (T380304)]] (duration: 13m 33s) * 09:33 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams * 09:33 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams * 09:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:27 aklapper@deploy2002: aklapper, thiemowmde: Continuing with sync * 09:26 aklapper@deploy2002: aklapper, thiemowmde: Backport for [[gerrit:1093172{{!}}EditionLookup: Update EntityLookup calls (T380304)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to plain * 09:20 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to plain * 09:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:20 aklapper@deploy2002: Started scap sync-world: Backport for [[gerrit:1093172{{!}}EditionLookup: Update EntityLookup calls (T380304)]] * 09:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 09:18 jmm@cumin2002: END (PASS) - Cookbook 
sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to plain * 09:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to plain * 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to plain * 09:13 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to plain * 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to plain * 08:51 jayme: disabling puppet on all k8s control planes for rollout of [[phab:T380142|T380142]] * 08:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to plain * 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to plain * 08:44 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7001.wikimedia.org to plain * 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet * 08:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet * 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet * 08:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet * 08:18 hashar: Restarted CI Jenkins to upgrade Leastload plugin and remove the SSH server plugin == 2024-11-19 == * 22:50 ryankemper@deploy2002: Started deploy [wdqs/wdqs@9927a5a] (wcqs): Deploy 0.3.150 to WCQS * 22:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092341{{!}}Enable experimental Parsoid fragment support on labs and test wikis (T374661)]], [[gerrit:1092850{{!}}Revert "editcheck: Remove 
try/catch around transaction squashing" (T333710 T380234)]], [[gerrit:1092851{{!}}Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234)]] (duration: 20m 39s) * 21:53 urbanecm@deploy2002: cscott, kemayo, urbanecm: Continuing with sync * 21:45 urbanecm@deploy2002: cscott, kemayo, urbanecm: Backport for [[gerrit:1092341{{!}}Enable experimental Parsoid fragment support on labs and test wikis (T374661)]], [[gerrit:1092850{{!}}Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234)]], [[gerrit:1092851{{!}}Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234)]] synced to the testservers (https://wikitech.wikimedia.or * 21:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm * 21:39 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1092341{{!}}Enable experimental Parsoid fragment support on labs and test wikis (T374661)]], [[gerrit:1092850{{!}}Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234)]], [[gerrit:1092851{{!}}Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234)]] * 21:38 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092296{{!}}Promote Vector 2022 as default on 3 wikis (T379765)]], [[gerrit:1092912{{!}}Separate cache key space for test & production JsonConfig data (T380320)]] (duration: 14m 38s) * 21:31 urbanecm@deploy2002: bvibber, jdlrobson, urbanecm: Continuing with sync * 21:29 urbanecm@deploy2002: bvibber, jdlrobson, urbanecm: Backport for [[gerrit:1092296{{!}}Promote Vector 2022 as default on 3 wikis (T379765)]], [[gerrit:1092912{{!}}Separate cache key space for test & production JsonConfig data (T380320)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:23 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1092296{{!}}Promote Vector 2022 as default on 3 
wikis (T379765)]], [[gerrit:1092912{{!}}Separate cache key space for test & production JsonConfig data (T380320)]] * 21:16 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2038.codfw.wmnet with reason: Bootstrapping — [[phab:T380236|T380236]] * 21:15 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2038.codfw.wmnet with reason: Bootstrapping — [[phab:T380236|T380236]] * 21:15 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2037.codfw.wmnet with reason: Bootstrapping — [[phab:T380236|T380236]] * 21:15 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2037.codfw.wmnet with reason: Bootstrapping — [[phab:T380236|T380236]] * 21:15 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2036.codfw.wmnet with reason: Bootstrapping — [[phab:T380236|T380236]] * 21:15 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2036.codfw.wmnet with reason: Bootstrapping — [[phab:T380236|T380236]] * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 20:50 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye * 20:40 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 20:40 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye * 20:32 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7007.magru.wmnet with OS bullseye * 20:29 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 20:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host 
es2041.codfw.wmnet with OS bookworm * 20:24 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 20:10 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 20:10 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 20:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 20:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1183.eqiad.wmnet with OS bullseye * 20:03 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 19:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp7007.magru.wmnet * 19:41 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7007.magru.wmnet with OS bullseye * 19:40 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp7007.magru.wmnet * 19:34 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 19:17 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@a4d0954]: mjolnir: [[phab:T379045|T379045]] Increase maxResultSize (duration: 00m 26s) * 19:16 ebernhardson@deploy2002: Started deploy [airflow-dags/search@a4d0954]: mjolnir: [[phab:T379045|T379045]] Increase maxResultSize * 19:15 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 19:14 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7007.magru.wmnet with OS bullseye * 19:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
an-worker1183.eqiad.wmnet with reason: host reimage * 19:08 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 19:08 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7007.magru.wmnet with OS bullseye * 19:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1183.eqiad.wmnet with reason: host reimage * 19:05 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:05 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 18:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1183.eqiad.wmnet with OS bullseye * 18:53 brett: Import ncmonitor 1.3.0-1 into main apt repo * 18:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1183.eqiad.wmnet with OS bullseye * 18:48 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 18:47 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7007.magru.wmnet with OS bullseye * 18:39 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 18:36 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 18:34 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 18:34 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 18:34 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 18:34 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7007.magru.wmnet with OS bullseye * 18:32 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 18:32 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 18:07 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 17:57 brennen@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092875{{!}}Prevent ce_event_wikis query when feature flag is off (T380288)]] (duration: 15m 10s) * 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1326.eqiad.wmnet with OS bookworm * 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1327.eqiad.wmnet with OS bookworm * 17:53 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:53 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1183.eqiad.wmnet with OS bullseye * 17:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1325.eqiad.wmnet with OS bookworm * 17:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:50 jclark@cumin1002: START - 
Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:50 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1183.eqiad.wmnet with OS bullseye * 17:50 brennen@deploy2002: daimona, brennen: Continuing with sync * 17:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1323.eqiad.wmnet with OS bookworm * 17:48 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:47 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker1290 * 17:47 cmooney@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1290 * 17:47 brennen@deploy2002: daimona, brennen: Backport for [[gerrit:1092875{{!}}Prevent ce_event_wikis query when feature flag is off (T380288)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 17:47 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1322.eqiad.wmnet with OS bookworm * 17:45 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wikikube-worker1290.eqiad.wmnet with reason: being moved to new port * 
17:42 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wikikube-worker1290.eqiad.wmnet with reason: being moved to new port * 17:42 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 17:41 brennen@deploy2002: Started scap sync-world: Backport for [[gerrit:1092875{{!}}Prevent ce_event_wikis query when feature flag is off (T380288)]] * 17:41 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 17:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1324.eqiad.wmnet with OS bookworm * 17:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:40 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2110.codfw.wmnet with OS bullseye * 17:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage * 17:34 jclark@cumin1002: START - Cookbook sre.hosts.reimage for 
host an-worker1183.eqiad.wmnet with OS bullseye * 17:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage * 17:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage * 17:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage * 17:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage * 17:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage * 17:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage * 17:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage * 17:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2110.codfw.wmnet with reason: host reimage * 17:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage * 17:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1314.eqiad.wmnet with OS bookworm * 17:18 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage * 17:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage * 17:18 jclark@cumin1002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2110.codfw.wmnet with reason: host reimage * 17:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 17:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1318.eqiad.wmnet with OS bookworm * 17:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1319.eqiad.wmnet with OS bookworm * 17:11 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:11 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1326.eqiad.wmnet with OS bookworm * 17:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1327.eqiad.wmnet with OS bookworm * 17:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1325.eqiad.wmnet with OS bookworm * 17:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1320.eqiad.wmnet with OS bookworm * 17:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1321.eqiad.wmnet with OS bookworm * 17:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1316.eqiad.wmnet with OS bookworm * 17:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:01 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 17:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1323.eqiad.wmnet with OS bookworm * 17:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1324.eqiad.wmnet with OS bookworm * 17:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1322.eqiad.wmnet with OS bookworm * 17:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2110.codfw.wmnet with OS bullseye * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2110'] * 17:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage * 17:00 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2110'] * 16:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1317.eqiad.wmnet with OS bookworm * 16:58 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 16:58 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 16:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage * 16:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1315.eqiad.wmnet with OS bookworm * 16:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 16:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 16:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage * 16:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1313.eqiad.wmnet with OS bookworm * 16:52 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 16:52 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host 
reimage - jclark@cumin1002" * 16:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage * 16:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage * 16:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1316.eqiad.wmnet with reason: host reimage * 16:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1317.eqiad.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage * 16:36 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage * 16:36 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7007.magru.wmnet * 16:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage * 16:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage * 16:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage * 16:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage * 16:33 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1316.eqiad.wmnet with reason: host reimage * 16:33 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on 
wikikube-worker1317.eqiad.wmnet with reason: host reimage * 16:33 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage * 16:31 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage * 16:30 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 16:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1319.eqiad.wmnet with OS bookworm * 16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1320.eqiad.wmnet with OS bookworm * 16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm * 16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1318.eqiad.wmnet with OS bookworm * 16:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 16:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker1317.eqiad.wmnet with OS bookworm * 16:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1316.eqiad.wmnet with OS bookworm * 16:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1315.eqiad.wmnet with OS bookworm * 16:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1314.eqiad.wmnet with OS bookworm * 16:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1313.eqiad.wmnet with OS bookworm * 16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 16:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 16:07 dreamyjazz@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092856{{!}}ExperimentUserDefaultsManager: Decrease log severity to debug (T380271)]] (duration: 13m 16s) * 16:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 16:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2139.codfw.wmnet with reason: host reimage * 15:59 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync * 15:59 dreamyjazz@deploy2002: dreamyjazz: Backport for [[gerrit:1092856{{!}}ExperimentUserDefaultsManager: Decrease log severity to debug (T380271)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS 
bookworm * 15:54 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:53 dreamyjazz@deploy2002: Started scap sync-world: Backport for [[gerrit:1092856{{!}}ExperimentUserDefaultsManager: Decrease log severity to debug (T380271)]] * 15:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:47 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:47 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2139.codfw.wmnet with reason: host reimage * 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:45 moritzm: installing libheif security updates * 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2139.codfw.wmnet with OS bookworm * 15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:25 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:25 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:15 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7007.magru.wmnet with OS bullseye * 15:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad * 15:11 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad * 15:06 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad * 15:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for 
the switch from codfw to eqiad * 15:05 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad * away: UTC afternoon deploys done * 14:59 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092333{{!}}Use 'auth' rather than 'sso' as cookie prefix on the auth domain (T379811)]] (duration: 14m 16s) * 14:52 tgr@deploy2002: tgr: Continuing with sync * 14:50 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7007.magru.wmnet with reason: host reimage * 14:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw * 14:50 tgr@deploy2002: tgr: Backport for [[gerrit:1092333{{!}}Use 'auth' rather than 'sso' as cookie prefix on the auth domain (T379811)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:49 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw * 14:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw * 14:48 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw * 14:46 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7007.magru.wmnet with reason: host reimage * 14:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:44 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1092333{{!}}Use 'auth' rather than 'sso' as cookie prefix on the auth domain (T379811)]] * 14:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS 
bookworm * 14:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:39 elukey: limit /v2/_catalog to internal IPs only for all Docker Registry nodes - [[phab:T378618|T378618]] * 14:38 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092740{{!}}Enable message group subscription feature for MediaWiki.org (T372386)]] (duration: 16m 21s) * 14:35 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad * 14:34 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad * 14:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad * 14:33 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad * 14:31 kartik@deploy2002: kartik, abi: Continuing with sync * 14:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw * 14:30 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw * 14:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw * 14:28 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw * 14:28 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1092740{{!}}Enable message group subscription feature for MediaWiki.org (T372386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of 
HAProxy on A:cp-text_eqiad * 14:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad * 14:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad * 14:24 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad * 14:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad * 14:23 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad * 14:22 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1092740{{!}}Enable message group subscription feature for MediaWiki.org (T372386)]] * 14:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad * 14:21 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad * 14:21 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye * 14:21 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs * 14:18 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs * 14:17 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092257{{!}}Enable the Contribute menu in 3rd group of Wikis (T375301)]] (duration: 15m 07s) * 14:15 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4]: Regular analytics weekly train [analytics/refinery@295d5a44] (duration: 08m 56s) * 14:11 kartik@deploy2002: kartik: Continuing with sync * 14:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1290.eqiad.wmnet * 14:10 kartik@deploy2002: kartik: Backport for [[gerrit:1092257{{!}}Enable 
the Contribute menu in 3rd group of Wikis (T375301)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:10 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1290.eqiad.wmnet * 14:07 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply * 14:06 joal@deploy2002: Started deploy [analytics/refinery@295d5a4]: Regular analytics weekly train [analytics/refinery@295d5a44] * 14:06 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply * 14:05 ihurbain@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 14:04 ihurbain@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply * 14:03 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply * 14:02 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1092257{{!}}Enable the Contribute menu in 3rd group of Wikis (T375301)]] * 14:02 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply * 14:01 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply * 14:01 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply * 13:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs * 13:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs * 13:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 266098 * 13:08 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 266098 * 13:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 267521 * 13:07 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 267521 * 13:07 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 
201838 * 13:06 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 201838 * 13:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262979 * 13:06 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 262979 * 13:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 266631 * 13:06 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 266631 * 13:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 53180 * 13:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 53180 * 13:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21574 * 13:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 21574 * 12:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox * 12:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw * 12:42 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw * 12:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw * 12:40 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw * 12:38 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the switch from eqiad to codfw * 12:36 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw * 12:35 moritzm: removing ganeti1016 from active Ganeti nodes [[phab:T378921|T378921]] * 12:30 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy 
(exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw * 12:27 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw * 12:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad * 12:22 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad * 12:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad * 12:18 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad * 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet * 11:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 100%: repool', diff saved to https://phabricator.wikimedia.org/P71095 and previous config saved to /var/cache/conftool/dbconfig/20241119-114422-arnaudb.json * 11:40 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw * 11:40 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw * 11:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 75%: repool', diff saved to https://phabricator.wikimedia.org/P71094 and previous config saved to /var/cache/conftool/dbconfig/20241119-112917-arnaudb.json * 11:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 50%: repool', diff saved to https://phabricator.wikimedia.org/P71093 and previous config saved to /var/cache/conftool/dbconfig/20241119-111411-arnaudb.json * 11:05 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet * 11:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 207947 * 11:03 ayounsi@cumin1002: 
START - Cookbook sre.network.peering with action 'configure' for AS: 207947 * 10:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 25%: repool', diff saved to https://phabricator.wikimedia.org/P71092 and previous config saved to /var/cache/conftool/dbconfig/20241119-105906-arnaudb.json * 10:58 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet * 10:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 15%: repool', diff saved to https://phabricator.wikimedia.org/P71091 and previous config saved to /var/cache/conftool/dbconfig/20241119-104401-arnaudb.json * 10:41 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin * 10:37 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin * 10:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 10%: repool', diff saved to https://phabricator.wikimedia.org/P71090 and previous config saved to /var/cache/conftool/dbconfig/20241119-102855-arnaudb.json * 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry * 10:25 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry * 10:16 moritzm: restart spamd on vrts to pick up openssl updates * 10:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 5%: repool', diff saved to https://phabricator.wikimedia.org/P71089 and previous config saved to /var/cache/conftool/dbconfig/20241119-101350-arnaudb.json * 10:02 moritzm: installing openssl security updates * 10:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw * 10:00 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch 
from eqiad to codfw * 09:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw * 09:59 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw * 09:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw * 09:58 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw * 09:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw * 09:52 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw * 09:51 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 09:51 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 09:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw * 09:49 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw * 09:42 fabfur: upgrade haproxy on cp-text{{!}}upload_eqsin ([[phab:T379891|T379891]]) * 09:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin * 09:41 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin * 09:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad * 09:39 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 09:39 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 09:39 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad * 09:39 dcausse@deploy2002: 
helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 09:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad * 09:38 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 09:35 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad * 09:33 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 09:32 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 09:19 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.4 refs [[phab:T375663|T375663]] * 09:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad * 09:18 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad * 08:59 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092752{{!}}Add + to nowiki in core-Permissions.php (T380252)]] (duration: 10m 17s) * 08:54 urbanecm@deploy2002: urbanecm, jhsoby: Continuing with sync * 08:54 urbanecm@deploy2002: urbanecm, jhsoby: Backport for [[gerrit:1092752{{!}}Add + to nowiki in core-Permissions.php (T380252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:49 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1092752{{!}}Add + to nowiki in core-Permissions.php (T380252)]] * 08:48 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092741{{!}}fix tours by finishing partial variable rename (T380071)]], [[gerrit:1092364{{!}}affcom contactpages: Fix Letter of intent and logo field labels (T375392)]], [[gerrit:1092743{{!}}Add nowiki to commonsuploads dblist (T380252)]] (duration: 14m 29s) * 08:43 urbanecm@deploy2002: ammarpad, migr, jhsoby, urbanecm: Continuing with sync * 08:39 
urbanecm@deploy2002: ammarpad, migr, jhsoby, urbanecm: Backport for [[gerrit:1092741{{!}}fix tours by finishing partial variable rename (T380071)]], [[gerrit:1092364{{!}}affcom contactpages: Fix Letter of intent and logo field labels (T375392)]], [[gerrit:1092743{{!}}Add nowiki to commonsuploads dblist (T380252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1092741{{!}}fix tours by finishing partial variable rename (T380071)]], [[gerrit:1092364{{!}}affcom contactpages: Fix Letter of intent and logo field labels (T375392)]], [[gerrit:1092743{{!}}Add nowiki to commonsuploads dblist (T380252)]] * 08:29 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1082726{{!}}Translate Event Logging: Enable using $wgTranslateEnableEventLogging (T364460)]], [[gerrit:1092258{{!}}CirrusSearch: enable offloading weighted tags via EventBus (T378983 T377150)]], [[gerrit:1091197{{!}}[GrowthExperiments] Add virtual domain config (T354939)]] (duration: 24m 42s) * 08:22 urbanecm@deploy2002: urbanecm, wangombe, pfischer: Continuing with sync * 08:12 urbanecm@deploy2002: urbanecm, wangombe, pfischer: Backport for [[gerrit:1082726{{!}}Translate Event Logging: Enable using $wgTranslateEnableEventLogging (T364460)]], [[gerrit:1092258{{!}}CirrusSearch: enable offloading weighted tags via EventBus (T378983 T377150)]], [[gerrit:1091197{{!}}[GrowthExperiments] Add virtual domain config (T354939)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:04 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1082726{{!}}Translate Event Logging: Enable using $wgTranslateEnableEventLogging (T364460)]], [[gerrit:1092258{{!}}CirrusSearch: enable offloading weighted tags via EventBus (T378983 T377150)]], [[gerrit:1091197{{!}}[GrowthExperiments] Add virtual domain config (T354939)]] * 07:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime 
(exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: sad * 07:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: sad * 07:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: [[phab:T374215|T374215]] - hw maintenance * 07:40 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: [[phab:T374215|T374215]] - hw maintenance * 07:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet * 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet * 07:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet * 05:01 mwpresync@deploy2002: Pruned MediaWiki: 1.44.0-wmf.1 (duration: 01m 18s) * 04:52 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.4 refs [[phab:T375663|T375663]] (duration: 49m 01s) * 04:16 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1062.eqiad.wmnet with OS bookworm * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.4 refs [[phab:T375663|T375663]] * 04:00 ejegg: fundraising civicrm upgraded from {{Gerrit|463a12c5}} to {{Gerrit|e29243f0}} * 03:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage * 03:48 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage * 03:33 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bookworm * 03:09 ejegg: payments-wiki upgraded from {{Gerrit|459f259b}} to {{Gerrit|c4463536}} * 02:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
kafka-jumbo1018.eqiad.wmnet with OS bullseye * 02:30 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 02:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 02:23 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|601405dc}} to {{Gerrit|131e92a5}} * 02:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1018.eqiad.wmnet with reason: host reimage * 02:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1018.eqiad.wmnet with reason: host reimage * 01:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye * 01:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1018.eqiad.wmnet with OS bullseye * 01:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1016.eqiad.wmnet with OS bullseye * 01:51 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 01:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1017.eqiad.wmnet with OS bullseye * 01:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 01:40 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 01:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered 
by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 01:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1017.eqiad.wmnet with reason: host reimage * 01:21 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1017.eqiad.wmnet with reason: host reimage * 01:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2006.codfw.wmnet with OS bookworm * 01:12 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 01:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye * 01:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1017.eqiad.wmnet with OS bullseye * 01:06 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1017.eqiad.wmnet with OS bullseye * 01:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 01:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1016.eqiad.wmnet with reason: host reimage * 00:58 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1016.eqiad.wmnet with reason: host reimage * 00:54 tzatziki: removing 1 file for legal compliance * 00:53 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bookworm * 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2005.codfw.wmnet with OS bookworm * 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
pt1979@cumin2002" * 00:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1016.eqiad.wmnet with OS bullseye * 00:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage * 00:41 tzatziki: removing 1 file for legal compliance * 00:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1016.eqiad.wmnet with OS bullseye * 00:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage * 00:34 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 00:18 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1017.eqiad.wmnet with OS bullseye * 00:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1017.eqiad.wmnet with OS bullseye * 00:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2006.codfw.wmnet with OS bookworm * 00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage * 00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2004.codfw.wmnet with OS bookworm * 00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 00:10 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 00:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage * 00:03 tzatziki: removing 1 file for legal compliance * 00:00 pt1979@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2003.codfw.wmnet with OS bookworm
* 00:00 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
== 2024-11-18 ==
* 23:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 23:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
* 23:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
* 23:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2005.codfw.wmnet with OS bookworm
* 23:32 tzatziki: removing 1 file for legal compliance
* 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
* 23:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2002.codfw.wmnet with OS bookworm
* 23:28 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 23:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
* 23:26 tzatziki: removing 1 file for legal compliance
* 23:26 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
* 23:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2004.codfw.wmnet with OS bookworm
* 23:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on
thanos-be2005.codfw.wmnet with reason: host reimage * 23:15 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage * 23:12 tzatziki: removing 2 files for legal compliance * 23:09 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:09 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002" * 23:09 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002" * 23:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage * 23:06 eevans@cumin1002: START - Cookbook sre.dns.netbox * 23:05 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage * 23:04 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:04 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002" * 23:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2003.codfw.wmnet with OS bookworm * 23:04 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002" * 23:03 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm * 23:01 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bookworm * 23:00 jclark@cumin1002: START - Cookbook 
sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1017.eqiad.wmnet with OS bullseye * 23:00 eevans@cumin1002: START - Cookbook sre.dns.netbox * 22:59 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1016.eqiad.wmnet with OS bullseye * 22:57 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm * 22:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2045.codfw.wmnet with OS bookworm * 22:55 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bookworm * 22:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2044.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2046.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2043.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm * 22:52 tzatziki: removing 10 files for legal compliance * 22:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2001.codfw.wmnet with OS bookworm * 22:50 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 22:49 bking@deploy2002: Finished deploy [wdqs/wdqs@9927a5a]: 0.3.150 (duration: 11m 59s) * 22:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm * 22:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2042.codfw.wmnet with OS bookworm * 22:37 bking@deploy2002: 
Started deploy [wdqs/wdqs@9927a5a]: 0.3.150 * 22:22 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bookworm * 22:18 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092336{{!}}[GrowthExperiments] testwiki: Only enable Add Link for new accounts (T380204)]] (duration: 09m 14s) * 22:13 urbanecm@deploy2002: urbanecm: Continuing with sync * 22:13 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1092336{{!}}[GrowthExperiments] testwiki: Only enable Add Link for new accounts (T380204)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:09 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1092336{{!}}[GrowthExperiments] testwiki: Only enable Add Link for new accounts (T380204)]] * 21:58 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092304{{!}}Use WAN cache for JsonConfig remote fetch cache (T374746)]], [[gerrit:1092300{{!}}Create no-link-recommendation variant (T377787 T380204)]], [[gerrit:1092295{{!}}[GrowthExperiments] testwiki: Enable no-link-recommendation experiment (T380204)]] (duration: 12m 10s) * 21:54 urbanecm@deploy2002: urbanecm, bvibber: Continuing with sync * 21:52 urbanecm@deploy2002: urbanecm, bvibber: Backport for [[gerrit:1092304{{!}}Use WAN cache for JsonConfig remote fetch cache (T374746)]], [[gerrit:1092300{{!}}Create no-link-recommendation variant (T377787 T380204)]], [[gerrit:1092295{{!}}[GrowthExperiments] testwiki: Enable no-link-recommendation experiment (T380204)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:48 effie: upload prometheus-mcrouter-exporter_0.4.0+git20241118-1~wmf1 - [[phab:T380212|T380212]] * 21:46 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1092304{{!}}Use WAN cache for JsonConfig remote fetch cache (T374746)]], [[gerrit:1092300{{!}}Create no-link-recommendation variant (T377787 T380204)]], 
[[gerrit:1092295{{!}}[GrowthExperiments] testwiki: Enable no-link-recommendation experiment (T380204)]] * 21:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002" * 21:36 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091839{{!}}Rename everything referring to "SSO domain" to use "shared domain" (T379811)]], [[gerrit:1091841{{!}}Rename shared domain sso.wikimedia.org to auth.wikimedia.org (T379811)]], [[gerrit:1091842{{!}}Use DB name rather than server name in shared domain path prefix (T379811)]] (duration: 10m 54s) * 21:31 urbanecm@deploy2002: matmarex, urbanecm: Continuing with sync * 21:30 urbanecm@deploy2002: matmarex, urbanecm: Backport for [[gerrit:1091839{{!}}Rename everything referring to "SSO domain" to use "shared domain" (T379811)]], [[gerrit:1091841{{!}}Rename shared domain sso.wikimedia.org to auth.wikimedia.org (T379811)]], [[gerrit:1091842{{!}}Use DB name rather than server name in shared domain path prefix (T379811)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:29 urbanecm: Add bvibber to wmf-deployment Gerrit group (existing deployer) * 21:26 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1091839{{!}}Rename everything referring to "SSO domain" to use "shared domain" (T379811)]], [[gerrit:1091841{{!}}Rename shared domain sso.wikimedia.org to auth.wikimedia.org (T379811)]], [[gerrit:1091842{{!}}Use DB name rather than server name in shared domain path prefix (T379811)]] * 21:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage * 21:18 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2046.codfw.wmnet with OS bookworm * 21:17 
jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2045.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2044.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2043.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2042.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm * 21:16 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2002.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2042'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2042'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2041'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2041'] * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host es2041.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm * 21:01 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bookworm * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:52 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm * 20:51 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye * 20:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 20:49 jhathaway: disabling auto-reboot on re-imaging for debugging * 20:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2001.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:39 
jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2041.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:39 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2041 to codfw - jhancock@cumin2002" * 20:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2041 to codfw - jhancock@cumin2002" * 20:33 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye * 20:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2037.codfw.wmnet with OS bullseye * 20:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2112.codfw.wmnet with OS bullseye * 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 20:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 20:12 
jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2113.codfw.wmnet with OS bullseye * 20:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 20:11 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 20:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2037.codfw.wmnet with reason: host reimage * 19:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2037.codfw.wmnet with reason: host reimage * 19:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2112.codfw.wmnet with reason: host reimage * 19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 19:55 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@594d3b5]: [[phab:T377153|T377153]] Release glent 0.3.5 (duration: 00m 27s) * 19:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2113.codfw.wmnet with reason: host reimage * 19:54 ebernhardson@deploy2002: Started deploy [airflow-dags/search@594d3b5]: [[phab:T377153|T377153]] Release glent 0.3.5 * 19:52 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2112.codfw.wmnet with reason: host reimage * 
19:51 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2113.codfw.wmnet with reason: host reimage * 19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage * 19:36 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2112.codfw.wmnet with OS bullseye * 19:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2113.codfw.wmnet with OS bullseye * 19:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2037.codfw.wmnet with OS bullseye * 19:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage * 19:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2113'] * 19:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2037'] * 19:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2113'] * 19:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2037'] * 19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2113.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:17 swfrench@deploy2002: Finished scap sync-world: Test deployment after adding mwdebug-next check command - [[phab:T372604|T372604]] (duration: 01m 31s) * 19:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 19:15 swfrench@deploy2002: Started scap sync-world: Test deployment after adding mwdebug-next check command - [[phab:T372604|T372604]] * 19:08 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 18:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 18:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 18:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 18:46 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 18:45 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 18:43 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 18:41 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1183.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 18:27 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
* 18:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
* 18:15 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
* 18:15 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
* 18:14 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
* 18:13 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
* 18:12 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye
* 18:09 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
* 18:08 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
* 18:04 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
* 18:03 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
* 18:03 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
* 18:01 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
* 17:53 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for
host thanos-be2005.codfw.wmnet with OS bullseye * 17:34 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply * 17:28 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@16a5867]: Deploy latest DAGs to analytics Airflow instance. [[phab:T368755|T368755]]. (duration: 02m 10s) * 17:25 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@16a5867]: Deploy latest DAGs to analytics Airflow instance. [[phab:T368755|T368755]]. * 17:24 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply * 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: set DNS for new maps-test nodes - pt1979@cumin2002" * 16:55 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: set DNS for new maps-test nodes - pt1979@cumin2002" * 16:50 volans: installing spicerack v8.16.2 on cumin1002 * 16:50 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:38 volans: installing spicerack v8.16.2 on cumin2002 * 16:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1305-1312].eqiad.wmnet * 16:34 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1305-1312].eqiad.wmnet * 16:34 volans: uploaded spicerack_8.16.2 to apt.wikimedia.org bullseye-wikimedia * 16:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1311.eqiad.wmnet with OS bookworm * 16:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1310.eqiad.wmnet with OS bookworm * 16:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1312.eqiad.wmnet with OS 
bookworm * 16:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1306.eqiad.wmnet with OS bookworm * 16:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1308.eqiad.wmnet with OS bookworm * 16:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bookworm * 16:13 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1005.eqiad.wmnet * 16:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage * 16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1307.eqiad.wmnet with OS bookworm * 16:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1305.eqiad.wmnet with OS bookworm * 16:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage * 16:06 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1005.eqiad.wmnet * 16:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage * 16:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage * 15:58 Lucas_WMDE: UTC afternoon backport+config window done * 15:58 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1092259{{!}}Unified dashboard: Add UI for page collection recommendations (T368718)]] (duration: 27m 17s) * 15:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage * 15:56 cgoubert@cumin1002: START - 
Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage * 15:55 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage * 15:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 15:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage * 15:51 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage * 15:50 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 15:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage * 15:49 lucaswerkmeister-wmde@deploy2002: sbisson, lucaswerkmeister-wmde: Continuing with sync * 15:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage * 15:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage * 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage * 15:45 lucaswerkmeister-wmde@deploy2002: sbisson, lucaswerkmeister-wmde: Backport for [[gerrit:1092259{{!}}Unified dashboard: Add UI for page collection recommendations (T368718)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage * 15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1312.eqiad.wmnet with OS bookworm * 
15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1311.eqiad.wmnet with OS bookworm * 15:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1310.eqiad.wmnet with OS bookworm * 15:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bookworm * 15:31 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1092259{{!}}Unified dashboard: Add UI for page collection recommendations (T368718)]] * 15:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1308.eqiad.wmnet with OS bookworm * 15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1307.eqiad.wmnet with OS bookworm * 15:27 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1306.eqiad.wmnet with OS bookworm * 15:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1305.eqiad.wmnet with OS bookworm * 15:11 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091605{{!}}Revert "Allow other input and changes to trigger searchsuggestions to update" (T379983)]] (duration: 08m 14s) * 15:07 lucaswerkmeister-wmde@deploy2002: samtar, lucaswerkmeister-wmde: Continuing with sync * 15:06 lucaswerkmeister-wmde@deploy2002: samtar, lucaswerkmeister-wmde: Backport for [[gerrit:1091605{{!}}Revert "Allow other input and changes to trigger searchsuggestions to update" (T379983)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1091605{{!}}Revert "Allow other input and changes to trigger searchsuggestions to update" (T379983)]] * 15:00 arnaudb@cumin1002: dbctl commit (dc=all): 'manual depool commit', diff saved to https://phabricator.wikimedia.org/P71077 and previous config saved to 
/var/cache/conftool/dbconfig/20241118-150020-arnaudb.json * 14:59 arnaudb@cumin1002: dbctl commit (dc=all): 'manual repool commit', diff saved to https://phabricator.wikimedia.org/P71076 and previous config saved to /var/cache/conftool/dbconfig/20241118-145946-arnaudb.json * 14:56 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2216 slowly with 10 steps - slow motion repool [[phab:T380131|T380131]] * 14:56 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2216 slowly with 10 steps - slow motion repool [[phab:T380131|T380131]] * 14:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2150 slowly with 10 steps - slow repool db2150 [[phab:T380117|T380117]] * 14:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1305-1312].eqiad.wmnet * 14:28 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1305-1312].eqiad.wmnet * 14:16 claime: running homer 'cr*-eqiad' '[[phab:T379454|T379454]]' * 14:11 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1004.eqiad.wmnet * 14:09 btullis@cumin1002: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto an-presto cluster: Roll restart of all Presto's jvm daemons. 
* 14:04 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
* 13:50 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
* 13:49 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
* 13:49 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
* 13:48 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
* 13:47 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
* 13:46 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
* 13:37 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
* 13:37 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
* 13:35 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
* 13:35 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
* 13:35 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
* 13:34 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
* 13:34 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
* 13:33 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
* 13:31 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
* 13:31 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
* 13:31 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
* 13:30 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
* 13:28 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
* 13:28 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
* 13:27 btullis@cumin1002: START - Cookbook
sre.presto.roll-restart-workers for Presto an-presto cluster: Roll restart of all Presto's jvm daemons. * 13:26 topranks: stopping netbox service on netbox-next test server to restore new database backup from production * 13:25 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply * 13:25 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply * 13:20 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1018.eqiad.wmnet with OS bullseye * 13:16 urbanecm: mwmaint2002: Run `extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php` at `testwiki` for a bunch of pages (P71064 is list of commands executed; [[phab:T378983|T378983]]) * 13:04 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply * 13:03 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply * 13:01 moritzm: removing ganeti1021 from active Ganeti nodes [[phab:T378921|T378921]] * 12:56 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1018.eqiad.wmnet with reason: host reimage * 12:54 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1018.eqiad.wmnet with reason: host reimage * 12:39 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bullseye * 12:38 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1018.eqiad.wmnet with OS bullseye * 12:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:37 kart_: Updated recommendation api to 2024-11-13-183159-production ([[phab:T379592|T379592]], [[phab:T379037|T379037]]) * 12:36 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2150 slowly with 10 steps - slow repool db2150 [[phab:T380117|T380117]] * 12:36 cgoubert@cumin1002: START - Cookbook sre.dns.netbox * 12:24 kartik@deploy2002: helmfile 
[ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 12:22 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bullseye * 12:22 kartik@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 12:21 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1018.eqiad.wmnet with OS bullseye * 12:19 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. * 12:15 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply * 12:14 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply * 12:13 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-ulsfo * 12:13 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 12:10 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 12:09 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply * 12:08 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bullseye * 12:02 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply * 12:00 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 11:59 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:59 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 11:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
* 11:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
* 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
* 11:45 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
* 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
* 11:41 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 11:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2216.codfw.wmnet with reason: [[phab:T380131|T380131]] - table corruption
* 11:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2216.codfw.wmnet with reason: [[phab:T380131|T380131]] - table corruption
* 11:41 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 11:41 urbanecm: mwmaint2002: Run `extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php` at `testwiki` for a bunch of pages (P71064 is list of commands executed; [[phab:T378983|T378983]])
* 11:33 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
* 11:25 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:21 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:16 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:50 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:49 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:46 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply * 10:46 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply * 10:45 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:43 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 10:41 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply * 10:41 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply * 10:39 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:37 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:27 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:15 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:14 fabfur: upgrade haproxy on cp-ulsfo ([[phab:T379891|T379891]]) * 10:14 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:14 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-ulsfo * 10:13 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:47 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply * 09:47 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply * 09:42 moritzm: restarting nginx on acmechief hosts to pick up openssl updates * 09:24 moritzm: installing openssl security updates * 09:18 elukey@cumin1002: END (FAIL) - Cookbook 
sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:17 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:57 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091932{{!}}Enable the Contribute menu in 2nd group of Wikis (T375300)]] (duration: 11m 45s) * 08:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40850 * 08:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 40850 * 08:53 kartik@deploy2002: kartik: Continuing with sync * 08:49 kartik@deploy2002: kartik: Backport for [[gerrit:1091932{{!}}Enable the Contribute menu in 2nd group of Wikis (T375300)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:45 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1091932{{!}}Enable the Contribute menu in 2nd group of Wikis (T375300)]] * 08:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on registry1004.eqiad.wmnet with reason: testing * 08:44 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on registry1004.eqiad.wmnet with reason: testing * 08:43 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091912{{!}}bjnwikiquote: Add local logo (T375054)]] (duration: 22m 55s) * 08:31 kartik@deploy2002: kartik, hamishz: Continuing with sync * 08:30 kartik@deploy2002: kartik, hamishz: Backport for [[gerrit:1091912{{!}}bjnwikiquote: Add local logo (T375054)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:20 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1091912{{!}}bjnwikiquote: Add local logo (T375054)]] * 08:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet * 08:07 jmm@cumin2002: END (PASS) - Cookbook 
sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet * 08:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet * 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet * 08:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet * 07:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet * 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet * 07:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet * 07:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet * 07:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet * 07:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 07:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 06:31 kart_: Updated MinT to 2024-10-16-065051-production on eqiad * 06:28 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: 
apply
* 06:19 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply

== 2024-11-17 ==
* 16:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2216.codfw.wmnet with reason: Sad
* 16:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2216.codfw.wmnet with reason: Sad
* 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2216 sad', diff saved to https://phabricator.wikimedia.org/P71059 and previous config saved to /var/cache/conftool/dbconfig/20241117-163522-ladsgroup.json

== 2024-11-16 ==
* 20:30 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 18:09 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
* 18:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
* 18:06 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1183.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 18:05 jclark@cumin1002: START - Cookbook sre.dns.netbox
* 18:01 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox
(exit_code=0) * 17:59 jclark@cumin1002: START - Cookbook sre.dns.netbox * 17:59 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 17:56 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 17:55 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:55 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1313.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:52 jclark@cumin1002: START - Cookbook sre.dns.netbox * 17:50 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for 
wikikube-worker - jclark@cumin1002" * 17:50 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 17:45 jclark@cumin1002: START - Cookbook sre.dns.netbox * 17:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1323.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:11 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:09 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 17:09 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 17:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1313.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:05 jclark@cumin1002: START - Cookbook sre.dns.netbox * 17:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1326.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot 
policy FORCED * 16:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1321.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1324.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1322.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1320.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1325.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1319.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1316.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1318.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1315.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1317.mgmt.eqiad.wmnet with chassis 
set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1314.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1326.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1323.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1324.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1322.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1321.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1320.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1325.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1318.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 jclark@cumin1002: START - Cookbook 
sre.hosts.provision for host wikikube-worker1317.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:32 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1316.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1315.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1314.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1319.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:30 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:30 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
* 16:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
* 16:27 jclark@cumin1002: START - Cookbook sre.dns.netbox
* 00:44 tzatziki: removing 103 files for legal compliance

== 2024-11-15 ==
* 23:42 tzatziki: removing 1 file for legal compliance
* 23:19 tzatziki: removing 3 files for legal compliance
* 22:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2112.codfw.wmnet with OS bullseye
* 21:59 Dreamy_Jazz: Started MediaModeration scan on all wikis other than commonswiki attempting to scan all failed to be scanned images - https://wikitech.wikimedia.org/wiki/MediaModeration
* 21:59 Dreamy_Jazz: Started MediaModeration scan on commons wiki
attempting to scan all failed to be scanned images - https://wikitech.wikimedia.org/wiki/MediaModeration * 21:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2115.codfw.wmnet with OS bullseye * 21:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2114.codfw.wmnet with OS bullseye * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2111.codfw.wmnet with OS bullseye * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2115.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2038.codfw.wmnet with OS bullseye * 21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2114.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2036.codfw.wmnet with OS bullseye * 21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2111.codfw.wmnet with reason: host reimage * 21:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2115.codfw.wmnet with reason: host reimage * 21:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2114.codfw.wmnet with reason: host reimage * 21:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2111.codfw.wmnet with reason: host reimage * 21:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2115.codfw.wmnet with OS bullseye * 21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2114.codfw.wmnet with OS bullseye * 21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2112.codfw.wmnet with OS bullseye * 21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2111.codfw.wmnet with OS bullseye * 21:13 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2038.codfw.wmnet with reason: host reimage * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2115'] * 21:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2115'] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2114'] * 21:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2114'] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2112'] * 21:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2112'] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2111'] * 21:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2111'] * 21:11 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2110'] * 21:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2036.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2114.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2111.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:07 
jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2038.codfw.wmnet with reason: host reimage * 21:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2115.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2112.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2036.codfw.wmnet with reason: host reimage * 21:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2115.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2114.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2112.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2111.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox 
(exit_code=0) * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding elastic2110 to codfw - jhancock@cumin2002" * 20:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding elastic2110 to codfw - jhancock@cumin2002" * 20:50 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2038.codfw.wmnet with OS bullseye * 20:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2036.codfw.wmnet with OS bullseye * 20:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2036'] * 20:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2038'] * 20:43 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2038'] * 20:43 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2036'] * 20:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2038.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2036.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:41 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host restbase2037 * 20:40 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host restbase2037 * 20:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase2037.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2038.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2036.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding restbase2036 to codfw - jhancock@cumin2002" * 20:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding restbase2036 to codfw - jhancock@cumin2002" * 20:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:54 dancy@deploy2002: Finished scap sync-world: Testing [[phab:T377883|T377883]] (duration: 03m 06s) * 19:51 dancy@deploy2002: Started scap sync-world: Testing [[phab:T377883|T377883]] * 19:50 dancy@deploy2002: Installation of scap version "4.124.0" completed for 206 hosts * 19:46 dancy@deploy2002: Installing scap version "4.124.0" for 206 hosts * 18:53 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 18:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 18:35 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 18:34 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 18:32 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 18:31 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 18:15 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 18:15 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 18:09 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 18:08 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:58 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@82083c4]: image suggestions hotfix - section titles denylist dependency (duration: 01m 58s) * 16:57 taavi: copy python3-flask-<nowiki>{</nowiki>keystone,oslolog<nowiki>}</nowiki> from bullseye-wikimedia to bookworm-wikimedia * 16:56 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@82083c4]: image suggestions hotfix - section titles denylist dependency * 16:27 herron@cumin2002: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1005.eqiad.wmnet,cluster=aux-k8s,service=kubesvc * 16:27 herron@cumin2002: conftool action : set/weight=10; selector: name=aux-k8s-worker1005.eqiad.wmnet,cluster=aux-k8s,service=kubesvc * 16:22 herron@cumin2002: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1004.eqiad.wmnet,cluster=aux-k8s,service=kubesvc * 16:22 herron@cumin2002: conftool action : set/weight=10; selector: name=aux-k8s-worker1004.eqiad.wmnet,cluster=aux-k8s,service=kubesvc * 
16:09 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet [reason: ATS fixed] * 16:08 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp4043.ulsfo.wmnet * 16:08 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp4043.ulsfo.wmnet * 16:06 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P<nowiki>{</nowiki>cp4051*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm2 * 16:03 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P<nowiki>{</nowiki>cp4051*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm2 * 16:00 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.6-1wm2_amd64.changes: [[phab:T379797|T379797]] * 15:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db2230.codfw.wmnet,db1125.eqiad.wmnet with reason: testing stuff on test-s4 * 15:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db2230.codfw.wmnet,db1125.eqiad.wmnet with reason: testing stuff on test-s4 * 15:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw * 15:41 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad * 15:39 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad * 15:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad * 15:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply * 15:38 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch 
from codfw to eqiad * 15:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply * 15:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove e8 lo0 IP - ayounsi@cumin1002" * 13:59 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove e8 lo0 IP - ayounsi@cumin1002" * 13:55 ayounsi@cumin1002: START - Cookbook sre.dns.netbox * 13:55 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:52 ayounsi@cumin1002: START - Cookbook sre.dns.netbox * 13:41 XioNoX: test no-passwords on mr1-eqsin - [[phab:T379464|T379464]] * 13:31 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts sretest1004.eqiad.wmnet * 13:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002" * 13:31 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002" * 13:27 ayounsi@cumin1002: START - Cookbook sre.dns.netbox * 13:24 cmooney@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec 
data - cmooney@cumin1002 * 13:23 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts sretest1004.eqiad.wmnet * 13:21 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec data - cmooney@cumin1002 * 13:19 cmooney@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec data - cmooney@cumin1002 * 13:17 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec data - cmooney@cumin1002 * 13:01 moritzm: imported 8u432-b06-2~deb12u1 to component/jdk8 for bookworm (forward port of the latest Java 8 security fixes for Bookworm) * 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host build2002.codfw.wmnet * 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host build2002.codfw.wmnet with OS bookworm * 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on build2002.codfw.wmnet with reason: host reimage * 12:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on build2002.codfw.wmnet with reason: host reimage * 12:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics: apply * 12:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics: apply * 12:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics: apply * 12:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 12:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host build2002.codfw.wmnet with OS bookworm * 12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM build2002.codfw.wmnet - jmm@cumin2002" * 12:15 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM build2002.codfw.wmnet - jmm@cumin2002" * 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) build2002.codfw.wmnet on all recursors * 12:15 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache build2002.codfw.wmnet on all recursors * 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM build2002.codfw.wmnet - jmm@cumin2002" * 12:11 cmooney@cumin1002: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling restart_daemons on A:netbox * 12:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM build2002.codfw.wmnet - jmm@cumin2002" * 12:08 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Update * 12:03 jmm@cumin2002: START - Cookbook sre.dns.netbox * 12:03 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host build2002.codfw.wmnet * 12:01 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0) * 12:01 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report * 12:00 
cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 11:58 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 11:38 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@2c533d6]: hotfix image suggestions weekly snapshots (duration: 00m 57s) * 11:37 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@2c533d6]: hotfix image suggestions weekly snapshots * 11:27 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1305-1312].eqiad.wmnet * 11:24 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1305-1312].eqiad.wmnet * 11:22 claime: homer 'lsw1-f5-eqiad*' commit '[[phab:T377022|T377022]]' * 11:22 claime: homer 'lsw1-f6-eqiad*' commit '[[phab:T377022|T377022]]' * 11:22 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:21 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:21 claime: homer 'lsw1-f7-eqiad*' commit '[[phab:T377022|T377022]]' * 11:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:20 claime: homer 'lsw1-e7-eqiad*' commit '[[phab:T377022|T377022]]' * 11:20 claime: homer 'lsw1-e6-eqiad*' commit '[[phab:T377022|T377022]]' * 11:19 claime: homer 'lsw1-e5-eqiad*' commit '[[phab:T377022|T377022]]' * 11:15 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:14 elukey@cumin2002: 
START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:12 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:12 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:06 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:05 claime: homer 'cr*eqiad*' commit '[[phab:T377022|T377022]]' * 10:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:34 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:28 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:28 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet 
with chassis set policy FORCE_RESTART * 09:28 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:23 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:23 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:22 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:15 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Update * 08:48 moritzm: installing Linux 6.1.115 kernel updates from Bookworm point release * 04:54 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 12:00:00 on db1246.eqiad.wmnet with reason: depooled * 04:54 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 12:00:00 on db1246.eqiad.wmnet with reason: depooled * 04:51 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on db1246.eqiad.wmnet with reason: depooled * 04:50 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 12:00:00 on db1246.eqiad.wmnet with reason: depooled * 04:47 rzl@cumin2002: dbctl commit (dc=all): 'db1246 depooled', diff saved to https://phabricator.wikimedia.org/P71052 and previous config saved to /var/cache/conftool/dbconfig/20241115-044705-rzl.json * 03:44 ejegg: fundraising python tools upgraded from {{Gerrit|c6e2dbcc}} to {{Gerrit|b230f718}} == 
2024-11-14 == * 23:17 eileen: civicrm upgraded from {{Gerrit|2a53f697}} to {{Gerrit|d49a064d}} * 22:59 eileen: civicrm upgraded from {{Gerrit|2ab8334a}} to {{Gerrit|2a53f697}} * 22:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp4043.ulsfo.wmnet with reason: ATS upgrade 9.2.6 * 22:37 brett@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp4043.ulsfo.wmnet with reason: ATS upgrade 9.2.6 * 22:30 ryankemper: [[phab:T376150|T376150]] Depooled `wdqs20[18-20]` in preparation of merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1088185 * 21:49 aqu@deploy2002: Finished deploy [airflow-dags/analytics@7a66849]: Stage Refine: fix Airflow skip (duration: 00m 59s) * 21:48 aqu@deploy2002: Started deploy [airflow-dags/analytics@7a66849]: Stage Refine: fix Airflow skip * 21:47 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@7a66849]: Stage Refine: fix Airflow skip (duration: 00m 14s) * 21:47 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@7a66849]: Stage Refine: fix Airflow skip * 21:26 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@2220747]: Stage Refine test fix (duration: 00m 16s) * 21:26 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@2220747]: Stage Refine test fix * 21:20 cjming: end of UTC late backport window * 21:17 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1082853{{!}}Redirect to wikis using subpages rather than namespaces too (T376923)]] (duration: 13m 44s) * 21:13 cjming@deploy2002: cjming, pppery: Continuing with sync * 21:08 cjming@deploy2002: cjming, pppery: Backport for [[gerrit:1082853{{!}}Redirect to wikis using subpages rather than namespaces too (T376923)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:04 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1082853{{!}}Redirect to wikis using subpages rather than namespaces too (T376923)]] * 20:47 
jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 20:47 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 20:38 bvibber@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 20:37 bvibber@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 20:37 bvibber@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 20:36 bvibber@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 20:35 bvibber@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 20:35 bvibber@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 20:29 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) * 20:28 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter * 20:24 bvibber@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 20:24 bvibber@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 20:24 bvibber@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 20:24 bvibber@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 20:23 bvibber@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 20:23 bvibber@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 20:23 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Network maintenance complete - None * 20:01 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Network maintenance complete - None * 19:55 brennen@deploy2002: rebuilt and synchronized wikiversions files: 
group2 to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] * 19:40 eileen: tools upgraded from {{Gerrit|68f64e43}} to {{Gerrit|c6e2dbcc}} * 19:37 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad [reason: junos upgrade done, [[phab:T364092|T364092]]] * 19:37 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad [reason: junos upgrade done, [[phab:T364092|T364092]]] * 19:20 James_F: Running `mwscript-k8s -f -- extensions/WikiLambda/maintenance/updateSecondaryTables.php --wiki=wikifunctionswiki --zType Z8 --report --verbose` for [[phab:T375972|T375972]], [[phab:T367005|T367005]], [[phab:T373038|T373038]], [[phab:T358737|T358737]] * 19:19 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox * 19:14 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) * 19:14 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter * 19:14 swfrench-wmf: running sre.discovery.datacenter status all to test deployed fix * 19:00 brennen: 1.44.0-wmf.3 train status ([[phab:T375662|T375662]]): no current blockers, but holding for network maintenance. 
* 18:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1312.eqiad.wmnet with OS bullseye * 18:19 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) * 18:18 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter * 18:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1310.eqiad.wmnet with OS bullseye * 18:13 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging * 18:13 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging * 18:11 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1311.eqiad.wmnet with OS bullseye * 18:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1308.eqiad.wmnet with OS bullseye * 18:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1190 gradually with 4 steps - Maint over * 18:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bullseye * 18:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage * 17:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1307.eqiad.wmnet with OS bullseye * 17:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage * 17:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2139.codfw.wmnet with reason: host reimage * 17:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1306.eqiad.wmnet with OS bullseye * 17:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage * 17:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage * 17:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2139.codfw.wmnet with reason: host reimage * 17:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage * 17:43 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage * 17:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 17:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 17:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage * 17:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage * 17:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage * 17:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage * 17:29 cgoubert@cumin1002: START - Cookbook 
sre.hosts.downtime for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 17:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1312.eqiad.wmnet with OS bullseye * 17:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1311.eqiad.wmnet with OS bullseye * 17:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1310.eqiad.wmnet with OS bullseye * 17:24 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None * 17:24 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter status all services in all: None - None * 17:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bullseye * 17:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1308.eqiad.wmnet with OS bullseye * 17:19 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db1190 gradually with 4 steps - Maint over * 17:18 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in eqiad: Network maintenance - None * 17:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1307.eqiad.wmnet with OS bullseye * 17:15 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=4043.ulsfo.wmnet * 17:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:13 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:13 elukey@cumin1002: START - Cookbook sre.hosts.provision for host 
thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1306.eqiad.wmnet with OS bullseye * 16:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1305.eqiad.wmnet with OS bullseye * 16:57 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: Network maintenance - None * 16:52 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@7c4873e]: decouple article-level image suggestions from section-level ones (duration: 00m 53s) * 16:51 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@7c4873e]: decouple article-level image suggestions from section-level ones * 16:45 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None * 16:45 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter status all services in all: None - None * 16:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage * 16:38 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) * 16:37 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter * 16:36 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage * 16:36 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) * 16:36 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter * 16:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1190.eqiad.wmnet with reason: Sad * 16:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1190.eqiad.wmnet with reason: Sad * 16:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1190 sad', diff saved to 
https://phabricator.wikimedia.org/P71044 and previous config saved to /var/cache/conftool/dbconfig/20241114-163317-ladsgroup.json * 16:31 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 16:31 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. * 16:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1305.eqiad.wmnet with OS bullseye * 16:04 cmooney@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 151575 * 16:03 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 151575 * 16:01 papaul: ongoing maintenance on cr1-eqiad * 16:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:57 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,re0.cr1-eqiad.mgmt with reason: router upgrade * 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,re0.cr1-eqiad.mgmt with reason: router upgrade * 15:56 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging * 15:56 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging * 15:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,cr1-eqiad.mgmt with reason: router upgrade * 15:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,cr1-eqiad.mgmt with reason: router upgrade * 15:49 moritzm: installing nss security updates * 15:48 reedy@deploy2002: Synchronized wmf-config/CommonSettings.php: [[phab:T379834|T379834]] (duration: 08m 02s) * 15:47 sukhe@puppetserver1001: conftool 
action : set/pooled=no; selector: name=cp4043.ulsfo.wmnet * 15:47 sukhe@cumin1002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=97) Rolling upgrade/restart of Apache Traffic Server on P<nowiki>{</nowiki>cp4043*,cp4051*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm1 * 15:45 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2002.codfw.wmnet * 15:45 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2002.codfw.wmnet * 15:45 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-ctrl2002.codfw.wmnet * 15:45 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-ctrl2002.codfw.wmnet * 15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0) * 15:43 pt1979@cumin2002: START - Cookbook sre.network.cf * 15:42 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P<nowiki>{</nowiki>cp4043*,cp4051*<nowiki>}</nowiki> and A:cp for 9.2.6-1wm1 * 15:40 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye * 15:39 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:37 volans: installed spicerack v8.16.1 to cumin hosts * 15:36 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad [reason: junos upgrade, [[phab:T364092|T364092]]] * 15:36 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad [reason: junos upgrade, [[phab:T364092|T364092]]] * 15:35 ladsgroup@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091248{{!}}Revert "mmv.js: Store comingFromHashChange as a class property" (T379835)]] (duration: 12m 10s) * 15:33 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.6-1wm1_amd64.changes: 
[[phab:T379797|T379797]] * 15:30 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox * 15:29 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: [[phab:T379719|T379719]] * 15:29 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: [[phab:T379719|T379719]] * 15:28 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-ctrl2002.codfw.wmnet * 15:28 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-ctrl2002.codfw.wmnet * 15:27 ladsgroup@deploy2002: ladsgroup: Continuing with sync * 15:27 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:1091248{{!}}Revert "mmv.js: Store comingFromHashChange as a class property" (T379835)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:24 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 15:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 15:24 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox and not A:magru and A:dnsbox * 15:23 ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1091248{{!}}Revert "mmv.js: Store comingFromHashChange as a class property" (T379835)]] * 15:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply * 15:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply * 15:07 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 15:07 sergi0: UTC afternoon deploys 
done * 15:06 sgimeno@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091231{{!}}HomepageHooks: run metrics increment in deferred update (T379682)]] (duration: 11m 15s) * 15:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 15:02 sgimeno@deploy2002: sgimeno: Continuing with sync * 14:59 sgimeno@deploy2002: sgimeno: Backport for [[gerrit:1091231{{!}}HomepageHooks: run metrics increment in deferred update (T379682)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:55 sgimeno@deploy2002: Started scap sync-world: Backport for [[gerrit:1091231{{!}}HomepageHooks: run metrics increment in deferred update (T379682)]] * 14:53 volans: uploaded spicerack_8.16.1 to apt.wikimedia.org bullseye-wikimedia * 14:50 sgimeno@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090830{{!}}GrowthExperiments: set experiment config only in pilot wikis (T379681)]] (duration: 13m 02s) * 14:45 sgimeno@deploy2002: sgimeno: Continuing with sync * 14:41 sgimeno@deploy2002: sgimeno: Backport for [[gerrit:1090830{{!}}GrowthExperiments: set experiment config only in pilot wikis (T379681)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:37 sgimeno@deploy2002: Started scap sync-world: Backport for [[gerrit:1090830{{!}}GrowthExperiments: set experiment config only in pilot wikis (T379681)]] * 14:33 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox and not A:magru and A:dnsbox * 14:30 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox and A:magru and A:dnsbox * 14:27 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091227{{!}}CX3 Build 0.2.0+20241114]] (duration: 13m 23s) * 14:25 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox and A:magru and A:dnsbox * 14:22 kartik@deploy2002: 
kartik: Continuing with sync * 14:18 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough * 14:17 kartik@deploy2002: kartik: Backport for [[gerrit:1091227{{!}}CX3 Build 0.2.0+20241114]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:13 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1091227{{!}}CX3 Build 0.2.0+20241114]] * 14:05 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough * 13:50 aqu@deploy2002: Finished deploy [airflow-dags/analytics@2220747]: Stage Refine parallelization improvment [airflow-dags@2220747d] (duration: 01m 08s) * 13:49 aqu@deploy2002: Started deploy [airflow-dags/analytics@2220747]: Stage Refine parallelization improvment [airflow-dags@2220747d] * 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7004.magru.wmnet * 13:36 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@2220747]: Stage Refine parallelization improvment [airflow-dags@2220747d] (duration: 00m 15s) * 13:36 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@2220747]: Stage Refine parallelization improvment [airflow-dags@2220747d] * 13:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7004.magru.wmnet * 13:21 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@c5ab766]: [[phab:T379546|T379546]] (duration: 00m 54s) * 13:21 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@c5ab766]: [[phab:T379546|T379546]] * 13:19 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix search button height - oblivian@cumin1002" * 13:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: 
Fix search button height - oblivian@cumin1002 * 13:18 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix search button height - oblivian@cumin1002 * 13:18 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix search button height - oblivian@cumin1002" * 13:05 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-codfw: containerd migration * 13:04 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2003.codfw.wmnet with OS bookworm * 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad * 12:53 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad * 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet * 12:52 moritzm: installing apache2 security updates * 12:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet * 12:51 dreamyjazz@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090511{{!}}Hide IP reveal tools on Special:AbuseLog and Special:GlobalBlockList (T379583)]] (duration: 09m 08s) * 12:49 moritzm: failover ganeti master of magru02 to ganeti7002 * 12:46 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync * 12:45 dreamyjazz@deploy2002: dreamyjazz: Backport for [[gerrit:1090511{{!}}Hide IP reveal tools on Special:AbuseLog and Special:GlobalBlockList (T379583)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet * 12:42 jayme@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage * 12:41 dreamyjazz@deploy2002: Started scap sync-world: Backport for [[gerrit:1090511{{!}}Hide IP reveal tools on Special:AbuseLog and Special:GlobalBlockList (T379583)]] * 12:38 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage * 12:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet * 12:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7002.magru.wmnet * 12:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7002.magru.wmnet * 12:22 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bookworm * 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw * 12:18 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw * 12:17 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-codfw: containerd migration * 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir * 12:00 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir * 11:57 moritzm: restarting postfix on inbound/outbound servers to pick up openssl updates * 11:17 moritzm: installing openssl security updates * 11:08 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-codfw: containerd migration * 11:08 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2001.codfw.wmnet with OS bookworm 
* 10:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production * 10:45 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage * 10:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production * 10:42 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage * 10:16 moritzm: remove ganeti2017 from active ganeti nodes [[phab:T376594|T376594]] * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet * 10:11 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bookworm * 10:07 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided) (duration: 00m 47s) * 10:06 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-codfw: containerd migration * 10:06 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided) * 10:03 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided) (duration: 00m 21s) * 10:03 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided) * 09:43 kart_: Done: UTC morning backport window * 09:37 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090988{{!}}Correction to virtual-globaljsonlinks mapping (T374746)]] (duration: 10m 03s) * 09:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 09:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 09:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 09:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 09:32 kartik@deploy2002: bvibber, kartik: Continuing with sync * 09:31 kartik@deploy2002: bvibber, kartik: Backport for [[gerrit:1090988{{!}}Correction to virtual-globaljsonlinks mapping (T374746)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:27 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1090988{{!}}Correction to virtual-globaljsonlinks mapping (T374746)]] * 09:25 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1091007{{!}}CX3 Build 0.2.0+20241113 (T368718 T374567)]] (duration: 29m 40s) * 09:21 kartik@deploy2002: kartik: Continuing with sync * 09:17 volans: installed spicerack v8.16.0 on cumin2002 * 09:08 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P<nowiki>{</nowiki>cp4044.ulsfo.wmnet,cp4052.ulsfo.wmnet<nowiki>}</nowiki> and A:cp * 09:04 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P<nowiki>{</nowiki>cp4044.ulsfo.wmnet,cp4052.ulsfo.wmnet<nowiki>}</nowiki> and A:cp * 09:00 kartik@deploy2002: kartik: Backport for [[gerrit:1091007{{!}}CX3 Build 0.2.0+20241113 (T368718 T374567)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:56 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1091007{{!}}CX3 Build 0.2.0+20241113 (T368718 T374567)]] * 08:55 vgutierrez: import haproxy 2.8.12 to thirtdparty/haproxy28 component for bullseye-wikimedia (apt.wm.o) - [[phab:T379891|T379891]] * 08:54 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090937{{!}}Allow Wikidata bureaucrats to remove admin rights (T379635)]] (duration: 11m 49s) * 08:49 kartik@deploy2002: dreamrimmer, kartik: Continuing with sync * 08:47 kartik@deploy2002: dreamrimmer, kartik: Backport for [[gerrit:1090937{{!}}Allow Wikidata bureaucrats to remove admin rights 
(T379635)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:42 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1090937{{!}}Allow Wikidata bureaucrats to remove admin rights (T379635)]] * 08:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26744 * 08:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 26744 * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141082 * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 141082 * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9299 * 08:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 9299 * 08:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 140407 * 08:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 140407 * 08:28 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084704{{!}}Update stream registration and config for MinT for Readers (T378565)]] (duration: 24m 50s) * 08:23 kartik@deploy2002: kcvelaga, kartik: Continuing with sync * 08:08 kartik@deploy2002: kcvelaga, kartik: Backport for [[gerrit:1084704{{!}}Update stream registration and config for MinT for Readers (T378565)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1084704{{!}}Update stream registration and config for MinT for Readers (T378565)]] * 07:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2017.codfw.wmnet * 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet * 07:34 jmm@cumin2002: START - Cookbook 
sre.ganeti.drain-node for draining ganeti node ganeti2017.codfw.wmnet * 07:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove office link dns records - ayounsi@cumin1002" * 07:34 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove office link dns records - ayounsi@cumin1002" * 07:30 ayounsi@cumin1002: START - Cookbook sre.dns.netbox * 07:06 XioNoX: delete office interco IP/prefixes/vlan in ulsfo - [[phab:T379778|T379778]] * 04:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 04:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 04:09 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 03:56 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 02:32 eileen: config revision changed from {{Gerrit|7af5769b}} to {{Gerrit|fbddc1f5}} * 02:29 eileen: civicrm upgraded from {{Gerrit|7b300007}} to {{Gerrit|2ab8334a}} * 00:14 eileen: config revision changed from {{Gerrit|2b08b881}} to {{Gerrit|7af5769b}} * 00:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:12 eileen: civicrm upgraded from {{Gerrit|23e08fc2}} to {{Gerrit|7b300007}} * 00:05 jclark@cumin1002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host es1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1041.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
== 2024-11-13 ==
* 23:45 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:43 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:43 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:43 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:42
jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1041.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for es104 - jclark@cumin1002" * 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for es104 - jclark@cumin1002" * 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:37 jclark@cumin1002: START - Cookbook sre.dns.netbox * 23:20 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 23:04 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for 
wikikube-worker - jclark@cumin1002" * 23:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 22:59 jclark@cumin1002: START - Cookbook sre.dns.netbox * 22:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wdqs1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wdqs1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wdqs1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:57 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:55 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 22:20 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 
22:20 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 22:19 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 22:18 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 22:17 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 22:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:11 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 22:11 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 22:10 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 22:10 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 22:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 22:03 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 22:00 tchanders@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090965{{!}}Revert "Disallow AbuseFilter protected variables use on non-temp-user wikis" (T379503)]] (duration: 09m 03s) * 21:55 tchanders@deploy2002: tchanders: Continuing with sync * 21:55 tchanders@deploy2002: tchanders: Backport for [[gerrit:1090965{{!}}Revert "Disallow AbuseFilter protected variables use on non-temp-user wikis" (T379503)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:51 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1090965{{!}}Revert "Disallow AbuseFilter protected variables use on non-temp-user wikis" (T379503)]] * 21:48 cjming@deploy2002: Finished scap sync-world: Backport for 
[[gerrit:1090953{{!}}Enable autocreateaccount on testcommonswiki (T378216)]] (duration: 12m 59s) * 21:44 cjming@deploy2002: aude, cjming: Continuing with sync * 21:40 cjming@deploy2002: aude, cjming: Backport for [[gerrit:1090953{{!}}Enable autocreateaccount on testcommonswiki (T378216)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:36 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 21:36 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1090953{{!}}Enable autocreateaccount on testcommonswiki (T378216)]] * 21:34 cjming@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090928{{!}}GlobalJsonLinksCachePurgeJob to actually invalidate caches (T374746)]] (duration: 13m 27s) * 21:27 cjming@deploy2002: cjming, bvibber: Continuing with sync * 21:27 cjming@deploy2002: cjming, bvibber: Backport for [[gerrit:1090928{{!}}GlobalJsonLinksCachePurgeJob to actually invalidate caches (T374746)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 21:20 cjming@deploy2002: Started scap sync-world: Backport for [[gerrit:1090928{{!}}GlobalJsonLinksCachePurgeJob to actually invalidate caches (T374746)]] * 21:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy 
FORCE_RESTART * 21:15 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2005 * 21:07 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2005 * 21:05 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 21:05 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 21:01 aqu@deploy2002: Finished deploy [airflow-dags/analytics@3487da3]: Stage Refine [airflow-dags@3487da3a] (duration: 01m 22s) * 21:00 aqu@deploy2002: Started deploy [airflow-dags/analytics@3487da3]: Stage Refine [airflow-dags@3487da3a] * 20:56 aqu@deploy2002: Finished deploy [airflow-dags/analytics@3fc12d6]: Stage Refine [airflow-dags@3fc12d60] (duration: 01m 14s) * 20:56 jhancock@cumin2002: 
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 20:55 aqu@deploy2002: Started deploy [airflow-dags/analytics@3fc12d6]: Stage Refine [airflow-dags@3fc12d60] * 20:49 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 20:49 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 20:48 swfrench-wmf: deployed changeprop to clear no-op chart version diffs from CR {{Gerrit|1089313}} * 20:47 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 20:47 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:39 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 20:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 20:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 20:36 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 20:36 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 20:35 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 20:34 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 20:34 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@3fc12d6]: Stage Refine [airflow-dags@3fc12d60] (duration: 00m 15s) * 20:34 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@3fc12d6]: Stage Refine [airflow-dags@3fc12d60] * 20:31 cdanis@deploy2002: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 20:31 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 20:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 20:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 20:16 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 20:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 19:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 19:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 19:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2005 * 19:59 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2005 * 19:58 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 19:58 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 19:58 brennen@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] (duration: 31m 07s) * 19:57 jhancock@cumin2002: END (FAIL) - Cookbook 
sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 19:55 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 19:55 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 19:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 19:51 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-be2005 to codfw - jhancock@cumin2002" * 19:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-be2005 to codfw - jhancock@cumin2002" * 19:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:47 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 19:46 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply * 19:44 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Update * 19:37 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Update * 19:36 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 19:35 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Update * 19:27 brennen@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] * 19:26 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.3 refs 
[[phab:T375662|T375662]] * 19:21 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Update * 19:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1005.eqiad.wmnet with OS bullseye * 19:11 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:10 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 brennen: 1.44.0-wmf.3 train status ([[phab:T375662|T375662]]): no current blockers, rolling to group1. * 19:08 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/hdfs-synchronizer: apply * 19:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:03 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:02 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:01 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 19:00 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:00 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for thanos-be1005 - jclark@cumin1002" * 19:00 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for thanos-be1005 - jclark@cumin1002" * 18:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/hdfs-synchronizer: apply * 18:56 jclark@cumin1002: START - Cookbook sre.dns.netbox * 18:50 swfrench@deploy2002: Finished scap sync-world: Deployment to switch mwdebug-next to publish-81 - [[phab:T372604|T372604]] (duration: 01m 53s) * 18:48 swfrench@deploy2002: Started scap sync-world: Deployment to switch mwdebug-next to publish-81 - [[phab:T372604|T372604]] * 18:36 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply * 18:33 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply * 18:32 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply * 18:30 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@3499887]: I really hope this works this time (duration: 00m 34s) * 18:29 cdanis@deploy2002: Started deploy [docker-pkg/deploy@3499887]: I really hope this works this time * 18:29 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply * 18:26 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@9d71ac3]: (no justification provided) (duration: 00m 18s) * 18:26 cdanis@deploy2002: Started deploy [docker-pkg/deploy@9d71ac3]: (no justification provided) * 18:22 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@9d71ac3]: (no justification provided) (duration: 00m 40s) * 18:21 cdanis@deploy2002: Started deploy [docker-pkg/deploy@9d71ac3]: (no justification provided) * 18:21 cdanis@deploy2002: Finished deploy 
[docker-pkg/deploy@9d71ac3]: deploy 4.0.2 for realsies (duration: 02m 41s) * 18:18 cdanis@deploy2002: Started deploy [docker-pkg/deploy@9d71ac3]: deploy 4.0.2 for realsies * 18:13 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 18:13 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 18:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:54 urbanecm: mwmaint2002: foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --search-index --verbose --random # [[phab:T379057|T379057]] * 17:49 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@38eb04d]: ship upstream_version helper (duration: 00m 32s) * 17:49 cdanis@deploy2002: Started deploy [docker-pkg/deploy@38eb04d]: ship upstream_version helper * 17:49 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:47 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:46 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:45 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 17:40 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2002.codfw.wmnet * 17:39 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2002.codfw.wmnet * 17:39 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2002.codfw.wmnet * 17:38 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2002.codfw.wmnet with OS bookworm * 17:37 swfrench@deploy2002: helmfile 
[eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:35 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:33 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:32 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2128-2135].codfw.wmnet * 17:23 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2128-2135].codfw.wmnet * 17:20 claime: homer 'lsw1-d2-codfw*' commit '[[phab:T377008|T377008]]' * 17:18 claime: homer 'lsw1-c2-codfw*' commit '[[phab:T377008|T377008]]' * 17:18 claime: homer 'lsw1-d4-codfw*' commit '[[phab:T377008|T377008]]' * 17:17 claime: homer 'lsw1-c4-codfw*' commit '[[phab:T377008|T377008]]' * 17:15 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:14 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage * 17:11 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage * 17:03 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye * 17:02 claime: homer 'cr*codfw*' commit [[phab:T377008|T377008]] * 17:01 claime: homer 'lsw1-b4-codfw*' commit [[phab:T377008|T377008]] * 17:01 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 16:58 claime: homer 'lsw1-b2-codfw*' commit [[phab:T377008|T377008]] * 16:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply * 16:53 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-ctrl2002 * 16:53 
jayme@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2002 * 16:53 jayme@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2002 * 16:53 jayme@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-ctrl2002.codfw.wmnet 76.32.192.10.in-addr.arpa 6.7.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 16:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply * 16:53 jayme@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-ctrl2002.codfw.wmnet 76.32.192.10.in-addr.arpa 6.7.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 16:53 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:53 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-ctrl2002 - jayme@cumin2002" * 16:53 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-ctrl2002 - jayme@cumin2002" * 16:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 16:49 jayme@cumin2002: START - Cookbook sre.dns.netbox * 16:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 16:47 jayme@cumin2002: START - Cookbook sre.hosts.move-vlan for host wikikube-ctrl2002 * 16:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply * 16:47 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bookworm * 16:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/growthbook: apply * 16:41 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: reimage * 16:40 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: reimage * 16:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet * 16:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 16:31 jayme@cumin2002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2002.codfw.wmnet * 16:30 elukey: reload nginx on registry* to pick up logging changes (log of X-Client-IP from the CDN) * 16:30 XioNoX: shutdown old office link interface - [[phab:T379778|T379778]] * 16:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 16:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 16:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet * 16:26 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 16:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 16:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 16:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet * 16:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet * 16:08 cgoubert@cumin1002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 16:08 sukhe: running agent on A:ulsfo and A:lvs * 16:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 16:06 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 16:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 16:04 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 16:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 15:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 15:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 15:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 15:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 15:45 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/hdfs-synchronizer: apply * 15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 15:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 15:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:37 cgoubert@cumin1002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 15:36 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:35 moritzm: failover ganeti master of magru01 to ganeti7001 * 15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 15:33 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 15:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 15:33 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:30 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 15:30 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:30 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002" * 15:30 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002" * 15:30 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 15:26 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 15:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet * 15:18 moritzm: installing apache2 security updates * 15:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:15 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 15:15 cgoubert@cumin1002: START - Cookbook 
sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet * 15:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:12 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 14:59 volans: uploaded spicerack_8.16.0 to apt.wikimedia.org bullseye-wikimedia * 14:57 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:56 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@2eb8320]: Stage Refine [airflow-dags@2eb8320d] (duration: 00m 14s) * 14:55 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@2eb8320]: Stage Refine [airflow-dags@2eb8320d] * 14:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:51 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7001.magru.wmnet * 14:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7001.magru.wmnet * 14:37 moritzm: installing openssl security updates * 14:36 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 14:36 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 14:35 Lucas_WMDE: UTC afternoon backport+config window done * 14:33 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:32 lucaswerkmeister-wmde@deploy2002: 
Finished scap sync-world: Backport for [[gerrit:1090526{{!}}TimedMediahandler: reenable shellbox-video for commons (T356241)]] (duration: 07m 28s) * 14:30 btullis@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-jumbo-eqiad * 14:27 lucaswerkmeister-wmde@deploy2002: hnowlan, lucaswerkmeister-wmde: Continuing with sync * 14:27 lucaswerkmeister-wmde@deploy2002: hnowlan, lucaswerkmeister-wmde: Backport for [[gerrit:1090526{{!}}TimedMediahandler: reenable shellbox-video for commons (T356241)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply * 14:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply * 14:24 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1090526{{!}}TimedMediahandler: reenable shellbox-video for commons (T356241)]] * 14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply * 14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply * 14:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:14 tchanders@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090515{{!}}Disallow AbuseFilter protected variables use on non-temp-user wikis (T379503)]] (duration: 11m 28s) * 14:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply * 14:10 tchanders@deploy2002: tchanders: Continuing with sync * 14:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply * 14:07 jmm@cumin2002: END (FAIL) - 
Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1052.eqiad.wmnet to cluster eqiad and group D * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply * 14:06 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1052.eqiad.wmnet to cluster eqiad and group D * 14:06 tchanders@deploy2002: tchanders: Backport for [[gerrit:1090515{{!}}Disallow AbuseFilter protected variables use on non-temp-user wikis (T379503)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:03 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1090515{{!}}Disallow AbuseFilter protected variables use on non-temp-user wikis (T379503)]] * 14:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply * 14:02 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply * 14:01 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply * 14:01 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply * 14:00 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:59 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:32 btullis@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad * 13:21 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:20 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 13:18 moritzm: installing python-cryptography security updates * 13:18 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:18 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons. 
* 13:17 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 13:14 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:13 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 13:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:07 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 13:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:03 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:59 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 12:56 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:56 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 12:54 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 12:45 cgoubert@cumin1002: START - Cookbook 
sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71030 and previous config saved to /var/cache/conftool/dbconfig/20241113-124504-ladsgroup.json * 12:44 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 12:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1051.eqiad.wmnet to cluster eqiad and group D * 12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 12:32 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1051.eqiad.wmnet to cluster eqiad and group D * 12:31 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 12:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 12:30 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 12:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P71029 and previous config saved to /var/cache/conftool/dbconfig/20241113-122957-ladsgroup.json * 12:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 12:29 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet * 12:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 12:28 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons. 
* 12:18 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons. * 12:15 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply * 12:15 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply * 12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P71028 and previous config saved to /var/cache/conftool/dbconfig/20241113-121450-ladsgroup.json * 12:14 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply * 12:14 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply * 12:13 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply * 12:13 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply * 12:11 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply * 12:11 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1052.eqiad.wmnet * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:01 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71027 and previous config saved to /var/cache/conftool/dbconfig/20241113-115943-ladsgroup.json * 11:57 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply * 11:57 
jiji@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply * 11:57 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1052.eqiad.wmnet * 11:57 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1051.eqiad.wmnet * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1052 * 11:54 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1052 * 11:52 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply * 11:51 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply * 11:51 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 11:50 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 11:49 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 11:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1022 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P71026 and previous config saved to /var/cache/conftool/dbconfig/20241113-114913-ladsgroup.json * 11:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1051.eqiad.wmnet * 11:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance * 11:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance * 11:48 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for 
host ganeti1051 * 11:46 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1051 * 11:45 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 11:41 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration * 11:41 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1003.eqiad.wmnet with OS bookworm * 11:34 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 11:34 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wikikube-worker1256.eqiad.wmnet with reason: Degraded RAID * 11:26 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wikikube-worker1256.eqiad.wmnet with reason: Degraded RAID * 11:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1256.eqiad.wmnet * 11:25 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1256.eqiad.wmnet * 11:19 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons. * 11:18 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons. 
* 11:17 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage * 11:14 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage * 11:10 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons. * 11:09 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons. * 10:42 ladsgroup@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090809{{!}}Set the ratio of the new ParserCache keys to 100 for prod (T373037)]] (duration: 07m 32s) * 10:37 ladsgroup@deploy2002: ladsgroup: Continuing with sync * 10:36 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:1090809{{!}}Set the ratio of the new ParserCache keys to 100 for prod (T373037)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 10:35 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet * 10:34 ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1090809{{!}}Set the ratio of the new ParserCache keys to 100 for prod (T373037)]] * 10:32 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. 
* 10:27 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bookworm * 10:26 ladsgroup@deploy2002: ladsgroup: Continuing with sync * 10:26 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration * 10:24 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration * 10:24 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1002.eqiad.wmnet with OS bookworm * 10:21 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet * 10:20 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. * 10:20 ladsgroup@deploy2002: ladsgroup: Backport for [[gerrit:1090809{{!}}Set the ratio of the new ParserCache keys to 100 for prod (T373037)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 10:18 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons. 
* 10:17 ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1090809{{!}}Set the ratio of the new ParserCache keys to 100 for prod (T373037)]] * 10:09 elukey: disallow calls to /v2/_catalog from the outside internet on Docker Registry hosts - [[phab:T378618|T378618]] * 10:04 claime: Manual restart of dump_cloud_ip_ranges.service on 'A:puppetserver or A:puppetmaster' * 10:01 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage * 10:01 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2088.codfw.wmnet with OS bullseye * 10:00 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 10:00 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 09:55 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage * 09:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage * 09:25 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye * 09:20 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bookworm * 09:20 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration * 09:11 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2088.codfw.wmnet with OS bullseye * 09:01 elukey@cumin1002: START - Cookbook 
sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye * 08:54 kart_: Updated recommendation-api to 2024-11-08-142328-production and fix wikidata host header ([[phab:T379592|T379592]]) * 08:49 kartik@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 08:49 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2088.codfw.wmnet with OS bullseye * 08:46 kartik@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 08:33 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage * 08:14 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye * 08:13 ladsgroup@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090493{{!}}Revert "cswiki: Add celebration logo"]] (duration: 09m 18s) * 08:08 ladsgroup@deploy2002: ladsgroup, hamishz: Continuing with sync * 08:07 ladsgroup@deploy2002: ladsgroup, hamishz: Backport for [[gerrit:1090493{{!}}Revert "cswiki: Add celebration logo"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:06 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 08:04 ladsgroup@deploy2002: Started scap sync-world: Backport for [[gerrit:1090493{{!}}Revert "cswiki: Add celebration logo"]] * 07:47 Amir1: running extensions/Echo/maintenance/removeOrphanedEvents.php --force on all wikis ([[phab:T308084|T308084]]) * 05:17 eileen: civicrm upgraded from {{Gerrit|ad008134}} to {{Gerrit|23e08fc2}} * 02:56 tchin@deploy2002: Finished deploy [airflow-dags/analytics@58d7b82]: (no justification provided) (duration: 00m 10s) * 02:56 tchin@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: (no justification provided) * 02:55 tchin@deploy2002: deploy aborted: failedpythonlol (duration: 00m 05s) * 02:55 tchin@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: failedpythonlol * 00:54 tchin@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: (no justification provided) * 00:35 ejegg: payments-wiki upgraded from {{Gerrit|7d24a942}} to {{Gerrit|459f259b}} == 2024-11-12 == * 23:28 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 23:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 23:08 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:35 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 21:55 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:55 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:28 ebysans@deploy2002: Finished deploy [airflow-dags/analytics@58d7b82]: (no justification provided) (duration: 03m 50s) * 21:27 
SandraEbele_: deploying airflow as part of weekly deployment train * 21:27 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088770{{!}}Fix warning about missing central account for temp users (T378289)]], [[gerrit:1088771{{!}}Check session provider when autocreating (T378289)]] (duration: 16m 11s) * 21:25 ebysans@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: (no justification provided) * 21:23 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 21:22 urbanecm@deploy2002: urbanecm, tgr: Continuing with sync * 21:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:13 urbanecm@deploy2002: urbanecm, tgr: Backport for [[gerrit:1088770{{!}}Fix warning about missing central account for temp users (T378289)]], [[gerrit:1088771{{!}}Check session provider when autocreating (T378289)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088770{{!}}Fix warning about missing central account for temp users (T378289)]], [[gerrit:1088771{{!}}Check session provider when autocreating (T378289)]] * 21:09 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090550{{!}}Revert^2 "[CirrusSearch] testwiki: enable offloading weighted tags via EventBus" (T378983)]] (duration: 07m 18s) * 21:04 ebysans@deploy2002: Finished deploy [analytics/refinery@113ea5a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@113ea5ac] (duration: 04m 09s) * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1090550{{!}}Revert^2 "[CirrusSearch] testwiki: enable offloading weighted tags via EventBus" (T378983)]] * 20:59 ebysans@deploy2002: Started deploy [analytics/refinery@113ea5a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@113ea5ac] * 20:59 ebysans@deploy2002: Finished deploy [analytics/refinery@113ea5a] (thin): 
Regular analytics weekly train THIN [analytics/refinery@113ea5ac] (duration: 04m 54s) * 20:54 ebysans@deploy2002: Started deploy [analytics/refinery@113ea5a] (thin): Regular analytics weekly train THIN [analytics/refinery@113ea5ac] * 20:53 ebysans@deploy2002: Finished deploy [analytics/refinery@113ea5a]: Regular analytics weekly train [analytics/refinery@113ea5ac] (duration: 07m 37s) * 20:49 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * 20:46 ebysans@deploy2002: Started deploy [analytics/refinery@113ea5a]: Regular analytics weekly train [analytics/refinery@113ea5ac] * 19:42 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl1001.eqiad.wmnet * 19:42 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl1001.eqiad.wmnet * 19:42 jayme@cumin2002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl1001.* * 19:40 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 19:16 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage * 19:14 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] * 19:13 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage * 19:06 brennen: 1.44.0-wmf.3 train status ([[phab:T375662|T375662]]): no current blockers, rolling to group0. 
* 18:55 moritzm: installing libarchive security updates * 18:55 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 18:31 swfrench@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087604{{!}}Add title-case mapping to support migration to PHP 8.1 (T372603)]] (duration: 18m 48s) * 18:25 swfrench@deploy2002: swfrench: Continuing with sync * 18:24 swfrench-wmf: verified consistent 7.4-like title-case behavior in 7.4- and 8.1-based images, verified expected treatment of eszett in mwdebug - [[phab:T372603|T372603]] * 18:19 swfrench@deploy2002: swfrench: Backport for [[gerrit:1087604{{!}}Add title-case mapping to support migration to PHP 8.1 (T372603)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 18:12 swfrench@deploy2002: Started scap sync-world: Backport for [[gerrit:1087604{{!}}Add title-case mapping to support migration to PHP 8.1 (T372603)]] * 18:08 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 18:01 moritzm: remove ganeti1012 from active ganeti nodes [[phab:T378921|T378921]] * 17:59 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:35 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:34 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 17:26 brennen@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] (duration: 45m 29s) * 16:55 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply * 
16:54 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply * 16:54 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply * 16:53 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply * 16:48 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 16:47 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 16:40 brennen@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] * 16:39 jayme@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 16:37 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 16:34 dancy@deploy2002: Installation of scap version "4.123.0" completed for 209 hosts * 16:30 dancy@deploy2002: Installing scap version "4.123.0" for 209 hosts * 16:18 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply * 16:18 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply * 16:17 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply * 16:17 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply * 16:16 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply * 16:15 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply * 16:13 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr[1-2]-eqiad * 16:13 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for 
cr[1-2]-eqiad * 16:08 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 16:07 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 15:57 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 15:56 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 15:55 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 15:52 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 15:52 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 15:47 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 15:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002" * 15:35 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002" * 15:27 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 15:19 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 15:16 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl1002.eqiad.wmnet * 15:16 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl1002.eqiad.wmnet * 15:16 topranks: moving fundraising links in eqiad from old to new firewall cluster and switches ([[phab:T377381|T377381]]) * 15:14 jayme@cumin2002: START - Cookbook 
sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration * 15:13 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=99) Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration * 15:10 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 15:04 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[1-2]-eqiad,pfw3-eqiad with reason: fundraising tech migration to new equipment * 15:04 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cr[1-2]-eqiad,pfw3-eqiad with reason: fundraising tech migration to new equipment * 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet * 14:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on fasw-c-eqiad with reason: fundraising tech migration to new equipment * 14:30 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on fasw-c-eqiad with reason: fundraising tech migration to new equipment * 14:28 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002" * 14:28 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002" * 14:26 moritzm: installing apache2 security updates * 14:23 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:03 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1090455{{!}}[CirrusSearch] testwiki: enable offloading weighted tags via EventBus (T378983)]] * 13:58 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1090455{{!}}[CirrusSearch] testwiki: enable offloading weighted tags via EventBus (T378983)]] * 13:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:43 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] * 13:37 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs [[phab:T375662|T375662]] * 13:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet * 13:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to plain * 13:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to plain * 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet * 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet * 13:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:10 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to drbd * 13:09 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration * 13:09 brouberol@deploy2002: 
helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to drbd * 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1002.eqiad.wmnet to plain * 12:53 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1002.eqiad.wmnet to plain * 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet * 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1002.eqiad.wmnet to drbd * 12:35 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1002.eqiad.wmnet to drbd * 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet * 12:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2236 slowly with 10 steps - slow repool [[phab:T373579|T373579]] * 12:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet * 12:09 moritzm: remove ganeti1015 from active ganeti nodes [[phab:T378921|T378921]] * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1010.eqiad.wmnet * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 12:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: ganeti1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet * 11:54 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:52 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:48 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1010.eqiad.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1013.eqiad.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:40 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:27 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1013.eqiad.wmnet * 11:23 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. * 11:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 11:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 10:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2217 gradually with 4 steps - [[phab:T379491|T379491]] * 10:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 10:37 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. * 10:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 10:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:12 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2236 slowly with 10 steps - slow repool [[phab:T373579|T373579]] * 09:59 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2217 gradually with 4 steps - [[phab:T379491|T379491]] * 09:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71006 and previous config saved to /var/cache/conftool/dbconfig/20241112-094851-arnaudb.json * 09:41 moritzm: update d-i netboot image for 12.8 point release [[phab:T379600|T379600]] * 09:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71005 and previous config saved to /var/cache/conftool/dbconfig/20241112-093343-arnaudb.json * 09:18 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1090428{{!}}Revert "CirrusSearch: re-enable offloading weighted tags via EventBus"]] (duration: 06m 46s) * 09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71004 and previous config saved to /var/cache/conftool/dbconfig/20241112-091836-arnaudb.json * 09:17 elukey@deploy2002: 
helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 09:14 urbanecm@deploy2002: trainbranchbot, urbanecm: Continuing with sync * 09:14 urbanecm@deploy2002: trainbranchbot, urbanecm: Backport for [[gerrit:1090428{{!}}Revert "CirrusSearch: re-enable offloading weighted tags via EventBus"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1090428{{!}}Revert "CirrusSearch: re-enable offloading weighted tags via EventBus"]] * 09:10 urbanecm@deploy2002: Sync cancelled. * 09:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71002 and previous config saved to /var/cache/conftool/dbconfig/20241112-090329-arnaudb.json * 08:38 urbanecm@deploy2002: pfischer, urbanecm: Backport for [[gerrit:1089826{{!}}CirrusSearch: re-enable offloading weighted tags via EventBus (T378983)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:36 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089826{{!}}CirrusSearch: re-enable offloading weighted tags via EventBus (T378983)]] * 08:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet * 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet * 08:28 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089230{{!}}Fix WeightedTagsUpdater (T378664 T378983)]] (duration: 06m 59s) * 08:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet * 08:21 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089230{{!}}Fix WeightedTagsUpdater (T378664 T378983)]] * 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti 
node ganeti1009.eqiad.wmnet * 08:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 08:04 moritzm: installing apache security updates * 08:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P71001 and previous config saved to /var/cache/conftool/dbconfig/20241112-080303-arnaudb.json * 08:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti-test2003 * 07:53 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti-test2003 * 07:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 05:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.28 (duration: 01m 52s) == 2024-11-11 == * away: UTC late deploys done * 23:08 tgr@deploy2002: scap failed: <CalledProcessError> Command '['sudo', '-u', 'mwbuilder', '-n', '--', '/usr/bin/scap', 'mwscript', '--no-local-config', '--directory', '/srv/mediawiki-staging', '--user', 'www-data', '--network', '--', 'purgeMessageBlobStore.php']' returned non-zero exit status 1. 
(scap version: 4.122.0) (duration: 11m 44s) * 23:02 tgr@deploy2002: d3r1ck01, tgr: Continuing with sync * 22:59 tgr@deploy2002: d3r1ck01, tgr: Backport for [[gerrit:1089807{{!}}PageUpdater: restore call to RevisionFromEditComplete (T379152)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:56 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1089807{{!}}PageUpdater: restore call to RevisionFromEditComplete (T379152)]] * 22:30 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089864{{!}}contactpage: Update AffCom contact form messages (Resubmit) (T375392)]] (duration: 25m 48s) * 22:21 tgr@deploy2002: tgr: Continuing with sync * 22:19 tgr@deploy2002: tgr: Backport for [[gerrit:1089864{{!}}contactpage: Update AffCom contact form messages (Resubmit) (T375392)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 22:13 eileen: civicrm upgraded from {{Gerrit|4330588d}} to {{Gerrit|bcd072a1}} * 22:05 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1089864{{!}}contactpage: Update AffCom contact form messages (Resubmit) (T375392)]] * 21:38 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1082174{{!}}contactpages: Update Affcom UserGroup application form (T375392)]] (duration: 28m 07s) * 21:33 tgr@deploy2002: ammarpad, tgr: Continuing with sync * 21:12 tgr@deploy2002: ammarpad, tgr: Backport for [[gerrit:1082174{{!}}contactpages: Update Affcom UserGroup application form (T375392)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:10 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1082174{{!}}contactpages: Update Affcom UserGroup application form (T375392)]] * 20:21 eileen: civicrm upgraded from {{Gerrit|65a8de90}} to {{Gerrit|4330588d}} * 17:55 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add superset links - oblivian@cumin1002 - 
[[phab:T379567|T379567]]" * 17:55 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add superset links - oblivian@cumin1002 - [[phab:T379567|T379567]] * 17:54 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add superset links - oblivian@cumin1002 - [[phab:T379567|T379567]] * 17:54 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add superset links - oblivian@cumin1002 - [[phab:T379567|T379567]]" * 16:19 elukey: restart pybal on lvs2013 (primary) to pick up new kartotherian-k8s-ssl service * 16:17 elukey: restart pybal on lvs2014 (secondary) to pick up new kartotherian-k8s-ssl service * 16:10 elukey: restart pybal on lvs1019 (primary) to pick up new kartotherian-k8s-ssl service * 16:09 elukey: restart pybal on lvs1020 (secondary) to pick up new kartotherian-k8s-ssl service * 16:09 moritzm: installing libarchive security updates * 15:55 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian-k8s-ssl * 15:55 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl * 15:54 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: cluster=codfw,service=kartotherian-k8s-ssl * 15:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1311.eqiad.wmnet with OS bookworm * 15:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 15:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 15:03 
jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bookworm * 15:03 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 15:00 Lucas_WMDE: UTC afternoon backport+config window done * 15:00 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089739{{!}}wikipedias: clear link-recommendations on page save (T379522)]] (duration: 10m 59s) * 14:58 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:56 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Continuing with sync * 14:51 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Backport for [[gerrit:1089739{{!}}wikipedias: clear link-recommendations on page save (T379522)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:49 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1089739{{!}}wikipedias: clear link-recommendations on page save (T379522)]] * 14:44 btullis@cumin1002: END (FAIL) - Cookbook sre.presto.roll-restart-workers (exit_code=99) for Presto an-presto cluster: Roll restart of all Presto's jvm daemons. 
* 14:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1310.eqiad.wmnet with OS bookworm * 14:37 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:36 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:35 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2088.codfw.wmnet with OS bullseye * 14:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1312.eqiad.wmnet with OS bookworm * 14:33 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:32 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1306.eqiad.wmnet with OS bookworm * 14:32 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:32 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1308.eqiad.wmnet with OS bookworm * 14:28 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:28 
jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:27 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye * 14:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 14:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1307.eqiad.wmnet with OS bookworm * 14:26 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:25 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage * 14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1305.eqiad.wmnet with OS bookworm * 14:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:21 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 14:20 zabe@deploy2002: Finished scap sync-world: Backport for [[gerrit:1078764{{!}}zhwiki: Allow event-organizer self remove usergroup (T376061)]] (duration: 10m 40s) * 14:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2088.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 14:19 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime 
(exit_code=0) for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage * 14:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage * 14:15 zabe@deploy2002: zabe, zhaofjx: Continuing with sync * 14:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage * 14:12 zabe@deploy2002: zabe, zhaofjx: Backport for [[gerrit:1078764{{!}}zhwiki: Allow event-organizer self remove usergroup (T376061)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage * 14:09 zabe@deploy2002: Started scap sync-world: Backport for [[gerrit:1078764{{!}}zhwiki: Allow event-organizer self remove usergroup (T376061)]] * 14:07 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2088.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 14:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage * 14:06 btullis@cumin1002: START - Cookbook sre.presto.roll-restart-workers for Presto an-presto cluster: Roll restart of all Presto's jvm daemons. 
* 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc2002.wikimedia.org * 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage * 14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage * 14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage * 14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage * 14:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage * 14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage * 14:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage * 14:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage * 14:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage * 13:55 moritzm: powercycled ganeti2031 * 13:44 jmm@cumin2002: START - Cookbook 
sre.dns.netbox * 13:39 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc2002.wikimedia.org * 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc1002.wikimedia.org * 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 13:34 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1312.eqiad.wmnet with OS bookworm * 13:34 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1311.eqiad.wmnet with OS bookworm * 13:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 13:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1311.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1312.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:33 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1310.eqiad.wmnet with OS bookworm * 13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bookworm * 13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1308.eqiad.wmnet with OS bookworm * 13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1307.eqiad.wmnet with OS bookworm * 13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker1306.eqiad.wmnet with OS bookworm * 13:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1305.eqiad.wmnet with OS bookworm * 13:30 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1307.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1309.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1310.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1308.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1305.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:25 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc1002.wikimedia.org * 13:22 jynus: reverting deleted rows on db1176 (mailman3) [[phab:T379519|T379519]] * 13:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1312.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1311.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy 
FORCED * 13:12 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1050.eqiad.wmnet to cluster eqiad and group D * 13:12 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1050.eqiad.wmnet to cluster eqiad and group D * 13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1310.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:11 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1309.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1308.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1307.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1305.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 dreamyjazz@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085593{{!}}Exclude temp account viewer autopromotions from RC (T377829)]] (duration: 07m 07s) * 13:08 jclark@cumin1002: END (PASS) - 
Cookbook sre.dns.netbox (exit_code=0) * 13:08 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 13:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002" * 13:05 dreamyjazz@deploy2002: mszabo, dreamyjazz: Continuing with sync * 13:05 dreamyjazz@deploy2002: mszabo, dreamyjazz: Backport for [[gerrit:1085593{{!}}Exclude temp account viewer autopromotions from RC (T377829)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 13:05 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix bug in requestctl commit - oblivian@cumin1002" * 13:05 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix bug in requestctl commit - oblivian@cumin1002 * 13:04 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix bug in requestctl commit - oblivian@cumin1002 * 13:04 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix bug in requestctl commit - oblivian@cumin1002" * 13:04 jclark@cumin1002: START - Cookbook sre.dns.netbox * 13:03 dreamyjazz@deploy2002: Started scap sync-world: Backport for [[gerrit:1085593{{!}}Exclude temp account viewer autopromotions from RC (T377829)]] * 13:00 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons. 
* 12:54 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons. * 12:48 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons. * 12:42 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons. * 12:41 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1049.eqiad.wmnet to cluster eqiad and group D * 12:40 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1049.eqiad.wmnet to cluster eqiad and group D * 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1050.eqiad.wmnet * 12:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1050.eqiad.wmnet * 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1049.eqiad.wmnet * 12:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1049.eqiad.wmnet * 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1050 * 12:16 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1050 * 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1049 * 12:15 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1049 * 12:13 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons. 
* 12:06 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons. * 12:01 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 11:56 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 11:56 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-redacteddb1001.eqiad.wmnet * 11:54 btullis@cumin1002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch * 11:46 btullis@cumin1002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch * 11:44 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet * 11:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 11:43 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 11:30 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:06 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . * 11:04 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views * 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
* 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 09:10 moritzm: remove ganeti1011 from active ganeti nodes [[phab:T378921|T378921]] * 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] (duration: 07m 15s) * 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync * 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1088628{{!}}Update Wikimedia Foundation primary address. (T379417)]], [[gerrit:1082559{{!}}Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. 
(T378026)]] * 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] (duration: 20m 59s) * 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync * 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002 * 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002" * 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1089182{{!}}Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)]] * 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet * 07:49 _joe_: installing conftool 4.1.0 on puppetservers * 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' . 
== 2024-11-10 ==
* 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
* 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
* 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
* 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
* 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order
* 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled)
* 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index
* 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index
* 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json
== 2024-11-09 ==
* 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
== 2024-11-08 ==
* 23:35 zabe: attach Sotiale's local accounts on newly created wikis
* 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for [[phab:T379398|T379398]] because oad_user=0
* 23:07 jhathaway@cumin2002: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply * 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply * 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 22:28 jhathaway@cumin2002: END 
(FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:18 denisse: disabling Puppet on grafana2001 - [[phab:T379043|T379043]] * 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed * 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly * 20:28 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] (duration: 10m 19s) * 20:24 aude@deploy2002: seddon, aude: Continuing with sync * 20:21 aude@deploy2002: seddon, aude: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 20:18 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088586{{!}}Reviving "Update interwiki map"]] * 20:15 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] (duration: 10m 55s) * 20:10 aude@deploy2002: aude: Continuing with sync * 20:06 aude@deploy2002: aude: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test commons (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:04 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088375{{!}}Enable Tabular data for test 
commons (T378127)]] * 20:02 aude@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] (duration: 14m 33s) * 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: [[phab:T371400|T371400]] * 19:57 aude@deploy2002: aude: Continuing with sync * 19:50 aude@deploy2002: aude: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:47 aude@deploy2002: Started scap sync-world: Backport for [[gerrit:1088366{{!}}Reopen testcommonswiki for testing Chart extension]] * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm * 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm * 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: 
Host reimage - jhancock@cumin2002" * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm * 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm * 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm * 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm * 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:23 jhancock@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm * 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage * 18:07 jhancock@cumin2002: END 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage * 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage * 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm * 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage * 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 cmooney@cumin1002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002" * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm * 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet * 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage * 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm * 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera 
generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm * 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm * 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2166.codfw.wmnet with OS bookworm * 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm * 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm * 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm * 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm * 17:36 jhancock@cumin2002: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage * 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001 * 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS 
bullseye * 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002" * 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on 
wikikube-worker2158.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage * 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage * 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm * 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm * 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm * 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:22 herron@cumin1002: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage * 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm * 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm * 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:43 jhancock@cumin2002: 
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm * 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm * 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm * 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm * 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm * 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage 
- jhancock@cumin2002" * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye * 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage * 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage * 15:00 
jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage * 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage * 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage * 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage * 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm * 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm * 14:41 
elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm * 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm * 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm * 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157'] * 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts 
['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144'] * 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143'] * 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143'] * 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142'] * 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141'] * 14:30 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140'] * 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138'] * 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137'] * 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129'] * 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128'] * 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128'] * 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye * 14:18 
elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye * 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment * 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye * 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host 
ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host 
ms-be2086.codfw.wmnet with OS bullseye * 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet * 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet * 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye * 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining 
ganeti node ganeti1011.eqiad.wmnet * 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel * 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye * 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure * 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage * 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 09:09 elukey@cumin2002: 
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw * 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw * 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw * 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw * 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw * 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye * 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw * 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw * 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw * 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw * 08:52 ayounsi@cumin1002: START - 
Cookbook sre.network.tls for network device lsw1-b2-codfw * 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw * 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw * 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw * 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw * 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C * 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw * 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw * 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw * 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw * 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for 
network device ssw1-f1-eqiad * 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad * 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad * 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad * 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad * 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad * 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin * 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin * 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1 * 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts * 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - [[phab:T347461|T347461]] * 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001 * 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:45 jmm@cumin2002: END (PASS) - 
Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet * 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet * 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C * 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C
== 2024-11-07 ==
* 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm * 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet 
with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye * 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye * 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002" * 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002" * 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage * 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002" * 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding 
wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002" * 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye * 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027'] * 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026'] * 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:11 jsn@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] (duration: 08m 28s) * 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync * 21:06 jsn@deploy2002: suecarmol, jsn: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate 
netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002" * 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 21:02 jsn@deploy2002: Started scap sync-world: Backport for [[gerrit:1084883{{!}}Enable AutoModerator on viwiki (T378343)]] * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002" 
* 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm * 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] (duration: 13m 02s) * 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync * 20:25 cdanis@deploy2002: cdanis, aude: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:22 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087987{{!}}Enable Chart extension on testwiki and testcommonswiki (T378127)]] * 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] (duration: 10m 45s) * 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync * 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 20:10 cdanis@deploy2002: Started scap sync-world: Backport for [[gerrit:1087975{{!}}DB config for testcommonswiki deployment for Charts (T379199)]] * 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002" * 19:37 cmooney@cumin1002: START - 
Cookbook sre.dns.netbox * 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:23 cdanis: [[phab:T379199|T379199]] 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql * 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet * 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables * 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables * 19:08 mutante: VRTS - switching firewall provider from iptables to nftables * 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet * 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM 
aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors * 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002" * 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox * 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002" * 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - [[phab:T356241|T356241]] * 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply * 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply * 18:01 
swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply * 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply * 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply * 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet * 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet * 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply * 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply * 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply * 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply * 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply * 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm * 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" * 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad * 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad * 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # [[phab:T375508|T375508]] * 17:09 arlolra@deploy2002: 
helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # [[phab:T375508|T375508]] * 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye * 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye * 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply * 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply * 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage * 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply * 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell 
SCP reboot policy FORCED * 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye * 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm * 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply * 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply * 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage * 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad * 15:54 moritzm: remove ganeti1010 from active ganeti nodes [[phab:T378921|T378921]] * 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary ([[phab:T378466|T378466]]) and tcywikisource ([[phab:T378474|T378474]]) * 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host 
ms-be2082.codfw.wmnet with OS bullseye * 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s) * 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad * 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # [[phab:T375950|T375950]] * 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https * 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw * 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase * 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye * 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s) * 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s) * 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) * 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw * 14:55 hashar: Restarted CI Jenkins for plugins update * 14:41 moritzm: installing python-git security updates * 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye * 14:25 lucaswerkmeister-wmde@deploy2002: 
Finished scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] (duration: 09m 37s) * 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync * 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087927{{!}}Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)]] * 14:13 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] (duration: 10m 08s) * 14:09 kartik@deploy2002: kartik: Continuing with sync * 14:06 kartik@deploy2002: kartik: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s) * 14:03 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1088215{{!}}Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)]] * 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] * 13:52 cwhite: running thanos bucket cleanup on titan1001 - [[phab:T351927|T351927]] * 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048 * 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048 * 13:35 jmm@cumin2002: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047 * 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s) * 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] * 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s) * 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] * 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s) * 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047 * 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047 * 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] * 12:16 vgutierrez: repool liberica on lvs1013 * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply * 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply * 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync * 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE 
helmfile.d/services/proton: sync * 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync * 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync * 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply * 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply * 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . 
* 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet * 11:03 vgutierrez: depool liberica on lvs1013 * 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet * 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad * 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye * 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002" * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad * 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply * 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage * 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002 * 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002" * 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json * 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet * 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye * 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json * 09:29 vgutierrez: upload liberica 
0.4 to apt.wm.o (bookworm-wikimedia) * 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json * 09:21 moritzm: installing openjdk-8 security updates * 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia * 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T367781|T367781]])', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json * 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye * 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:26 kartik@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] (duration: 18m 39s) * 08:25 _joe_: runing scap pull on mwdebug2001/2002 * 08:19 kartik@deploy2002: kartik, abi: Continuing with sync * 08:13 kartik@deploy2002: kartik, abi: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 08:07 kartik@deploy2002: Started scap sync-world: Backport for [[gerrit:1087914{{!}}Translate: Enable message bundle Scribunto module on testwiki (T359918)]] * 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T367781|T367781]])', diff saved to 
https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json * 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C * 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C * 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to 
cluster eqiad and group B
* 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B
* 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
* 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
* 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
* 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
* 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
* 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
== 2024-11-06 ==
* 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm
* 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm
* 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
* 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
* 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm
* 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0)
generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm * 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm * 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm * 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 23:23 
jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm * 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002" * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc-gp1005.eqiad.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage * 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm * 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm * 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm * 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2153.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm * 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153'] * 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154'] * 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade 
firmware for hosts ['wikikube-worker2155'] * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set 
policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002" * 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:14 
jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002" * 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm * 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm * 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm * 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 
21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm * 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox * 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced] * 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime 
(exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage * 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage * 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm * 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm * 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148'] * 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147'] * 20:39 jhancock@cumin2002: END (PASS) - 
Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150'] * 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147'] * 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146'] * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002" * 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm * 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) ([[phab:T375569|T375569]]) * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye * 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 18:03 sukhe: dummy authdns-update to test CR {{Gerrit|10857508}} * 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage * 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002" * 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:17 hnowlan: importing debs for mercurius-1.0.1 * 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage * 17:11 
vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002" * 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox * 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:32 moritzm: remove ganeti1014 from active ganeti nodes [[phab:T378921|T378921]] * 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host 
fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002" * 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox * 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236 * 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet * 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002" * 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 * 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work [[phab:T358260|T358260]] * 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted * 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet * 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox * 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:48 vriley@cumin1002: START - 
Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts * 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236 * 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" [[phab:T379166|T379166]] * 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] (duration: 38m 45s) * 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet * 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:51 moritzm: installing php7.4 security updates * 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet * 14:48 moritzm: installing usb.ids updates from Bookworm point release * 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet * 14:42 jmm@cumin2002: 
END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046 * 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046 * 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087877{{!}}Document available wbformatvalue options (T323778)]] * 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] (duration: 15m 01s) * 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync * 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet * 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet * 14:19 sukhe: depool cp2031 * 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet * 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085572{{!}}Cleanup for logo related file]] * 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045 * 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045 * 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs 
[reason: no reason specified, [[phab:T378453|T378453]]] * 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, [[phab:T378453|T378453]]] * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet * 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet * 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd * 12:56 jmm@cumin2002: END (PASS) - 
Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for [[phab:T373579|T373579]]', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json * 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236 * 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - [[phab:T373579|T373579]] * 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend * 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to 
drbd * 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test {{Gerrit|1087895}} * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json * 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 [[phab:T378940|T378940]] * 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool * 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool * 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply * 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json * 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: 
apply * 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply * 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet * 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet * 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) ([[phab:T378578|T378578]]) * 10:43 elukey: depool maps1005 to test an nginx config - [[phab:T378944|T378944]] * 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 10:32 XioNoX: push new pfw policies - [[phab:T379127|T379127]] * 10:28 jmm@cumin2002: END (PASS) - Cookbook 
sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain * 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd * 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s) * 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B * 09:54 jnuche@deploy2002: jnuche: Continuing with sync * 09:54 jnuche@deploy2002: jnuche: Backport for [[gerrit:1087863{{!}}Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B * 09:51 jnuche@deploy2002: Started scap sync-world: Backport for [[gerrit:1087863{{!}}Fix 
automatic category creations by FuzzyBot (T285463)]] * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet * 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet * 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet * 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044 * 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044 * 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043 * 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043 * 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye * 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye * 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - [[phab:T336485|T336485]] * 05:52 kart_: Updated cxserver to 2024-10-25-044319-production ([[phab:T377160|T377160]], [[phab:T375102|T375102]], [[phab:T371420|T371420]]) * 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 05:37 kartik@deploy2002: 
helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply * 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply * 01:30 zabe@deploy2002: Finished scap sync-world: [[phab:T378260|T378260]] (duration: 07m 34s) * 01:23 zabe@deploy2002: Started scap sync-world: [[phab:T378260|T378260]] * 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over * 00:21 ryankemper: [[phab:T377594|T377594]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now * 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s) == 2024-11-05 == * 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm * 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:56 ebernhardson@deploy2002: ebernhardson: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm * 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm * 23:54 jhancock@cumin2002: END (FAIL) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for [[gerrit:1087592{{!}}TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593{{!}}TextPassDumper: refresh content address on failure (T377594)]] * 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage * 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host 
reimage * 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage * 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage * 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm * 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm * 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133'] * 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132'] 
* 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131'] * 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130'] * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host 
wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134 * 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132 * 22:30 
jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131 * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131 * 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130 * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002" * 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132 * 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s) * 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087560{{!}}AbstractProvider: Normalize top 
level config correctly (T379094)]], [[gerrit:1087561{{!}}AbstractProvider: Normalize top level config correctly (T379094)]] * 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s) * 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087540{{!}}cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] * 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002" * 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet * 
20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002" * 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet * 19:20 eileen: civicrm upgraded from {{Gerrit|26d8013c}} to {{Gerrit|65a8de90}} * 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox * 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs * 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json * 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance * 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 
([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json * 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to 
https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json * 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json * 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance * 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json * 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json * 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] (duration: 08m 02s) * 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json * 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync * 16:32 
lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox * 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1087507{{!}}Fixup paths to moved resources (T379080)]] * 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json * 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json * 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance * 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json * 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet * 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json * 15:54 jmm@cumin2002: START - Cookbook 
sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet * 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet * 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B * 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B * 15:48 moritzm: remove ganeti1013 from active ganeti nodes [[phab:T378921|T378921]] * 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json * 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # [[phab:T359795|T359795]] * 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json * 15:27 
hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # [[phab:T359795|T359795]] * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json * 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance * 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json * 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. 
That was forgotten in [[phab:T359795|T359795]] and blocks today Jenkins upgrade ( [[phab:T379059|T379059]] ) * 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json * 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply * 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply * 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply * 15:01 hashar: Upgrading CI Jenkins {{!}} [[phab:T379059|T379059]] * 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json * 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply * 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply * 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json * 14:34 
arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm * away: UTC afternoon deploys done * 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json * 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance * 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia) * 14:28 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s) * 14:24 tgr@deploy2002: tgr: Continuing with sync * 14:16 tgr@deploy2002: tgr: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:11 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087455{{!}}JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] * 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage * 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as 
packaged in Debian) * 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release * 13:54 arnaudb: reimage pc1017 [[phab:T378068|T378068]] * 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm * 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia [[phab:T379059|T379059]] * 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: [[phab:T378068|T378068]], host is not pooled * 
13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet * 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox * 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary * 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary * 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers [[phab:T378173|T378173]] * 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet * 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet * 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s) * 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet * 12:46 kharlan@deploy2002: kharlan: Continuing with sync * 12:42 kharlan@deploy2002: kharlan: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0) * 12:39 kharlan@deploy2002: Started scap sync-world: Backport for [[gerrit:1087424{{!}}Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views 
* 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93) * 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views * 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; [[phab:T378983|T378983]]) * 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation ([[phab:T378983|T378983]]) * 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet * 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet * 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s) * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing * 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing * 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087407{{!}}CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] * 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 07m 43s) * 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jmm@cumin2002: START - 
Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B * 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet * 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet * 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042 * 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json * 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042 * 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041 * 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041 * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040 * 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040 * 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] (duration: 36m 28s) * 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json * 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to 
/var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json * 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json * 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json * 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json * 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s) * 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts * 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json * 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - [[phab:T378618|T378618]] * 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json * 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance * 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm * 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime 
(exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: [[phab:T373037|T373037]], host is not pooled * 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 09:21 _joe_: restarted rsyslog on deploy2002 [[phab:T379044|T379044]] * 08:57 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087373{{!}}Revert "temp accounts: Enable temp account creation on second-round pilots"]] * 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm) * 08:10 tchanders@deploy2002: Started scap sync-world: Backport for [[gerrit:1087195{{!}}temp accounts: Enable temp account creation on second-round pilots (T378336)]] * 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828 * 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593 * 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593 * 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414 * 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414 * 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s) * 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs [[phab:T375661|T375661]] * 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:10 rzl@deploy2002: Finished scap sync-world: {{Gerrit|1085506}} 
(duration: 02m 50s) * 00:08 rzl@deploy2002: Started scap sync-world: {{Gerrit|1085506}} * 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
== 2024-11-04 ==
* 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006 * 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006 * 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm * 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm * 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook 
sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage * 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm * 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005'] * 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004'] * 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005'] * 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004'] * 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:22 damilare: civicrm upgraded from {{Gerrit|31f5cbdb}} to {{Gerrit|26d8013c}} * 22:22 damilare: SmashPig 
upgraded from {{Gerrit|be47dddd}} to {{Gerrit|601405dc}} * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002" * 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm * 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json * 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
kubestage2004.codfw.wmnet with OS bookworm * 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json * away: UTC late deploys done * 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:41 tgr@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] (duration: 08m 40s) * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage * 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage * 21:35 tgr@deploy2002: tgr, kemayo: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 21:32 tgr@deploy2002: Started scap sync-world: Backport for [[gerrit:1087207{{!}}Set Flow to read-only on remaining phase 0 wikis (T377990)]] * 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - 
eevans@cumin1002 * 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004'] * 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003'] * 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70908 and previous 
config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json * 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002" * 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance * 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json * 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 
20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002 * 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002" * 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json * 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox * 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - [[phab:T375243|T375243]] * 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 20:21 eevans@cumin1002: 
START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet * 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply * 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json * 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json * 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json * 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] (duration: 09m 12s) * 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync * 19:54 urbanecm@deploy2002: urbanecm: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] synced to the testservers 
(https://wikitech.wikimedia.org/wiki/Mwdebug) * 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json * 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for [[gerrit:1087231{{!}}Message: Downgrade exception on bool/null param to warning (T378876)]] * 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json * 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json * 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply * 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 ([[phab:T376905|T376905]])', diff saved to 
https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json * 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance * 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json * 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json * 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 18:56 swfrench@deploy2002: helmfile [staging] DONE 
helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply * 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer * 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet * 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez * 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json * 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host 
lvs1013.eqiad.wmnet with OS bookworm * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json * 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance * 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json * 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json * 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json * 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 
([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json * 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet * 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json * 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json * 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json * 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm * 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json * 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json * 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json * 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json * 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage * 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json * 
16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply * 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235 * 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply * 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm * 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet * 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json * 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235 * 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:54 
vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage * 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json * 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm * 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json * 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - [[phab:T377127|T377127]] * 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json * 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: [[phab:T378744|T378744]] * 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json * 15:05 
ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json * 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json * 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json * 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json * 14:38 Lucas_WMDE: UTC afternoon backport+config window done * 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] (duration: 23m 39s) * 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync * 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json * 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports 
of latest security fixes to our PHP 7.4 build) * 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1084765{{!}}Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)]] * 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json * 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json * 13:51 marostegui: Start schema change on redacteddb1001:s8 [[phab:T367856|T367856]] (this will make replication in s8 lag for around 2-3 days) * 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change [[phab:T367856|T367856]] * 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json * 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 
db1172.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance * 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json * 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json * 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration * 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json * 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration * 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json * 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - 
[[phab:T378453|T378453]] * 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . * 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . 
* 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B * 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json * 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to 
/var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json * 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json * 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json * 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json * 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance * 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json * 10:54 elukey@cumin1002: START - Cookbook 
sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - [[phab:T377844|T377844]] * 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json * 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:41 moritzm: installing libtool updates from Bookworm point release * 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:31 moritzm: installing libseccomp updates from Bookworm point release * 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json * 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T376905|T376905]])', diff saved to 
https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json * 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json * 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance * 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance * 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002 * 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet 
with chassis set policy GRACEFUL_RESTART * 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables * 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet * 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet * 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, 
removing all IPs except the asset tag one - jmm@cumin2002" * 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization [[phab:T373579|T373579]] * 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589 * 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet
== 2024-11-02 ==
* 15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] (duration: 12m 09s)
* 15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync
* 15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
* 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [[gerrit:1085922{{!}}Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)]]
* 15:26 reedy@deploy2002: Finished scap sync-world: use statemnts (duration: 07m 13s)
* 15:19 reedy@deploy2002: Started scap sync-world: use statemnts
* 15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s)
== 2024-11-01 ==
* 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye
* 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot [[phab:T374924|T374924]]
* 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage
* 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage
* 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye
* 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet']
* 19:07 bking@cumin2002:
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet'] * 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet'] * 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host 
ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set 
policy GRACEFUL_RESTART * 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START 
- Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART * 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet'] * 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts * 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002 * 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts * 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose` * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage * 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host 
reimage * 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill * 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production * 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] (duration: 07m 46s) * 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync * 16:00 thcipriani@deploy2002: thcipriani: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) * 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for [[gerrit:1085597{{!}}Revert "Dummy commit for testing"]] * 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet'] * 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet * 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet * 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
an-presto1020.eqiad.wmnet with OS bullseye * 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye * 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye * 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm * 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over * 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm * 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over * 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet * 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet * 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet * 12:28 cmooney@cumin1002: START 
- Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet * 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move * 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' . * off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR {{Gerrit|1085569}} * 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance * 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json * 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json * 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye * 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json * 01:42 urandom: Decommissioning Cassandra/aqs1013-<nowiki>{</nowiki>a,b<nowiki>}</nowiki> — [[phab:T378725|T378725]] * 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — [[phab:T378725|T378725]] * 01:39 
ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json * 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet * 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet * 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json * 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance * 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json * 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage * 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json * 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye * 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to 
https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json
* 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
* 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
* 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json
* 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json
* 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
* 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
* 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 ([[phab:T376905|T376905]])', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json
* 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json
* 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json
<noinclude>
== Other archives ==
{{:Server Admin Log/Archives}}
[[Category:SAL]]
[[Category:Operations]]
</noinclude>
Nova Resource:Tools/SAL/Archive 5 498 456245 2249717 2024-12-01T00:34:33Z JrandWP 37706 archive 5 (2022-2023) 2249717 wikitext text/x-wiki
=== 2023-12-30 ===
* 12:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0)
* 12:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors
=== 2023-12-29 ===
* 21:39 andrewbogott: rebooting tools-sgeweblight-10-28.tools.eqiad1.wikimedia.cloud because previous reset didn't get the queue out of error state
* 19:31 andrewbogott: restarting sge_execd on tools-sgeweblight-10-28.tools.eqiad1.wikimedia.cloud in response to error state alert
=== 2023-12-28 ===
* 21:03 andrewbogott: "docker-compose restart" on tools-harbor-1
* 19:18 andrewbogott: rebooting tools-harbor-1.tools.eqiad1.wikimedia.cloud, unresponsive
=== 2023-12-23 ===
* 18:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 18:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
=== 2023-12-21 ===
* 15:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeexec-10-16
=== 2023-12-20 ===
* 11:22 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeexec-10-14, tools-sgeexec-10-15, tools-sgeweblight-10-18, tools-sgeweblight-10-24
* 10:01 taavi: rebooting tools-sgeweblight-10-18, -24, -25, to get rid of a large number of jobs in deleting status
=== 2023-12-19 ===
* 15:39 dhinus: restarting toolsdb to apply a config change [[phab:T353093|T353093]]
* 13:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 13:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
=== 2023-12-18 ===
* 16:15 taavi: reboot tools-sgeexec-10-15,
-23 due to stuck NFS processes * 14:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 14:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2023-12-16 === * 22:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 22:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors * 20:54 bd808: Rebuilding all containers to pick up lighttpd config fix and normal package updates ([[phab:T293552|T293552]]) * 08:14 dhinus: restarting toolsdb with jemalloc * 05:32 andrewbogott: restarting mariadb on toolsdb-1 because it's just about to go oom (or possibly just did) * 00:21 dhinus: restarting toolsdb again as it's again low in free mem [[phab:T353093|T353093]] === 2023-12-15 === * 20:26 andrewbogott: restarting toolsdb to avoid upcoming oom crash * 16:49 dhinus: restarting toolsdb before it's about to go OOM, enabling performance_schema for debugging * 14:40 dcaro: deploy toolforge-builds-cli 0.0.10 ([[phab:T341067|T341067]]) * 13:33 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api ([[phab:T341067|T341067]]) * 13:32 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api ([[phab:T341067|T341067]]) * 12:19 dhinus: restarting toolsdb again to apply a config fix [[phab:T353093|T353093]] * 10:48 dhinus: restarting toolsdb to apply new config [[phab:T353093|T353093]] === 2023-12-14 === * 23:02 andrewbogott: rebooting tools-db-1 yet again * 17:42 taavi: reboot tools-sgewebgen-10-3 * 02:20 
andrewbogott: restarting tools-db-1, oomkiller killed mariadb === 2023-12-13 === * 19:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all workers * 18:53 andrewbogott: rebooting tools-nfs-2 server to resolve weird file locking issues * 16:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.scale_grid_exec * 14:23 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder ([[phab:T352774|T352774]]) * 14:22 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder ([[phab:T352774|T352774]]) * 14:22 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api ([[phab:T352774|T352774]]) * 14:22 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api ([[phab:T352774|T352774]]) * 13:54 dcaro: deploy toolforge-builds-cli version 0.0.9 (with envvars support) * 13:32 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api ([[phab:T338142|T338142]]) * 13:31 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api ([[phab:T338142|T338142]]) * 11:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 11:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors * 11:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-16 * 10:48 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-admission ([[phab:T338142|T338142]]) * 10:48 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-admission ([[phab:T338142|T338142]]) * 09:49 wm-bot2: dcaro@urcuchillay END (PASS) 
- Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 09:49 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2023-12-12 === * 17:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 17:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeexec-10-18 * 17:27 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeexec-10-19 * 17:24 taavi: reboot tools-sgeexec-10-14 * 15:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 15:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-14, tools-sgeexec-10-8 * 15:51 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 15:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 15:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 13:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 13:15 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 12:17 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api ([[phab:T352774|T352774]]) * 12:16 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api ([[phab:T352774|T352774]]) === 2023-12-11 === * 15:36 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api ([[phab:T352774|T352774]]) * 15:36 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component 
builds-api ([[phab:T352774|T352774]]) * 13:43 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api ([[phab:T352774|T352774]]) * 13:42 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api ([[phab:T352774|T352774]]) * 13:29 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-admission ([[phab:T352774|T352774]]) * 13:28 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-admission ([[phab:T352774|T352774]]) === 2023-12-09 === * 16:45 dcaro: set toolsdb back as read-write * 16:35 andrewbogott: rebooting tools-db-1.tools.eqiad1.wikimedia.cloud yet again * 07:23 dcaro: set toolsdb back as read-write * 00:54 taavi: set toolsdb back as read-write === 2023-12-08 === * 11:03 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 11:03 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api === 2023-12-07 === * 04:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-26 === 2023-12-05 === * 21:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 21:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors * 19:16 andrewbogott: rebooting tools-sgeweblight-10-26.tools.eqiad1.wikimedia.cloud; can't log in even with root key * 11:25 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 11:21 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console * 11:20 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 11:20 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console 
* 11:20 wm-bot2: dcaro@urcuchillay END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 11:20 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console * 11:20 wm-bot2: dcaro@urcuchillay END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 11:20 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console * 11:15 wm-bot2: dcaro@urcuchillay END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 11:15 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console * 11:01 dcaro: rebooting tools-sgeweblight-10-25 due to memory allocation issue ([[phab:T352753|T352753]]) * 04:51 andrewbogott: rebooting tools-sgeweblight-10-27, tools-sgeweblight-10-17 and tools-sgeweblight-10-30; their filesystems seem locked up and I suspect NFS somehow === 2023-12-04 === * 09:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 09:15 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api === 2023-12-02 === * 11:18 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 11:15 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-22 * 11:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeexec-10-13, tools-sgeweblight-10-20 * 10:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 10:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 00:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 00:08 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=97) for a worker role in the tools cluster * 00:07 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 00:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 00:04 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 00:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 00:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster === 2023-12-01 === * 23:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 23:52 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 23:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 22:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all workers * 21:22 andrewbogott: rebooting tools-sgeweblight-10-[18,21,32].tools.eqiad1.wikimedia.cloud to recover from nfs lockup * 21:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 15:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 15:49 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors === 2023-11-29 === * 23:11 bd808: Drained and hard rebooted tools-k8s-worker-40. K8s was showing inconsistent status of the node (offline per k8s-status tool, online per kubectl) * 22:35 bd808: Hard reboot of tools-k8s-worker-81 * 22:33 bd808: Soft reboot of tools-k8s-worker-81 * 22:26 bd808: Cordon, drain, and restart tools-k8s-worker-81. Instance appears to have pods from tools.cluebotng that are unresponsive to kubectl commands. 
=== 2023-11-27 === * 14:46 andrewbogott: shuffling toolforge etcd nodes all over the place in order to reimage cloudvirtlocal hosts * 11:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 11:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2023-11-23 === * 10:45 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 10:45 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api === 2023-11-22 === * 11:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 11:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 11:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 11:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers ([[phab:T350873|T350873]]) * 10:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers ([[phab:T350873|T350873]]) * 10:57 taavi: deploy maintain-kubeusers patch to manage quotas from the git config [[phab:T350873|T350873]] * 09:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 09:28 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2023-11-21 === * 10:28 taavi: restart replication on tools-db-2 === 2023-11-20 === * 15:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy 
(exit_code=0) for component maintain-kubeusers * 15:00 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 14:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 14:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 13:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 13:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'toolforge-cli' version '0.3.5' * 10:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'toolforge-cli' version '0.3.5' === 2023-11-17 === * 15:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'toolforge-builds-cli' version '0.0.5' * 15:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'toolforge-builds-cli' version '0.0.5' * 15:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 15:49 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api === 2023-11-16 === * 21:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all workers * 19:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 13:47 taavi: reboot tools-sgecron-2 with very 
high load average === 2023-11-14 === * 19:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 19:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 10:11 taavi: reboot unresponsive tools-sgeexec-10-22 === 2023-11-13 === * 22:21 taavi: reboot! tools-sgewebgen-10-3, tools-sgeweblight-10-21, tools-sgeweblight-10-32, tools-sgeexec-10-16 due to high load average and/or stuck jobs * 16:37 taavi: drain tools-k8s-worker-84 tools-k8s-worker-85 === 2023-11-09 === * 11:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 11:49 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2023-11-07 === * 11:45 taavi: reboot tools-sgeexec-10-8 which had high load average === 2023-11-02 === * 13:13 taavi: wiping data directory from tools-prometheus-7 so we have at least one working server [[phab:T350227|T350227]] === 2023-11-01 === * 14:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 14:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 09:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 09:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors * 08:47 taavi: restart puppetdb === 2023-10-30 === * 14:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 14:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics === 2023-10-29 === * 17:46 andrewbogott: running SET GLOBAL read_only=OFF; for mariadb on tools-db-1.tools.eqiad1.wikimedia.cloud * 17:37 andrewbogott: rebooting 
tools-db-1.tools.eqiad1.wikimedia.cloud to recover from the oom-killer firing === 2023-10-26 === * 08:29 taavi: root@tools-sgeweblight-10-21:~# sudo dpkg --configure -a * 08:18 taavi: restart sssd on tools-nfs-2 === 2023-10-25 === * 09:08 blancadesal: harbor up again and upgraded from 2.5 to 2.9 ([[phab:T346241|T346241]]) * 08:31 blancadesal: taking harbor down for upgrade ([[phab:T346241|T346241]]) === 2023-10-24 === * 16:02 taavi: reboot tools-sgeweblight-10-28 * 09:49 taavi: reboot tools-sgebastion-11 due to high load * 09:35 taavi: make ToolsDBWritableState alert paging, match icinga check removed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/956071 === 2023-10-23 === * 15:40 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 15:39 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 14:18 dcaro: release toolforge-builds-cli 0.0.4 * 08:22 taavi: reboot tools-sgeweblight-10-14, 24 [[phab:T349425|T349425]] === 2023-10-19 === * 12:48 taavi: flush queued webgrid jobs that had been waiting in the queue since the nfs issues last week === 2023-10-18 === * 12:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 12:21 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 12:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-70 from 1.22.17 to 1.23.17 * 12:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-88 from 1.22.17 to 1.23.17 * 12:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-70 from 1.22.17 to 1.23.17 * 12:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade 
(exit_code=0) for node tools-k8s-worker-69 from 1.22.17 to 1.23.17 * 12:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-50 from 1.22.17 to 1.23.17 * 12:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-88 from 1.22.17 to 1.23.17 * 12:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-87 from 1.22.17 to 1.23.17 * 12:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-69 from 1.22.17 to 1.23.17 * 12:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-68 from 1.22.17 to 1.23.17 * 12:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-50 from 1.22.17 to 1.23.17 * 12:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-49 from 1.22.17 to 1.23.17 * 12:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-87 from 1.22.17 to 1.23.17 * 12:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-86 from 1.22.17 to 1.23.17 * 12:00 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-68 from 1.22.17 to 1.23.17 * 12:00 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-67 from 1.22.17 to 1.23.17 * 11:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-49 from 1.22.17 to 1.23.17 * 11:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-48 from 1.22.17 to 1.23.17 * 11:59 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-86 from 1.22.17 to 1.23.17 * 11:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-85 from 1.22.17 to 1.23.17 * 11:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-67 from 1.22.17 to 1.23.17 * 11:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-66 from 1.22.17 to 1.23.17 * 11:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-48 from 1.22.17 to 1.23.17 * 11:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-47 from 1.22.17 to 1.23.17 * 11:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-85 from 1.22.17 to 1.23.17 * 11:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-84 from 1.22.17 to 1.23.17 * 11:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-66 from 1.22.17 to 1.23.17 * 11:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-65 from 1.22.17 to 1.23.17 * 11:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-47 from 1.22.17 to 1.23.17 * 11:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-46 from 1.22.17 to 1.23.17 * 11:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-84 from 1.22.17 to 1.23.17 * 11:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-83 from 1.22.17 to 1.23.17 * 11:56 taavi@cloudcumin1001: END 
(PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-6 from 1.22.17 to 1.23.17 * 11:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-65 from 1.22.17 to 1.23.17 * 11:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-64 from 1.22.17 to 1.23.17 * 11:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-83 from 1.22.17 to 1.23.17 * 11:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-82 from 1.22.17 to 1.23.17 * 11:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-46 from 1.22.17 to 1.23.17 * 11:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-45 from 1.22.17 to 1.23.17 * 11:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-6 from 1.22.17 to 1.23.17 * 11:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-64 from 1.22.17 to 1.23.17 * 11:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-62 from 1.22.17 to 1.23.17 * 11:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-5 from 1.22.17 to 1.23.17 * 11:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-82 from 1.22.17 to 1.23.17 * 11:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-81 from 1.22.17 to 1.23.17 * 11:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-45 from 1.22.17 to 1.23.17 * 11:53 
taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-44 from 1.22.17 to 1.23.17 * 11:53 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-5 from 1.22.17 to 1.23.17 * 11:52 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-81 from 1.22.17 to 1.23.17 * 11:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-80 from 1.22.17 to 1.23.17 * 11:52 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-62 from 1.22.17 to 1.23.17 * 11:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-61 from 1.22.17 to 1.23.17 * 11:52 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-44 from 1.22.17 to 1.23.17 * 11:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-43 from 1.22.17 to 1.23.17 * 11:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-4 from 1.22.17 to 1.23.17 * 11:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-80 from 1.22.17 to 1.23.17 * 11:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-79 from 1.22.17 to 1.23.17 * 11:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-61 from 1.22.17 to 1.23.17 * 11:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-60 from 1.22.17 to 1.23.17 * 11:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-43 from 1.22.17 to 
1.23.17 * 11:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-42 from 1.22.17 to 1.23.17 * 11:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-4 from 1.22.17 to 1.23.17 * 11:49 dcaro: deploy toolforge-builds-cli 0.3.0 ([[phab:T348866|T348866]]) * 11:49 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-42 from 1.22.17 to 1.23.17 * 11:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-41 from 1.22.17 to 1.23.17 * 11:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-59 from 1.22.17 to 1.23.17 * 11:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-58 from 1.22.17 to 1.23.17 * 11:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-78 from 1.22.17 to 1.23.17 * 11:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-77 from 1.22.17 to 1.23.17 * 11:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-41 from 1.22.17 to 1.23.17 * 11:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-40 from 1.22.17 to 1.23.17 * 11:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-58 from 1.22.17 to 1.23.17 * 11:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-57 from 1.22.17 to 1.23.17 * 11:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-77 from 1.22.17 to 1.23.17 * 11:47 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-76 from 1.22.17 to 1.23.17 * 11:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-40 from 1.22.17 to 1.23.17 * 11:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-39 from 1.22.17 to 1.23.17 * 11:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-57 from 1.22.17 to 1.23.17 * 11:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-56 from 1.22.17 to 1.23.17 * 11:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-76 from 1.22.17 to 1.23.17 * 11:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-75 from 1.22.17 to 1.23.17 * 11:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-39 from 1.22.17 to 1.23.17 * 11:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-38 from 1.22.17 to 1.23.17 * 11:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-56 from 1.22.17 to 1.23.17 * 11:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-55 from 1.22.17 to 1.23.17 * 11:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-38 from 1.22.17 to 1.23.17 * 11:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-37 from 1.22.17 to 1.23.17 * 11:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-75 from 1.22.17 to 1.23.17 * 11:44 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-74 from 1.22.17 to 1.23.17 * 11:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-55 from 1.22.17 to 1.23.17 * 11:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-54 from 1.22.17 to 1.23.17 * 11:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-37 from 1.22.17 to 1.23.17 * 11:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-36 from 1.22.17 to 1.23.17 * 11:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-74 from 1.22.17 to 1.23.17 * 11:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-73 from 1.22.17 to 1.23.17 * 11:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-54 from 1.22.17 to 1.23.17 * 11:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-53 from 1.22.17 to 1.23.17 * 11:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-36 from 1.22.17 to 1.23.17 * 11:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-35 from 1.22.17 to 1.23.17 * 11:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-73 from 1.22.17 to 1.23.17 * 11:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-72 from 1.22.17 to 1.23.17 * 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-53 from 1.22.17 to 1.23.17 * 11:40 taavi@cloudcumin1001: END 
(PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-52 from 1.22.17 to 1.23.17 * 11:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-35 from 1.22.17 to 1.23.17 * 11:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-34 from 1.22.17 to 1.23.17 * 11:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-72 from 1.22.17 to 1.23.17 * 11:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-71 from 1.22.17 to 1.23.17 * 11:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-52 from 1.22.17 to 1.23.17 * 11:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-51 from 1.22.17 to 1.23.17 * 11:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-34 from 1.22.17 to 1.23.17 * 11:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-33 from 1.22.17 to 1.23.17 * 11:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-71 from 1.22.17 to 1.23.17 * 11:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-51 from 1.22.17 to 1.23.17 * 11:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-33 from 1.22.17 to 1.23.17 * 11:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-32 from 1.22.17 to 1.23.17 * 11:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-32 from 1.22.17 to 1.23.17 * 11:34 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-31 from 1.22.17 to 1.23.17 * 11:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-31 from 1.22.17 to 1.23.17 * 11:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-30 from 1.22.17 to 1.23.17 * 11:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-30 from 1.22.17 to 1.23.17 * 11:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-6 from 1.22.17 to 1.23.17 * 11:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-6 from 1.22.17 to 1.23.17 * 11:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-5 from 1.22.17 to 1.23.17 * 11:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-5 from 1.22.17 to 1.23.17 * 11:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-4 from 1.22.17 to 1.23.17 * 11:07 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-4 from 1.22.17 to 1.23.17 * 11:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.22.17 to 1.23.17 ([[phab:T298005|T298005]]) * 11:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.22.17 to 1.23.17 ([[phab:T298005|T298005]]) === 2023-10-16 === * 09:04 dcaro: rebooting tools-k8s-worker-45 due to stuck nfs processes === 2023-10-13 === * 13:29 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 13:28 wm-bot2: 
dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 09:48 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 09:48 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 09:07 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 09:07 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors * 08:57 dcaro: rebooting tools-sgeexec-10-8 as the host is stuck/unreachable * 07:43 dcaro: rebooting tools-sgeweblight-10-26 as it fails to allocate memory === 2023-10-12 === * 15:07 taavi: reboot tools-k8s-worker-70 * 14:01 taavi: deploy jobs-cli v15 [[phab:T348250|T348250]] * 13:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'toolforge-jobs-framework-cli' version '15' * 13:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'toolforge-jobs-framework-cli' version '15' * 12:21 dcaro: rebooting sgeexec-10-17 * 12:02 taavi: also reboot tools-sgeweblight-10-30 * 12:00 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 12:00 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors * 11:52 taavi: reboot tools-sgeweblight-10-22, 28 === 2023-10-11 === * 19:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all workers * 17:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 14:41 wm-bot2: dcaro@urcuchillay END (FAIL) - Cookbook wmcs.toolforge.grid.reboot_workers (exit_code=99) for weblight nodes ([[phab:T348634|T348634]]) * 14:24 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.grid.reboot_workers for weblight nodes ([[phab:T348634|T348634]]) * 14:21 wm-bot2: 
dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 14:20 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console * 14:19 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 14:19 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console * 14:16 dcaro: rebooting tools-sgeweblight-10-16 due to stuck NFS ([[phab:T348634|T348634]]) * 12:11 taavi: reboot k8s workers 48, 60, 65, 68, 70, 76 [[phab:T348634|T348634]] * 12:04 taavi: reboot k8s workers 72, 75, 82 [[phab:T348634|T348634]] * 12:01 taavi: reboot tools-sgecron-2 [[phab:T348634|T348634]] * 11:49 taavi: reboot tools-sgeexec-10-19 === 2023-10-10 === * 08:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 08:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2023-10-09 === * 10:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'python3-toolforge-weld' version '1.4.0' * 10:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'python3-toolforge-weld' version '1.4.0' * 08:15 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 08:15 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 07:14 taavi: deploy jobs-framework-cli v14 * 07:13 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'toolforge-jobs-framework-cli' version '14' * 07:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'toolforge-jobs-framework-cli' version '14' === 2023-10-05 === * 09:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 
'toolforge-jobs-framework-cli' version '13' * 09:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'toolforge-jobs-framework-cli' version '13' * 07:18 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 07:18 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2023-10-04 === * 16:54 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 16:54 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 16:20 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 16:20 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 13:16 taavi: rollout toolforge-weld 1.3.0 * 13:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'python3-toolforge-weld' version '1.3.0' * 13:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'python3-toolforge-weld' version '1.3.0' * 13:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 13:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 07:40 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 07:40 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2023-10-03 === * 13:07 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 13:07 wm-bot2: dcaro@urcuchillay START - Cookbook 
wmcs.toolforge.k8s.component.deploy for component builds-api * 12:10 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 12:10 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 09:27 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission * 09:26 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission === 2023-10-02 === * 08:38 dcaro: rollout toolforge-cli 0.3.4 === 2023-10-01 === * 14:43 andrewbogott: rebooting tools-sgegrid-shadow because it's fussing about nfs === 2023-09-29 === * 10:48 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all workers ([[phab:T347665|T347665]]) * 10:20 wm-bot2: taavi@runko END (PASS) - Cookbook wmcs.toolforge.grid.reboot_workers (exit_code=0) for exec nodes * 10:14 wm-bot2: taavi@runko END (PASS) - Cookbook wmcs.toolforge.grid.reboot_workers (exit_code=0) for weblight nodes * 09:59 wm-bot2: taavi@runko START - Cookbook wmcs.toolforge.grid.reboot_workers for exec nodes * 09:58 wm-bot2: taavi@runko END (FAIL) - Cookbook wmcs.toolforge.grid.reboot_workers (exit_code=99) for exec nodes * 09:58 wm-bot2: taavi@runko START - Cookbook wmcs.toolforge.grid.reboot_workers for exec nodes * 09:57 wm-bot2: taavi@runko END (FAIL) - Cookbook wmcs.toolforge.grid.reboot_workers (exit_code=99) for exec nodes * 09:56 wm-bot2: taavi@runko START - Cookbook wmcs.toolforge.grid.reboot_workers for exec nodes * 09:55 wm-bot2: taavi@runko END (FAIL) - Cookbook wmcs.toolforge.grid.reboot_workers (exit_code=99) for exec nodes * 09:52 wm-bot2: taavi@runko END (PASS) - Cookbook wmcs.toolforge.grid.reboot_workers (exit_code=0) for webgen nodes * 09:51 wm-bot2: taavi@runko START - Cookbook wmcs.toolforge.grid.reboot_workers for exec nodes * 09:51 wm-bot2: 
taavi@runko END (FAIL) - Cookbook wmcs.toolforge.grid.reboot_workers (exit_code=99) for exec nodes * 09:15 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-76 ([[phab:T347665|T347665]]) * 09:06 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-76 ([[phab:T347665|T347665]]) * 09:06 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-72 ([[phab:T347665|T347665]]) * 09:04 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-72 ([[phab:T347665|T347665]]) === 2023-09-27 === * 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component image-config * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component image-config === 2023-09-26 === * 00:07 andrewbogott: rebooting tools-puppetdb-1 in case that straightens out the puppet failures === 2023-09-25 === * 09:39 dcaro: deploying builds-builder 0.0.71 * 07:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx * 07:18 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx === 2023-09-22 === * 10:17 taavi: reboot tools-prometheus-6 * 10:17 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:32 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console === 2023-09-21 === * 16:16 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T344717|T344717]]) * 16:03 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T344717|T344717]]) * 16:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 
16:02 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase * 15:46 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-db' ([[phab:T344717|T344717]]) * 15:45 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T344717|T344717]]) * 15:24 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-db' ([[phab:T344717|T344717]]) * 15:23 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T344717|T344717]]) === 2023-09-20 === * 19:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'toolforge-webservice' version '0.103' * 19:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'toolforge-webservice' version '0.103' * 11:04 taavi: deploying toolforge-webservice 0.102 * 11:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'toolforge-webservice' version '0.102' * 11:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'toolforge-webservice' version '0.102' * 06:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx * 06:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx * 06:20 taavi: reboot tools-sgebastion-11 due to stuck NFS handles === 2023-09-19 === * 15:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx * 15:12 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx * 14:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component 
volume-admission * 14:53 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission * 09:54 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:54 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:51 wm-bot2: dcaro@urcuchillay END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99) * 09:51 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console === 2023-09-18 === * 10:41 dhinus: restarted stuck pod (webservice stop+start) [[phab:T346126|T346126]] * 07:37 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-64 ([[phab:T346123|T346123]]) * 07:35 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-64 ([[phab:T346123|T346123]]) === 2023-09-17 === * 18:12 taavi: reboot tools-sgeexec-10-22 === 2023-09-15 === * 12:32 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 12:31 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:06 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-65 ([[phab:T346123|T346123]]) * 11:58 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-65 ([[phab:T346123|T346123]]) * 11:55 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-34 ([[phab:T346123|T346123]]) * 11:46 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-34 ([[phab:T346123|T346123]]) * 10:10 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-52 ([[phab:T346123|T346123]]) * 10:02 wm-bot2: dcaro@urcuchillay START - Cookbook 
wmcs.toolforge.k8s.reboot for tools-k8s-worker-52 ([[phab:T346123|T346123]]) * 10:01 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-48 ([[phab:T346123|T346123]]) * 09:53 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-48 ([[phab:T346123|T346123]]) * 09:52 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-75 ([[phab:T346123|T346123]]) * 09:43 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-75 ([[phab:T346123|T346123]]) * 09:28 dcaro: rebooting tools-sge-cron-2 ([[phab:T346123|T346123]]) * 09:21 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-70 ([[phab:T346123|T346123]]) * 09:13 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-70 ([[phab:T346123|T346123]]) * 09:10 wm-bot2: dcaro@urcuchillay END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-69 ([[phab:T346123|T346123]]) * 09:09 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-69 ([[phab:T346123|T346123]]) * 08:49 wm-bot2: dcaro@urcuchillay END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-78 ([[phab:T346123|T346123]]) * 08:48 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-78 ([[phab:T346123|T346123]]) * 08:38 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-76 ([[phab:T346126|T346126]]) * 08:36 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-76 ([[phab:T346126|T346126]]) === 2023-09-14 === * 16:11 dcaro: increasing secrets quota to 30 ([[phab:T339916|T339916]]) * 12:13 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy 
(exit_code=0) for component envvars-api * 12:13 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 12:07 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-emailer * 12:06 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-emailer * 12:01 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-admission * 12:00 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-admission * 10:12 dcaro: deploy builds-api 0.0.96 * 09:18 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 09:17 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission * 08:10 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admissionNone ([[phab:T341084|T341084]]) * 08:09 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admissionNone ([[phab:T341084|T341084]]) === 2023-09-13 === * 17:14 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusersNone ([[phab:T341084|T341084]]) * 17:13 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusersNone ([[phab:T341084|T341084]]) * 12:51 wm-bot2: dcaro@urcuchillay END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component maintain-kubeusersNone ([[phab:T341084|T341084]]) * 12:51 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusersNone ([[phab:T341084|T341084]]) * 12:41 wm-bot2: dcaro@urcuchillay END (FAIL) -
Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component maintain-kubeusersNone ([[phab:T341084|T341084]]) * 12:41 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusersNone ([[phab:T341084|T341084]]) * 12:40 wm-bot2: dcaro@urcuchillay END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component maintain-kubeusersNone ([[phab:T341084|T341084]]) * 12:40 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusersNone ([[phab:T341084|T341084]]) * 10:41 wm-bot2: dcaro@urcuchillay END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component maintain-kubeusersNone ([[phab:T341084|T341084]]) * 10:41 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusersNone ([[phab:T341084|T341084]]) * 10:38 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusersNone ([[phab:T341084|T341084]]) * 10:38 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusersNone ([[phab:T341084|T341084]]) * 10:35 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusersNone ([[phab:T341084|T341084]]) * 10:34 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusersNone ([[phab:T341084|T341084]]) === 2023-09-12 === * 15:25 andrewbogott: rebooting tools-sgeweblight-10-26.tools.eqiad1.wikimedia.cloud, oom * 09:02 taavi: restart a bunch of sge nodes due to NFS lockups * 08:43 taavi: reboot tools-sgebastion-10 due to stuck NFS mounts === 2023-09-11 === * 12:34 dcaro: deploy kubernetes-metrics ([[phab:T341084|T341084]]) === 2023-09-05 === * 13:31 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admissionNone ([[phab:T341462|T341462]]) * 13:31 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admissionNone ([[phab:T341462|T341462]]) * 11:00 dhinus: restarting mariadb on toolsdb-2 (replica) to test slave_parallel_threads ([[phab:T345450|T345450]]) === 2023-09-01 === * 12:12 wm-bot2: dcaro@urcuchillay END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99) * 12:12 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console * 12:12 wm-bot2: dcaro@urcuchillay END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99) * 12:12 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console * 11:54 taavi: reboot unresponsive tools-sgeweblight-10-21 * 09:03 wm-bot2: dcaro@urcuchillay END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99) * 09:03 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console === 2023-08-31 === * 13:06 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 13:05 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors * 12:52 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.grid.get_cluster_status (exit_code=0) * 12:51 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.grid.get_cluster_status * 09:50 wm-bot2: deployed kubernetes component api-gateway ({{Gerrit|c0faf0f}}) ([[phab:T341462|T341462]]) - cookbook ran by dcaro@urcuchillay * 09:49 wm-bot2: deployed kubernetes component jobs-api ({{Gerrit|9c9bee0}}) ([[phab:T341462|T341462]]) - cookbook ran by dcaro@urcuchillay * 09:48 wm-bot2: deployed kubernetes component api-gateway ({{Gerrit|9c9bee0}}) ([[phab:T341462|T341462]]) - cookbook ran by dcaro@urcuchillay * 09:43 wm-bot2: deployed kubernetes component api-gateway ({{Gerrit|485046b}})
([[phab:T341462|T341462]]) - cookbook ran by dcaro@urcuchillay * 09:41 wm-bot2: deployed kubernetes component api-gateway ({{Gerrit|c0faf0f}}) ([[phab:T341462|T341462]]) - cookbook ran by dcaro@urcuchillay === 2023-08-30 === * 10:06 dcaro: upgrade toolforge-weld to 1.2.1 ([[phab:T344155|T344155]]) * 08:59 dcaro: restarting harbor to flush caches ([[phab:T344435|T344435]]) * 08:43 dcaro: cleaning up empty harbor projects ([[phab:T344435|T344435]]) === 2023-08-29 === * 14:17 wm-bot2: deployed kubernetes component jobs-api ({{Gerrit|485046b}}) ([[phab:T341462|T341462]]) - cookbook ran by dcaro@urcuchillay * 13:06 wm-bot2: deployed kubernetes component jobs-emailer ({{Gerrit|6f9c8cf}}) - cookbook ran by taavi@runko === 2023-08-28 === * 14:58 wm-bot2: deployed kubernetes component envvars-api ({{Gerrit|90055b5}}) ([[phab:T344502|T344502]]) - cookbook ran by dcaro@urcuchillay === 2023-08-25 === * 02:00 bd808: Reboot of login.toolforge.org hung until a hard reboot was triggered via horizon * 01:51 bd808: Scheduled reboot of login.toolforge.org for 2023-08-25 01:56:08 UTC === 2023-08-22 === * 15:27 taavi: fix broken k8s config files [[phab:T344289|T344289]]#9110359 * 14:31 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/labs/tools/maintain-kubeusers ({{Gerrit|27328a4}}) ([[phab:T344668|T344668]]) - cookbook ran by taavi@runko * 14:17 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/maintain-kubeusers:eaeb46b from https://gerrit.wikimedia.org/r/labs/tools/maintain-kubeusers ({{Gerrit|eaeb46b}}) ([[phab:T344668|T344668]]) - cookbook ran by taavi@runko === 2023-08-18 === * 13:46 wm-bot2: deployed kubernetes component envvars-api ({{Gerrit|06c26be}}) ([[phab:T341462|T341462]]) - cookbook ran by dcaro@urcuchillay * 12:55 taavi: reboot frozen tools-sgebastion-10 * 12:48 wm-bot2: deployed kubernetes component builds-api ({{Gerrit|727e6a7}}) ([[phab:T341462|T341462]]) - cookbook ran by dcaro@urcuchillay === 2023-08-17 === * 12:19 dcaro: 
deploy builds-api builds-api-0.0.85-20230817105952-{{Gerrit|25c2b55f}} === 2023-08-15 === * 23:29 bd808: Rebooted tools-db-1.tools.eqiad1.wikimedia.cloud for [[phab:T344298|T344298]] === 2023-07-26 === * 09:30 wm-bot2: deployed kubernetes component image-config ({{Gerrit|06066ba}}) - cookbook ran by taavi@runko === 2023-07-25 === * 13:03 wm-bot2: deployed kubernetes component image-config ({{Gerrit|0eb287a}}) - cookbook ran by taavi@runko * 13:03 taavi: add php8.2 image [[phab:T335352|T335352]] [[phab:T335507|T335507]] === 2023-07-24 === * 21:45 bd808: Rebuilding container images for refactored config and new PHP 8.2 image ([[phab:T335352|T335352]]) * 17:31 taavi: hard reboot tools-harbor-1, unresponsive === 2023-07-23 === * 14:17 taavi: hard reboot tools-sgeexec-10-15 === 2023-07-20 === * 15:19 arturo: deploying https://gitlab.wikimedia.org/repos/cloud/toolforge/buildservice/-/merge_requests/6 again with newer image ([[phab:T342338|T342338]], [[phab:T321188|T321188]]) * 13:09 wm-bot2: updating docker-registry.tools.wmflabs.org/toolforge-distroless-base-debug:latest ([[phab:T321188|T321188]]) - cookbook ran by arturo@nostromo * 11:27 wm-bot2: updating docker-registry.tools.wmflabs.org/toolforge-distroless-base:debug ([[phab:T321188|T321188]]) - cookbook ran by arturo@endurance * 11:25 wm-bot2: updating docker-registry.tools.wmflabs.org/toolforge-distroless-base:latest ([[phab:T321188|T321188]]) - cookbook ran by arturo@endurance === 2023-07-19 === * 16:34 wm-bot2: updating docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:77051c1e40d180d0695b5a9ba7a15161ecac7220ea8c1ed6721bd1c8329b1b2f ([[phab:T321188|T321188]]) - cookbook ran by arturo@nostromo * 16:30 wm-bot2: updating docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:eebb155bd1116e3b67e2ce43244f9c9958df0cbb75a84c231565fae2ed87c9f4 ([[phab:T321188|T321188]]) - cookbook ran by arturo@nostromo * 16:05 wm-bot2: updating
docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:c11cf17ee8a54dd3a44908ed3f38ffbfb41f1c8c6a2264de9b3e2f5ef4576006 ([[phab:T321188|T321188]]) - cookbook ran by arturo@nostromo * 15:38 arturo: root@tools-docker-registry-05:~# docker-registry garbage-collect /etc/docker/registry/config.yml ([[phab:T321188|T321188]]) * 15:37 arturo: root@tools-docker-registry-05:~# curl -sS -X DELETE localhost:5000/v2/toolforge-distroless-base/manifests/sha256:2d4d28e45bbe4e38177fd4fdc922dbfaf95e607b06bbc4187a90410d895b4491 ([[phab:T321188|T321188]]) * 15:09 arturo: try to rescue docker-registry.tools.wmflabs.org/toolforge-distroless-base@sha256:eebb155bd1116e3b67e2ce43244f9c9958df0cbb75a84c231565fae2ed87c9f4 back into the registry from a k8s worker local cache ([[phab:T321188|T321188]]) === 2023-07-18 === * 11:02 arturo: redeploy jobs-emailer 0.0.41-20230718103342-{{Gerrit|3dddcfb8}} into k8s ([[phab:T341084|T341084]]) === 2023-07-14 === * 22:45 taavi: reboot tools-sgebastion-11 (dev.toolforge.org) to recover from stuck NFS client causing a high load average * 09:48 dcaro: deploy builds-api 0.0.78, ci rebuild === 2023-07-13 === * 14:40 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/labs/tools/maintain-kubeusers ({{Gerrit|75db740}}) - cookbook ran by taavi@runko * 14:30 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/maintain-kubeusers:87c3616 from https://gerrit.wikimedia.org/r/labs/tools/maintain-kubeusers ({{Gerrit|87c3616}}) - cookbook ran by taavi@runko * 08:45 dcaro: rebooting tools-sgeexec-10-22 due to nfs lockup === 2023-07-12 === * 12:46 arturo: deployed builds-admission 0.0.63-20230712120152-{{Gerrit|2ef80a7c}} ([[phab:T341084|T341084]]) * 10:06 dcaro: deployed api-gateway 0.0.16, no changes, ci rebuild ([[phab:T341084|T341084]]) === 2023-07-11 === * 10:14 dcaro: deploy ingress-admission 0.0.38, ci rebuild ([[phab:T341084|T341084]]) === 2023-07-10 === * 20:39 taavi: freeing up disk space usage on tools 
docker-registry with `taavi@tools-docker-registry-05:~$ sudo sudo -u docker-registry docker-registry garbage-collect /etc/docker/registry/config.yml --delete-untagged` * 13:01 dcaro: deploy envvars-api 0.0.22 ([[phab:T341462|T341462]]) * 09:27 dcaro: deploying calico-0.0.6-20230710081103-{{Gerrit|dcbbe692}}, just a rebuild ([[phab:T341084|T341084]]) === 2023-07-09 === * 13:26 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by taavi@runko === 2023-07-05 === * 16:32 dcaro: deploy image-config 0.0.14 (no real changes, just ci rebuild) * 07:39 taavi: deploying jobs-api 0.0.213-20230705073411-{{Gerrit|09895639}} === 2023-07-04 === * 17:06 taavi: deploy tools-webservice 0.101 for [[phab:T341088|T341088]] * 16:38 dcaro: deploy volume-admission 0.0.40 (no real changes, just ci rebuild) * 11:44 dcaro: deploy jobs-api 0.0.212 === 2023-07-03 === * 19:09 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config.git ({{Gerrit|561b4d9}}) - cookbook ran by taavi@runko * 13:49 dcaro: deploy envvars-api 0.0.21 (no real changes, ci rebuild) * 13:29 dcaro: deploy builds-api 0.0.75 (no real changes, just ci rebuild) * 13:17 dcaro: deploy envvars-admission 0.0.8 * 12:17 wm-bot2: Copied Apt package python3-toolforge-weld 1.1.1 to the tools Apt repo on bookworm, bullseye, buster - cookbook ran by taavi@runko * 12:16 wm-bot2: Copied Apt package python-toolforge-weld 1.1.1 to the tools Apt repo on - cookbook ran by taavi@runko * 12:12 taavi: deploy jobs-api 0.1.5 * 12:01 dcaro: deploy builds-api 0.0.74 * 09:24 dcaro: deploy envvars-api 0.0.20 === 2023-06-30 === * 18:21 taavi: deploy new jobs-api release to fix [[phab:T340829|T340829]] === 2023-06-29 === * 10:19 dcaro: deploy toolforge-cli 0.3.2 === 2023-06-27 === * 16:48 taavi: building initial set of bookworm based images: node18, ruby31, python311 ([[phab:T335507|T335507]]) * 09:01 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by 
arturo@endurance * 08:54 arturo: force-reboot tools-sgeexec-10-15 (unresponsive) === 2023-06-23 === * 15:42 dcaro: deploy builds-api 0.3.2 ([[phab:T337025|T337025]]) === 2023-06-22 === * 11:57 taavi: update toolforge-jobs-framework-cli to 12 * 09:57 dcaro: deploy builds-api 0.3.1 * 09:32 dcaro: deploy builds-api 0.3.0 === 2023-06-21 === * 11:57 dcaro: deploy builds-api 0.2.0 ([[phab:T337025|T337025]]) === 2023-06-20 === * 14:21 taavi: fix gitlab merge settings for tools-webservice to match the agreed values (fast-forward, squash encouraged) * 12:11 dcaro: deploy toolforge-envvars-cli (upgrades python3-toolforge-weld) ([[phab:T337538|T337538]]) * 12:04 dcaro: deployed api-gateway with envvars endpoint support ([[phab:T337538|T337538]]) * 11:59 dcaro: deploy buildservice with aptfile support ([[phab:T336669|T336669]]) === 2023-06-16 === * 16:26 andrewbogott: restarting apache2 on toolserver-proxy-01.tools.eqiad1.wikimedia.cloud in hopes of stopping a flapping alert * 08:15 dcaro: deployed latest builds-api 0.1.0 === 2023-06-15 === * 14:05 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by andrew@bullseye === 2023-06-13 === * 14:27 dcaro: rebooted tools-harbor-1 as it was not responding === 2023-06-12 === * 09:03 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@nostromo === 2023-06-09 === * 19:57 andrewbogott: rebooting tools-sgeweblight-10-18 to see if it helps with [[phab:T338644|T338644]] * 19:38 andrewbogott: rebooting tools-sgeweblight-10-28 for [[phab:T337806|T337806]] === 2023-06-08 === * 20:21 bd808: Rebuilding container images ([[phab:T337897|T337897]]) * 14:16 dcaro: restart tools-sgeweblight-10-17.tools.eqiad1.wikimedia.cloud due to nfs hiccup * 14:07 dcaro: restarting the tools-sgeexec-10-17 node due to nfs hiccup * 14:00 dcaro: restarting the tools-sgegrid-master node due to nfs hiccup * 12:00 dcaro: powering off tools-k8s-etcd-18 ([[phab:T334644|T334644]]) * 07:18 wm-bot2: deployed
kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|24e7828}}) - cookbook ran by taavi@runko === 2023-06-07 === * 12:45 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:a5eb7dc from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|a5eb7dc}}) - cookbook ran by taavi@runko === 2023-06-05 === * 07:53 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus === 2023-06-01 === * 10:07 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-builds-api ({{Gerrit|7e57832}}) ([[phab:T337218|T337218]]) - cookbook ran by dcaro@vulcanus * 09:21 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-builds-api ({{Gerrit|0f4076a}}) ([[phab:T336130|T336130]]) - cookbook ran by dcaro@vulcanus * 09:18 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/buildpack-admission-controller ({{Gerrit|ef7f103}}) ([[phab:T336130|T336130]]) - cookbook ran by dcaro@vulcanus * 07:52 dcaro: rebooted tools-package-builder-04 (stuck not letting me log in with my user) === 2023-05-31 === * 02:38 andrewbogott: rebooted tools-sgeweblight-10-16, [[phab:T337806|T337806]] === 2023-05-30 === * 00:22 andrewbogott: rebooted tools-sgeweblight-10-30, oom * 00:16 andrewbogott: rebooted tools-sgeweblight-10-24, seems to be oom === 2023-05-26 === * 13:13 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/buildpack-admission-controller ({{Gerrit|ef7f103}}) ([[phab:T337218|T337218]]) - cookbook ran by dcaro@vulcanus * 12:59 dcaro: rebooting tools-sgeexec-10-16.tools.eqiad1.wikimedia.cloud for stale NFS handles (D processes) === 2023-05-24 === * 12:28 dcaro: deploy latest buildservice ([[phab:T335865|T335865]]) * 12:28 dcaro: deploy latest buildservice ([[phab:T336050|T336050]]) === 2023-05-23 
=== * 14:40 wm-bot2: deployed kubernetes component https://github.com/toolforge/buildpack-admission-controller ({{Gerrit|0c7b25b}}) - cookbook ran by fran@wmf3169 === 2023-05-22 === * 10:06 arturo: hard-reboot tools-sgeexec-10-18 (monitoring reporting it as down) === 2023-05-19 === * 13:38 arturo: uncordon tools-k8s-worker-47/48/64/75 * 08:46 bd808: Building new perl532-sssd/<nowiki>{</nowiki>base,web<nowiki>}</nowiki> images ([[phab:T323522|T323522]], [[phab:T320904|T320904]]) === 2023-05-17 === * 16:05 dcaro: release toolforge-cli 0.3.0 ([[phab:T336225|T336225]]) * 12:48 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway ({{Gerrit|fa8ed2c}}) ([[phab:T336225|T336225]]) - cookbook ran by dcaro@vulcanus * 12:48 wm-bot2: rebooted k8s node tools-k8s-worker-71 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 12:45 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway ({{Gerrit|d1bb238}}) ([[phab:T336225|T336225]]) - cookbook ran by dcaro@vulcanus * 12:43 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-builds-api ({{Gerrit|8d21314}}) - cookbook ran by dcaro@vulcanus * 10:54 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-buildpack-admission-controller:7199a9e from https://github.com/toolforge/buildpack-admission-controller ({{Gerrit|7199a9e}}) - cookbook ran by fran@wmf3169 * 08:49 wm-bot2: rebooted k8s node tools-k8s-worker-55 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 08:33 wm-bot2: rebooted k8s node tools-k8s-worker-64 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 08:32 wm-bot2: rebooted k8s node tools-k8s-worker-75 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 08:25 wm-bot2: rebooted k8s node tools-k8s-worker-74 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 08:17 wm-bot2: rebooted k8s node tools-k8s-worker-61 
([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 08:10 wm-bot2: rebooted k8s node tools-k8s-worker-70 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 08:03 wm-bot2: rebooted k8s node tools-k8s-worker-66 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 07:54 wm-bot2: rebooted k8s node tools-k8s-worker-72 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 07:46 wm-bot2: rebooted k8s node tools-k8s-worker-47 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 07:45 wm-bot2: rebooted k8s node tools-k8s-worker-48 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 07:42 wm-bot2: rebooted k8s node tools-k8s-worker-69 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 07:29 wm-bot2: rebooted k8s node tools-k8s-worker-76 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus === 2023-05-16 === * 23:24 bd808: kubectl uncordon tools-k8s-worker-69 * 23:22 bd808: Force reboot tools-k8s-worker-69 via Horizon * 23:18 bd808: kubectl drain --ignore-daemonsets --delete-emptydir-data --force tools-k8s-worker-69 * 23:17 bd808: kubectl cordon tools-k8s-worker-69 * 14:37 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/builds-api:35b57c6 from https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-builds-api.git ({{Gerrit|35b57c6}}) - cookbook ran by dcaro@vulcanus * 13:05 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller ({{Gerrit|df52a39}}) ([[phab:T334081|T334081]]) - cookbook ran by dcaro@vulcanus * 12:54 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller ({{Gerrit|ad5b2b5}}) ([[phab:T334081|T334081]]) - cookbook ran by dcaro@vulcanus * 11:52 dcaro: release toolforge-weld 0.2.0 and toolforge-webservice 0.98 * 08:08 dcaro: reboot tools-mail-03 ([[phab:T316544|T316544]]) * 08:07 dcaro: reboot tools-sgebastion-10 ([[phab:T316544|T316544]]) === 
2023-05-15 === * 22:50 bd808: Rebuilding bullseye and buster docker containers to pick up make package addition ([[phab:T320343|T320343]]) * 22:09 wm-bot2: rebooted k8s node tools-k8s-worker-66 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 22:07 wm-bot2: rebooted k8s node tools-k8s-worker-65 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 22:06 wm-bot2: rebooted k8s node tools-k8s-worker-64 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 22:04 wm-bot2: rebooted k8s node tools-k8s-worker-62 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 22:02 wm-bot2: rebooted k8s node tools-k8s-worker-61 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 21:58 wm-bot2: rebooted k8s node tools-k8s-worker-60 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 21:56 wm-bot2: rebooted k8s node tools-k8s-worker-59 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 21:54 wm-bot2: rebooted k8s node tools-k8s-worker-58 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 21:52 wm-bot2: rebooted k8s node tools-k8s-worker-57 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 21:51 wm-bot2: rebooted k8s node tools-k8s-worker-56 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 21:50 wm-bot2: rebooted k8s node tools-k8s-worker-55 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 21:49 wm-bot2: rebooted k8s node tools-k8s-worker-54 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 21:47 wm-bot2: rebooted k8s node tools-k8s-worker-53 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 21:44 wm-bot2: rebooted k8s node tools-k8s-worker-52 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 21:42 wm-bot2: rebooted k8s node tools-k8s-worker-51 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 21:41 wm-bot2: rebooted k8s node tools-k8s-worker-50 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 
21:40 wm-bot2: rebooted k8s node tools-k8s-worker-49 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 21:38 wm-bot2: rebooted k8s node tools-k8s-worker-48 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 21:37 wm-bot2: rebooted k8s node tools-k8s-worker-47 ([[phab:T316544|T316544]]) - cookbook ran by andrew@bullseye * 21:33 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by andrew@bullseye * 21:16 wm-bot2: rebooted k8s node tools-k8s-worker-45 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 21:15 wm-bot2: rebooted k8s node tools-k8s-worker-44 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 21:13 wm-bot2: rebooted k8s node tools-k8s-worker-43 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 21:12 wm-bot2: rebooted k8s node tools-k8s-worker-42 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 21:09 wm-bot2: rebooted k8s node tools-k8s-worker-41 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 21:03 wm-bot2: rebooted k8s node tools-k8s-worker-40 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 20:58 wm-bot2: rebooted k8s node tools-k8s-worker-39 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 20:52 wm-bot2: rebooted k8s node tools-k8s-worker-38 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 20:50 wm-bot2: rebooted k8s node tools-k8s-worker-37 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 20:49 wm-bot2: rebooted k8s node tools-k8s-worker-36 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 20:48 wm-bot2: rebooted k8s node tools-k8s-worker-35 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 20:47 wm-bot2: rebooted k8s node tools-k8s-worker-34 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 20:42 wm-bot2: rebooted k8s node tools-k8s-worker-33 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 20:41 andrewbogott: rebooting frozen VMs: 
tools-k8s-worker-65, tools-sgeweblight-10-27, tools-k8s-worker-45, tools-k8s-worker-36, tools-sgewebgen-10-3 (fallout from earlier nfs outage) * 20:36 wm-bot2: rebooted k8s node tools-k8s-worker-32 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 20:32 wm-bot2: rebooted k8s node tools-k8s-worker-31 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 20:24 wm-bot2: rebooted k8s node tools-k8s-worker-30 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 19:04 wm-bot2: rebooted k8s node tools-k8s-worker-67 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 18:56 wm-bot2: rebooted k8s node tools-k8s-worker-68 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 18:49 wm-bot2: rebooted k8s node tools-k8s-worker-69 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 18:46 bd808: Hard reboot tools-static-14 via Horizon per IRC report of unresponsive requests * 18:44 wm-bot2: rebooted k8s node tools-k8s-worker-70 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 18:42 wm-bot2: rebooted k8s node tools-k8s-worker-71 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 18:39 wm-bot2: rebooted k8s node tools-k8s-worker-72 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 18:34 wm-bot2: rebooted k8s node tools-k8s-worker-73 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 18:28 wm-bot2: rebooted k8s node tools-k8s-worker-74 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 18:22 wm-bot2: rebooted k8s node tools-k8s-worker-75 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 18:22 taavi: clear mail queue * 18:21 wm-bot2: rebooted k8s node tools-k8s-worker-76 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 18:15 wm-bot2: rebooted k8s node tools-k8s-worker-77 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 18:08 wm-bot2: rebooted k8s node tools-k8s-worker-80 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 
18:06 wm-bot2: rebooted k8s node tools-k8s-worker-81 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 18:05 wm-bot2: rebooted k8s node tools-k8s-worker-82 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 17:57 wm-bot2: rebooted k8s node tools-k8s-worker-83 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 17:48 wm-bot2: rebooted k8s node tools-k8s-worker-84 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 17:47 wm-bot2: rebooted k8s node tools-k8s-worker-85 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 17:38 wm-bot2: rebooted k8s node tools-k8s-worker-86 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 17:37 wm-bot2: rebooted k8s node tools-k8s-worker-87 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 17:35 wm-bot2: rebooted k8s node tools-k8s-worker-88 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 17:34 wm-bot2: rebooting all the workers of tools k8s cluster (64 nodes) ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 17:20 wm-bot2: rebooted k8s node tools-k8s-worker-87 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 17:19 wm-bot2: rebooted k8s node tools-k8s-worker-88 ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 17:17 bd808: Rebuilding bullseye and buster docker containers to pick up openssh-client package addition ([[phab:T258841|T258841]]) * 17:12 wm-bot2: rebooting the whole tools k8s cluster (64 nodes) ([[phab:T316544|T316544]]) - cookbook ran by dcaro@vulcanus * 17:06 dcaro: rebooting tools-sgegrid-shadow ([[phab:T316544|T316544]]) * 17:00 dcaro: rebooting tools-sgegrid-master ([[phab:T316544|T316544]]) * 16:55 dcaro: rebooting tools-sgeexec-10-20 ([[phab:T316544|T316544]]) * 16:53 dcaro: rebooting tools-sgeweblight-10-18 ([[phab:T316544|T316544]]) * 16:53 dcaro: rebooting tools-sgeweblight-10-25 ([[phab:T316544|T316544]]) * 16:53 dcaro: rebooting tools-sgeweblight-10-20 ([[phab:T316544|T316544]]) * 16:52 
dcaro: rebooting tools-sgeweblight-10-21 ([[phab:T316544|T316544]]) * 16:52 dcaro: rebooting tools-sgeexec-10-22 ([[phab:T316544|T316544]]) * 16:51 dcaro: rebooting tools-sgeweblight-10-28 ([[phab:T316544|T316544]]) * 16:50 dcaro: rebooting tools-sgeexec-10-17 ([[phab:T316544|T316544]]) * 16:48 dcaro: rebooting tools-sgeexec-10-21 ([[phab:T316544|T316544]]) * 16:47 dcaro: rebooting tools-sgeexec-10-19 ([[phab:T316544|T316544]]) * 16:45 dcaro: rebooting tools-sgeexec-10-8 ([[phab:T316544|T316544]]) * 16:45 dcaro: rebooting tools-sgeweblight-10-24 ([[phab:T316544|T316544]]) * 16:44 dcaro: rebooting tools-sgewebgen-10-2 ([[phab:T316544|T316544]]) * 16:44 dcaro: rebooting tools-sgeweblight-10-16 ([[phab:T316544|T316544]]) * 16:43 dcaro: rebooting tools-sgeweblight-10-30 ([[phab:T316544|T316544]]) * 16:43 dcaro: rebooting tools-sgeexec-10-18 ([[phab:T316544|T316544]]) * 16:42 dcaro: rebooting tools-sgeexec-10-16 ([[phab:T316544|T316544]]) * 16:42 dcaro: rebooting tools-sgeexec-10-14 ([[phab:T316544|T316544]]) * 16:41 dcaro: rebooting tools-sgeweblight-10-32 ([[phab:T316544|T316544]]) * 16:40 dcaro: rebooting tools-sgeweblight-10-22 ([[phab:T316544|T316544]]) * 16:39 dcaro: rebooting tools-sgeweblight-10-17 ([[phab:T316544|T316544]]) * 16:32 dcaro: rebooting tools-sgeexec-10-13.tools.eqiad1.wikimedia.cloud ([[phab:T316544|T316544]]) * 16:23 dcaro: rebooting tools-sgeweblight-10-26 ([[phab:T316544|T316544]]) * 16:15 bd808: Hard reboot of tools-sgebastion-11 via Horizon (done circa 16:11Z) * 16:14 arturo: rebooted a bunch of nodes to cleanup D procs and high load avg because NFS outage (result of [[phab:T316544|T316544]]) * 12:36 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/builds-api:09f3b49-dev from https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-builds-api.git ({{Gerrit|32a8ae9}}) - cookbook ran by dcaro@vulcanus * 09:12 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/volume-admission:c64da5a from 
https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller ({{Gerrit|c64da5a}}) - cookbook ran by dcaro@vulcanus === 2023-05-13 === * 09:13 taavi: reboot tools-sgeexec-10-15,17,18,21 === 2023-05-11 === * 15:48 bd808: Rebooted tools-sgebastion-10 for [[phab:T336510|T336510]] * 15:31 bd808: Sent `wall` for reboot of tools-sgebastion-10 circa 15:40Z === 2023-05-09 === * 16:36 taavi: delegated beta.toolforge.org domain to toolsbeta per [[phab:T257386|T257386]] * 09:35 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|ad4fa2a}}) - cookbook ran by taavi@runko === 2023-05-08 === * 09:12 arturo: force-reboot tools-sgeexec-10-13 (reported as down by the monitoring, no SSH) === 2023-05-07 === * 16:06 taavi: remove inbound 25/tcp rule from the toolserver legacy server [[phab:T136225|T136225]] === 2023-05-05 === * 22:21 bd808: Added "RepoLookoutBot" to hiera key "dynamicproxy::blocked_user_agent_regex" to stop unnecessary scans by https://www.repo-lookout.org/ * 22:20 bd808: Added * 11:30 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:811164e from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|811164e}}) - cookbook ran by taavi@runko * 09:13 dcaro: rebooted tools-sgeexec-10-16 as it was stuck ([[phab:T335009|T335009]]) === 2023-05-04 === * 15:15 wm-bot2: removed instance tools-k8s-etcd-15 - cookbook ran by andrew@bullseye * 14:13 wm-bot2: removed instance tools-k8s-etcd-14 - cookbook ran by andrew@bullseye === 2023-05-03 === * 12:41 wm-bot2: removed instance tools-k8s-etcd-13 - cookbook ran by andrew@bullseye === 2023-05-02 === * 00:29 wm-bot2: deployed kubernetes component https://github.com/toolforge/buildpack-admission-controller ({{Gerrit|7199a9e}}) - cookbook ran by raymond@ubuntu === 2023-05-01 === * 23:17 wm-bot2: build & push docker image 
docker-registry.tools.wmflabs.org/toolforge-buildpack-admission-controller:3b3803f from https://github.com/toolforge/buildpack-admission-controller ({{Gerrit|3b3803f}}) - cookbook ran by raymond@ubuntu
=== 2023-04-28 ===
* 15:01 arturo: force reboot tools-k8s-worker-79, unresponsive
* 08:27 dcaro: rebooting tools-sgeweblight-10-28 ([[phab:T335336|T335336]])
* 07:20 dcaro: rebooting tools-sgegrid-shadow due to stale nfs mount
* 00:09 bd808: `kubectl uncordon tools-k8s-worker-67` ([[phab:T335543|T335543]])
* 00:07 bd808: Hard reboot tools-k8s-worker-67.tools.eqiad1.wikimedia.cloud via horizon ([[phab:T335543|T335543]])
* 00:04 bd808: Rebooting tools-k8s-worker-67.tools.eqiad1.wikimedia.cloud ([[phab:T335543|T335543]])
=== 2023-04-27 ===
* 23:59 bd808: `kubectl drain --ignore-daemonsets --delete-emptydir-data --force tools-k8s-worker-67` ([[phab:T335543|T335543]])
* 20:50 bd808: Started process to rebuild all buster and bullseye based container images again. Prior problem seems to have been stale images in local cache on the build server.
* 20:42 bd808: Container image rebuild failed with GPG errors in buster-sssd base image. Will investigate and attempt to restart once resolved in a local dev environment.
* 20:33 bd808: Started process to rebuild all buster and bullseye based container images per https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Building_toolforge_specific_images
=== 2023-04-18 ===
* 16:46 dcaro: force-rebooting tools-sgeweblight-10-25/26/27 as they got stuck stopping the grid_exec process
* 16:35 dcaro: rebooting root@tools-sgeweblight-10-27 due to stuck exec daemon not releasing port 6445
* 16:35 dcaro: rebooting root@tools-sgeweblight-10-25 due to stuck exec daemon not releasing port 6445
* 16:32 dcaro: rebooting root@tools-sgeweblight-10-26 due to stuck exec daemon not releasing port 6445
* 16:26 dcaro: rebooting root@tools-sgeexec-10-14 due to stuck exec daemon not releasing port 6445
=== 2023-04-17 ===
* 13:10 dcaro: rebooting tools-sgegrid-master node ([[phab:T334847|T334847]])
* 02:43 legoktm: manual restart of apache2 on toolserver-proxy-1 to completely pick up renewed TLS cert (alert was flapping)
=== 2023-04-11 ===
* 16:11 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|b65439b}}) - cookbook ran by arturo@nostromo
* 15:46 arturo: upload toolforge-jobs-framework-cli v11 to aptly
* 14:17 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller.git ({{Gerrit|d878e49}}) ([[phab:T324834|T324834]]) - cookbook ran by dcaro@vulcanus
* 13:19 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:c6c693c from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|c6c693c}}) - cookbook ran by arturo@nostromo
* 12:09 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/volume-admission:40bd3b3 from https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller ({{Gerrit|40bd3b3}}) - cookbook ran by dcaro@vulcanus
* 10:34 wm-bot2: deployed kubernetes component
https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-nginx ({{Gerrit|9aed7e5}}) - cookbook ran by taavi@runko * 09:15 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/calico ({{Gerrit|c6a3e29}}) ([[phab:T329677|T329677]]) - cookbook ran by taavi@runko * 08:45 wm-bot2: Adding a new k8s worker node - cookbook ran by taavi@runko === 2023-04-10 === * 10:46 taavi: patch existing PSP roles to use policy/v1beta1 [[phab:T331619|T331619]] * 09:16 arturo: upgrading k8s cluster to 1.22 ([[phab:T286856|T286856]]) === 2023-04-07 === * 14:34 wm-bot2: drained, depooled and removed k8s control node tools-k8s-control-3 ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko * 14:30 wm-bot2: removed instance tools-k8s-control-2 - cookbook ran by taavi@runko === 2023-04-05 === * 15:16 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|5ea5992}}) - cookbook ran by taavi@runko * 15:10 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:3569803 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|3569803}}) - cookbook ran by taavi@runko * 14:56 wm-bot2: Added a new k8s worker tools-k8s-worker-88.tools.eqiad1.wikimedia.cloud to the cluster ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 14:42 wm-bot2: Adding a new k8s worker node ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 14:42 wm-bot2: Added a new k8s worker tools-k8s-worker-87.tools.eqiad1.wikimedia.cloud to the cluster ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 14:28 wm-bot2: Adding a new k8s worker node ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 14:28 wm-bot2: Added a new k8s worker tools-k8s-worker-86.tools.eqiad1.wikimedia.cloud to the cluster ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 14:15 wm-bot2: Adding a new k8s worker node ([[phab:T333972|T333972]]) - cookbook ran by 
taavi@runko * 14:15 wm-bot2: Added a new k8s worker tools-k8s-worker-85.tools.eqiad1.wikimedia.cloud to the cluster ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 14:01 wm-bot2: Adding a new k8s worker node ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 14:01 wm-bot2: Added a new k8s worker tools-k8s-worker-84.tools.eqiad1.wikimedia.cloud to the cluster ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 13:47 wm-bot2: Adding a new k8s worker node ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 13:47 wm-bot2: Added a new k8s worker tools-k8s-worker-83.tools.eqiad1.wikimedia.cloud to the cluster ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 13:34 wm-bot2: Adding a new k8s worker node ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 13:33 wm-bot2: removed instance tools-k8s-worker-83 - cookbook ran by taavi@runko * 13:15 wm-bot2: Adding a new k8s worker node ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 13:06 wm-bot2: removing grid node tools-sgeweblight-10-31.tools.eqiad1.wikimedia.cloud ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 13:02 wm-bot2: removing grid node tools-sgeweblight-10-29.tools.eqiad1.wikimedia.cloud ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 13:00 wm-bot2: removing grid node tools-sgeexec-10-9.tools.eqiad1.wikimedia.cloud ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 12:58 wm-bot2: removing grid node tools-sgeweblight-10-15.tools.eqiad1.wikimedia.cloud ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 12:54 wm-bot2: removing grid node tools-sgeexec-10-7.tools.eqiad1.wikimedia.cloud ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 12:52 wm-bot2: removing grid node tools-sgeweblight-10-13.tools.eqiad1.wikimedia.cloud ([[phab:T333972|T333972]]) - cookbook ran by taavi@runko * 12:34 wm-bot2: drained, depooled and removed k8s control node tools-k8s-control-1 - cookbook ran by taavi@runko * 12:07 wm-bot2: Added a new 
k8s control tools-k8s-control-6.tools.eqiad1.wikimedia.cloud to the cluster - cookbook ran by taavi@runko * 11:53 wm-bot2: Adding a new k8s control node - cookbook ran by taavi@runko * 11:51 wm-bot2: removed instance tools-k8s-control-6 - cookbook ran by taavi@runko * 11:39 wm-bot2: Adding a new k8s control node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko * 11:38 wm-bot2: removed instance tools-k8s-control-6 - cookbook ran by taavi@runko * 11:21 wm-bot2: Adding a new k8s control node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko * 11:21 wm-bot2: removed instance tools-k8s-control-6 - cookbook ran by taavi@runko * 11:09 wm-bot2: Adding a new k8s control node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko * 10:53 wm-bot2: removed instance tools-k8s-control-6 - cookbook ran by taavi@runko * 10:41 wm-bot2: Adding a new k8s control node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko * 10:41 wm-bot2: removed instance tools-k8s-control-6 - cookbook ran by taavi@runko * 10:16 wm-bot2: Adding a new k8s control node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko === 2023-04-04 === * 19:00 wm-bot2: Adding a new k8s control node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko * 18:59 wm-bot2: removed instance tools-k8s-control-5 - cookbook ran by taavi@runko * 18:46 wm-bot2: Adding a new k8s control node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko * 18:45 wm-bot2: Adding a new k8s CONTROL node ([[phab:T333929|T333929]]) - cookbook ran by taavi@runko * 10:15 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@nostromo * 09:28 arturo: hard-reboot the 3 k8s control nodes === 2023-04-03 === * 17:13 wm-bot2: rebooted k8s node tools-k8s-worker-31 - cookbook ran by taavi@runko * 17:11 wm-bot2: rebooted k8s node tools-k8s-worker-32 - cookbook ran by taavi@runko * 17:09 wm-bot2: rebooted k8s node tools-k8s-worker-33 - cookbook ran by taavi@runko * 17:07 wm-bot2: rebooted k8s 
node tools-k8s-worker-34 - cookbook ran by taavi@runko * 17:05 wm-bot2: rebooted k8s node tools-k8s-worker-35 - cookbook ran by taavi@runko * 17:04 wm-bot2: rebooted k8s node tools-k8s-worker-36 - cookbook ran by taavi@runko * 17:02 wm-bot2: rebooted k8s node tools-k8s-worker-37 - cookbook ran by taavi@runko * 17:00 wm-bot2: rebooted k8s node tools-k8s-worker-38 - cookbook ran by taavi@runko * 16:58 wm-bot2: rebooted k8s node tools-k8s-worker-39 - cookbook ran by taavi@runko * 16:56 wm-bot2: rebooted k8s node tools-k8s-worker-40 - cookbook ran by taavi@runko * 16:55 wm-bot2: rebooted k8s node tools-k8s-worker-41 - cookbook ran by taavi@runko * 16:53 wm-bot2: rebooted k8s node tools-k8s-worker-42 - cookbook ran by taavi@runko * 16:51 wm-bot2: rebooted k8s node tools-k8s-worker-43 - cookbook ran by taavi@runko * 16:49 wm-bot2: rebooted k8s node tools-k8s-worker-44 - cookbook ran by taavi@runko * 16:45 wm-bot2: rebooted k8s node tools-k8s-worker-45 - cookbook ran by taavi@runko * 16:43 wm-bot2: rebooted k8s node tools-k8s-worker-46 - cookbook ran by taavi@runko * 16:41 wm-bot2: rebooted k8s node tools-k8s-worker-47 - cookbook ran by taavi@runko * 16:40 wm-bot2: rebooted k8s node tools-k8s-worker-48 - cookbook ran by taavi@runko * 16:38 wm-bot2: rebooted k8s node tools-k8s-worker-49 - cookbook ran by taavi@runko * 16:36 wm-bot2: rebooted k8s node tools-k8s-worker-50 - cookbook ran by taavi@runko * 16:35 wm-bot2: rebooted k8s node tools-k8s-worker-51 - cookbook ran by taavi@runko * 16:33 wm-bot2: rebooted k8s node tools-k8s-worker-52 - cookbook ran by taavi@runko * 16:31 wm-bot2: rebooted k8s node tools-k8s-worker-53 - cookbook ran by taavi@runko * 16:28 wm-bot2: rebooted k8s node tools-k8s-worker-54 - cookbook ran by taavi@runko * 16:27 wm-bot2: rebooted k8s node tools-k8s-worker-55 - cookbook ran by taavi@runko * 16:25 wm-bot2: rebooted k8s node tools-k8s-worker-56 - cookbook ran by taavi@runko * 16:23 wm-bot2: rebooted k8s node tools-k8s-worker-57 - cookbook ran by 
taavi@runko * 16:21 wm-bot2: rebooted k8s node tools-k8s-worker-58 - cookbook ran by taavi@runko * 16:20 wm-bot2: rebooted k8s node tools-k8s-worker-59 - cookbook ran by taavi@runko * 16:18 wm-bot2: rebooted k8s node tools-k8s-worker-60 - cookbook ran by taavi@runko * 16:09 wm-bot2: rebooted k8s node tools-k8s-worker-61 - cookbook ran by taavi@runko * 16:07 wm-bot2: rebooted k8s node tools-k8s-worker-62 - cookbook ran by taavi@runko * 16:01 wm-bot2: rebooted k8s node tools-k8s-worker-64 - cookbook ran by taavi@runko * 16:00 wm-bot2: rebooting the whole tools k8s cluster (58 nodes) - cookbook ran by taavi@runko * 15:58 wm-bot2: rebooted k8s node tools-k8s-worker-65 - cookbook ran by taavi@runko * 15:56 wm-bot2: rebooted k8s node tools-k8s-worker-66 - cookbook ran by taavi@runko * 15:48 wm-bot2: rebooted k8s node tools-k8s-worker-67 - cookbook ran by taavi@runko * 15:38 wm-bot2: rebooted k8s node tools-k8s-worker-68 - cookbook ran by taavi@runko * 15:36 wm-bot2: rebooted k8s node tools-k8s-worker-69 - cookbook ran by taavi@runko * 15:34 wm-bot2: rebooted k8s node tools-k8s-worker-70 - cookbook ran by taavi@runko * 15:32 wm-bot2: rebooted k8s node tools-k8s-worker-71 - cookbook ran by taavi@runko * 15:30 wm-bot2: rebooted k8s node tools-k8s-worker-72 - cookbook ran by taavi@runko * 15:28 wm-bot2: rebooted k8s node tools-k8s-worker-73 - cookbook ran by taavi@runko * 15:26 wm-bot2: rebooted k8s node tools-k8s-worker-74 - cookbook ran by taavi@runko * 15:24 wm-bot2: rebooted k8s node tools-k8s-worker-75 - cookbook ran by taavi@runko * 15:22 wm-bot2: rebooting the whole tools k8s cluster (58 nodes) - cookbook ran by taavi@runko * 15:17 wm-bot2: rebooted k8s node tools-k8s-worker-75 - cookbook ran by taavi@runko * 15:14 wm-bot2: rebooted k8s node tools-k8s-worker-76 - cookbook ran by taavi@runko * 15:12 wm-bot2: rebooted k8s node tools-k8s-worker-77 - cookbook ran by taavi@runko * 15:10 wm-bot2: rebooted k8s node tools-k8s-worker-78 - cookbook ran by taavi@runko * 15:08 
wm-bot2: rebooted k8s node tools-k8s-worker-79 - cookbook ran by taavi@runko * 15:06 wm-bot2: rebooted k8s node tools-k8s-worker-80 - cookbook ran by taavi@runko * 14:59 wm-bot2: rebooted k8s node tools-k8s-worker-81 - cookbook ran by taavi@runko * 14:41 wm-bot2: rebooted k8s node tools-k8s-worker-82 - cookbook ran by taavi@runko * 14:38 wm-bot2: rebooting the whole tools k8s cluster (58 nodes) - cookbook ran by taavi@runko * 14:13 andrewbogott: test log to see if stashbot is back working * 13:19 andrewbogott: forcing puppet run on all toolforge VMs * 08:28 taavi: stop exim4.service on tools-sgecron-2 [[phab:T333477|T333477]] * 06:52 taavi: stop jobs-framework-emailer to prevent spam due to NFS being read-only [[phab:T333477|T333477]] === 2023-03-29 === * 16:07 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/labs/tools/registry-admission-webhook ({{Gerrit|dc26f52}}) - cookbook ran by raymond@ubuntu * 15:21 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/registry-admission:24115c7 from https://gerrit.wikimedia.org/r/labs/tools/registry-admission-webhook ({{Gerrit|24115c7}}) - cookbook ran by raymond@ubuntu === 2023-03-28 === * 19:43 wm-bot2: deployed kubernetes component https://github.com/toolforge/buildpack-admission-controller ({{Gerrit|e1b9815}}) - cookbook ran by raymond@ubuntu === 2023-03-27 === * 22:51 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-buildpack-admission-controller:70d550a from https://github.com/toolforge/buildpack-admission-controller ({{Gerrit|70d550a}}) - cookbook ran by raymond@ubuntu === 2023-03-26 === * 20:28 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by taavi@runko === 2023-03-24 === * 14:13 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@endurance === 2023-03-21 === * 08:11 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by taavi@runko === 2023-03-20 === * 13:39 
wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by taavi@runko * 10:57 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@endurance === 2023-03-19 === * 09:32 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by taavi@runko === 2023-03-17 === * 15:56 andrewbogott: truncating .out, .err, and .log files to 10MB in anticipation of moving the NFS volumes === 2023-03-13 === * 09:50 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-buildpack-admission-controller:f90bd8f from https://github.com/toolforge/buildpack-admission-controller ({{Gerrit|f90bd8f}}) - cookbook ran by dcaro@vulcanus === 2023-03-12 === * 13:40 taavi: restart haproxy on tools-k8s-haproxy-3 === 2023-03-11 === * 18:38 wm-bot2: removing grid node tools-sgeexec-10-11.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 18:36 wm-bot2: removing grid node tools-sgeexec-10-11.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 18:34 wm-bot2: removing grid node tools-sgeexec-10-11.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 18:31 taavi: reboot misbehaving tools-sgeexec-10-11 === 2023-03-10 === * 16:36 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/labs/tools/maintain-kubeusers ({{Gerrit|8b42b15}}) - cookbook ran by taavi@runko === 2023-03-09 === * 10:13 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/labs/tools/maintain-kubeusers ({{Gerrit|53e7f81}}) - cookbook ran by taavi@runko * 10:04 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/maintain-kubeusers:834807c from https://gerrit.wikimedia.org/r/labs/tools/maintain-kubeusers ({{Gerrit|834807c}}) - cookbook ran by taavi@runko === 2023-03-08 === * 22:31 bd808: Live hacked user-maintainer clusterrole to work around breakage in [[phab:T331572|T331572]] === 2023-03-07 === * 11:34 wm-bot2: Increased quotas by 2 volumes - cookbook ran by fran@wmf3169 
* 11:09 wm-bot2: Increased quotas by 6 snapshots - cookbook ran by fran@wmf3169 * 11:07 wm-bot2: Increased quotas by 4000 gigabytes - cookbook ran by fran@wmf3169 === 2023-03-06 === * 12:51 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/labs/tools/registry-admission-webhook ({{Gerrit|6688477}}) - cookbook ran by taavi@runko * 12:33 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/registry-admission:e916fee from https://gerrit.wikimedia.org/r/labs/tools/registry-admission-webhook ({{Gerrit|e916fee}}) - cookbook ran by taavi@runko * 12:16 arturo: delete calico deployment, redeploy from https://gitlab.wikimedia.org/repos/cloud/toolforge/calico ([[phab:T328539|T328539]]) === 2023-03-05 === * 15:43 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/labs/tools/maintain-kubeusers ({{Gerrit|3e04025}}) - cookbook ran by taavi@runko === 2023-03-02 === * 11:32 arturo: aborrero@tools-k8s-control-2:~$ sudo -i kubectl apply -f /etc/kubernetes/toolforge-tool-roles.yaml (https://gerrit.wikimedia.org/r/c/operations/puppet/+/889836) === 2023-03-01 === * 13:18 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|13eda9d}}) - cookbook ran by taavi@runko === 2023-02-28 === * 17:19 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway ({{Gerrit|9252af7}}) - cookbook ran by taavi@runko * 17:04 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|e46da83}}) - cookbook ran by taavi@runko === 2023-02-23 === * 18:07 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway ({{Gerrit|efb60b3}}) - cookbook ran by taavi@runko * 09:33 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/buildpack-admission:b34e2f8 from https://github.com/toolforge/buildpack-admission-controller.git ({{Gerrit|b34e2f8}}) - 
cookbook ran by taavi@runko === 2023-02-21 === * 09:37 arturo: hard-reboot tools-sgeexec-10-11 (unresponsive to ssh) === 2023-02-20 === * 11:24 taavi: redeploy volume-admission with helm and cert-manager certificates [[phab:T329530|T329530]] [[phab:T292238|T292238]] * 11:15 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/volume-admission:7fd13ac from https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller ({{Gerrit|ede8bd0}}) - cookbook ran by taavi@runko * 11:05 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-volume-admission-controller:7fd13ac from https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller ({{Gerrit|7fd13ac}}) - cookbook ran by taavi@runko * 10:39 wm-bot2: Increased quotas by 4000 gigabytes - cookbook ran by fran@wmf3169 * 09:20 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@nostromo === 2023-02-19 === * 09:16 taavi: uncordon tools-k8s-worker-[80-82] after fixing security groups [[phab:T329378|T329378]] === 2023-02-17 === * 11:32 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|eeeea4c}}) - cookbook ran by arturo@endurance * 11:31 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config ({{Gerrit|7729b18}}) ([[phab:T254636|T254636]]) - cookbook ran by arturo@endurance * 11:26 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:8a9b97e from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|eeeea4c}}) - cookbook ran by arturo@endurance * 11:24 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:8a9b97e from https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-framework-api ({{Gerrit|618ab29}}) - cookbook ran by arturo@endurance * 10:25 arturo: build and push mariadb-sssd/base docker image for 
Toolforge ([[phab:T320178|T320178]], [[phab:T254636|T254636]]) === 2023-02-16 === * 15:58 wm-bot2: Increased quotas by 4000 gigabytes - cookbook ran by fran@wmf3169 * 15:30 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/cert-manager ({{Gerrit|d71994e}}) - cookbook ran by arturo@nostromo * 13:52 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/ingress-admission-controller ({{Gerrit|7191997}}) - cookbook ran by taavi@runko * 13:44 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/ingress-admission:1fe8ec4 from https://gerrit.wikimedia.org/r/cloud/toolforge/ingress-admission-controller ({{Gerrit|1fe8ec4}}) - cookbook ran by taavi@runko * 12:47 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/ingress-admission:e9b9920 from https://gerrit.wikimedia.org/r/cloud/toolforge/ingress-admission-controller ({{Gerrit|e9b9920}}) - cookbook ran by taavi@runko * 10:35 arturo: aborrero@tools-k8s-control-1:~$ sudo -i kubectl apply -f /etc/kubernetes/psp/base-pod-security-policies.yaml * 09:48 arturo: grid engine was failed over to shadow server, manually put it back into normal https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Grid#GridEngine_Master * 09:39 arturo: aborrero@tools-sgegrid-shadow:~$ sudo truncate -s 1G /var/log/syslog (was 17G, full root disk) === 2023-02-15 === * 18:03 taavi: deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/889585/ to increase amount of haproxy max connections * 15:19 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@nostromo * 09:50 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/cert-manager.git ({{Gerrit|e3f3ce1}}) ([[phab:T329453|T329453]]) - cookbook ran by taavi@runko * 09:30 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@nostromo === 2023-02-14 === * 15:07 taavi: import cert-manager 
components to local docker registry [[phab:T329453|T329453]] * 12:12 arturo: the fixed webservicemonitor is starting a bunch of grid webservices ([[phab:T329611|T329611]]) * 12:10 arturo: included tools-manifests 0.25 in tools-buster aptly repo, deploying it now! ([[phab:T329611|T329611]], [[phab:T329467|T329467]], [[phab:T244809|T244809]]) === 2023-02-13 === * 16:05 wm-bot2: Increased quotas by 4000 gigabytes - cookbook ran by fran@wmf3169 * 16:03 taavi: update maintain-kubeusers deployment to use helm * 15:05 taavi: deploy jobs-api updates, improving some status messages * 15:04 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|13d87c4}}) - cookbook ran by taavi@runko * 15:00 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:390ed64 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|390ed64}}) - cookbook ran by taavi@runko * 13:14 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/maintain-kubeusers:aac195b from https://gerrit.wikimedia.org/r/labs/tools/maintain-kubeusers ({{Gerrit|aac195b}}) - cookbook ran by taavi@runko === 2023-02-10 === * 15:45 taavi: reboot tools-k8s-worker-82 to troubleshoot network issues * 12:44 wm-bot2: Added a new k8s worker tools-k8s-worker-82.tools.eqiad1.wikimedia.cloud to the worker pool ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko * 12:31 wm-bot2: Adding a new k8s worker node ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko * 12:29 wm-bot2: Added a new k8s worker tools-k8s-worker-81.tools.eqiad1.wikimedia.cloud to the worker pool ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko * 12:15 wm-bot2: Adding a new k8s worker node ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko * 11:53 wm-bot2: Adding a new k8s worker node ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko * 11:44 wm-bot2: removing grid node 
tools-sgeweblight-10-23.tools.eqiad1.wikimedia.cloud ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko * 11:42 wm-bot2: removing grid node tools-sgeexec-10-5.tools.eqiad1.wikimedia.cloud ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko * 11:39 wm-bot2: removing grid node tools-sgeweblight-10-19.tools.eqiad1.wikimedia.cloud ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko * 11:26 wm-bot2: removing grid node tools-sgeweblight-10-12.tools.eqiad1.wikimedia.cloud ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko * 11:24 wm-bot2: removing grid node tools-sgeexec-10-1.tools.eqiad1.wikimedia.cloud ([[phab:T329357|T329357]]) - cookbook ran by taavi@runko === 2023-02-01 === * 16:03 taavi: deployed tools-webservice 0.89 * 15:43 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config ({{Gerrit|372037f}}) - cookbook ran by taavi@runko === 2023-01-26 === * 15:05 taavi: drain and reboot tools-k8s-worker-74 which seems to have some issues with nfs * 14:37 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|307f302}}) - cookbook ran by taavi@runko * 14:30 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:05966c6 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|05966c6}}) - cookbook ran by taavi@runko === 2023-01-24 === * 12:04 taavi: deploying toolforge-jobs-framework-cli v10 [[phab:T327775|T327775]] * 10:07 taavi: publish toolforge-jobs-framework-cli v9 === 2023-01-23 === * 11:31 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|d5ae229}}) - cookbook ran by taavi@runko * 11:23 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:d085c50 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|d085c50}}) - cookbook ran by taavi@runko 
* 11:17 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config ({{Gerrit|864171a}}) - cookbook ran by taavi@runko === 2023-01-20 === * 23:24 andrewbogott: truncating logfiles with find . -name '*.err' -size +1G -exec truncate --size=100M <nowiki>{</nowiki><nowiki>}</nowiki> \; * 21:24 andrewbogott: truncating logfiles with find . -name '*.out' -size +1G -exec truncate --size=100M <nowiki>{</nowiki><nowiki>}</nowiki> \; * 01:06 andrewbogott: truncating logfiles with find . -name '*.log' -size +1G -exec truncate --size=100M <nowiki>{</nowiki><nowiki>}</nowiki> \; === 2023-01-19 === * 11:46 arturo: `aborrero@tools-k8s-control-1:~$ sudo -i kubectl delete clusterrolebinding jobs-api-psp` (cleanup unused stuff) === 2023-01-18 === * 15:42 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|0ad4c66}}) - cookbook ran by arturo@nostromo * 15:29 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:54cc15e from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|54cc15e}}) - cookbook ran by arturo@nostromo === 2023-01-17 === * 13:55 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|8cf38a1}}) - cookbook ran by arturo@endurance * 13:51 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|0d0a882}}) - cookbook ran by arturo@endurance * 13:34 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:3a58c1d from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|3a58c1d}}) - cookbook ran by arturo@endurance === 2023-01-10 === * 11:55 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|8e0a2f9}}) - cookbook ran by arturo@endurance * 11:52 wm-bot2: build 
& push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:9514b00 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|8e0a2f9}}) - cookbook ran by arturo@endurance * 11:36 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|0243967}}) - cookbook ran by arturo@endurance === 2023-01-03 === * 17:17 andrewbogott: find -name '*.log' -size +1G -exec truncate --size=1G <nowiki>{</nowiki><nowiki>}</nowiki> \; === 2022-12-20 === * 09:07 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@nostromo === 2022-12-12 === * 14:36 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus === 2022-12-09 === * 07:20 taavi: change the canonical tools-mail external hostname to use mail.tools.wmcloud.org and add valid spf to toolforge.org [[phab:T324809|T324809]] === 2022-12-05 === * 11:06 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus === 2022-11-30 === * 10:39 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|bc3529d}}) - cookbook ran by arturo@nostromo * 10:17 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:c360d54 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|c360d54}}) - cookbook ran by arturo@nostromo === 2022-11-29 === * 19:52 taavi: clear puppet failure emails from exim queues === 2022-11-09 === * 08:58 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by arturo@nostromo === 2022-11-05 === * 19:28 andrewbogott: cleaning up nfs share with root@labstore1004:/srv/tools/shared/tools# find -name '*.err' -size +1G -exec truncate --size=1G <nowiki>{</nowiki><nowiki>}</nowiki> \; * 13:26 andrewbogott: cleaning up nfs share with root@labstore1004:/srv/tools/shared/tools# find -name 
'*.log' -size +1G -exec truncate --size=1G <nowiki>{</nowiki><nowiki>}</nowiki> \; === 2022-11-04 === * 20:41 andrewbogott: cleaning up nfs share with root@labstore1004:/srv/tools/shared/tools# find -name '*.err' -not -newermt "Nov 1, 2021" -exec rm <nowiki>{</nowiki><nowiki>}</nowiki> \; * 14:02 andrewbogott: cleaning up nfs share with root@labstore1004:/srv/tools/shared/tools# find -name '*.log' -not -newermt "Nov 1, 2021" -exec rm <nowiki>{</nowiki><nowiki>}</nowiki> \; * 12:20 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|d464be4}}) ([[phab:T304900|T304900]]) - cookbook ran by arturo@nostromo * 12:12 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:2b800f5 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|2b800f5}}) ([[phab:T304900|T304900]]) - cookbook ran by arturo@nostromo === 2022-11-01 === * 09:37 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master ([[phab:T322110|T322110]]) - cookbook ran by dcaro@vulcanus === 2022-10-26 === * 08:45 dcaro: depooling and rebooting tools-sgeexec-10-22 to get nfs scratch working again === 2022-10-25 === * 16:14 wm-bot2: Increased quotas by 5120 gigabytes - cookbook ran by fran@wmf3169 * 15:26 dcaro: pushed a newer docker-registry.tools.wmflabs.org/python:3.9-slim-bullseye (from upstream python:3.9-slim-bullseye) === 2022-10-20 === * 16:54 andrewbogott: rebooting tools-package-builder-04 * 16:49 andrewbogott: rebooting redis nodes (one at a time) * 10:54 taavi: rebuild mono68-sssd image with the expired DST Root CA X3 removed [[phab:T311466|T311466]] === 2022-10-18 === * 11:52 taavi: deploy toolforge-jobs-framework-cli deb v8 * 10:30 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-emailer ({{Gerrit|64385e9}}) ([[phab:T320405|T320405]]) - cookbook ran by arturo@nostromo * 10:27 wm-bot2: build & push docker image 
docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:9be2272 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|9be2272}}) - cookbook ran by taavi@runko * 10:18 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-emailer:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-emailer ({{Gerrit|64385e9}}) ([[phab:T320405|T320405]]) - cookbook ran by arturo@nostromo === 2022-10-17 === * 07:25 taavi: push updated perl532 images [[phab:T320824|T320824]] === 2022-10-14 === * 07:54 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|0cc020e}}) ([[phab:T311466|T311466]]) - cookbook ran by taavi@runko === 2022-10-13 === * 15:10 arturo: restart jobs-emailer pod === 2022-10-12 === * 23:25 bd808: Rebuilding all Toolforge docker images ([[phab:T278436|T278436]], [[phab:T311466|T311466]], [[phab:T293552|T293552]]) * 20:43 bd808: Rebuilding all Toolforge docker images to pick up bug and security fix packages. Third try seems to be working. ([[phab:T316554|T316554]]) * 20:31 bd808: Rebuilding all Toolforge docker images to pick up bug and security fix packages after fixing bug in building the bullseye base image. 
([[phab:T316554|T316554]]) * 16:26 dcaro: deploy the latest registry admission webhook, now for real (image tag {{Gerrit|07bc7db}}) * 12:48 dcaro: deploy the latest registry admission webhook (image tag {{Gerrit|07bc7db}}) * 09:26 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus * 09:19 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus === 2022-10-11 === * 13:52 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:8574c36 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|8574c36}}) - cookbook ran by taavi@runko === 2022-10-10 === * 19:30 taavi: rebooting all k8s worker nodes to clean up labstore1006/7 remains * 16:51 taavi: clean up labstore1006/7 mounts from k8s control nodes [[phab:T320425|T320425]] * 11:35 arturo: aborrero@tools-k8s-control-1:~$ sudo -i kubectl -n jobs-emailer rollout restart deployment/jobs-emailer ([[phab:T317998|T317998]]) * 08:44 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|afa90ed}}) ([[phab:T320284|T320284]]) - cookbook ran by taavi@runko * 08:39 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|afa90ed}}) - cookbook ran by taavi@runko === 2022-10-09 === * 17:29 taavi: kill 10 idle tmux sessions of user 'hoi' on tools-sgebastion-10 [[phab:T320352|T320352]] === 2022-10-07 === * 13:02 taavi: taavi@cloudcontrol1005 ~ $ sudo mark_tool --disable oncall # [[phab:T320240|T320240]] === 2022-10-06 === * 00:39 bd808: Image rebuild failing with debian apt repo signature issue. Will investigate tomorrow. ([[phab:T316554|T316554]]) * 00:36 bd808: Rebuilding all Toolforge docker images to pick up bug and security fix packages. 
([[phab:T316554|T316554]]) * 00:04 bd808: Building new php74-sssd-base & web images ([[phab:T310435|T310435]]) === 2022-10-03 === * 14:36 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/volume-admission:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/volume-admission-controller ({{Gerrit|8da432b}}) - cookbook ran by taavi@runko === 2022-09-28 === * 21:23 lucaswerkmeister: on tools-sgebastion-10: run-puppet-agent # [[phab:T318858|T318858]] * 21:22 lucaswerkmeister: on tools-sgebastion-10: apt remove emacs-common emacs-bin-common # fix package conflict, [[phab:T318858|T318858]] * 21:15 lucaswerkmeister: added root SSH key for myself, manually ran puppet on tools-sgebastion-10 to apply it (seemingly successfully) === 2022-09-22 === * 12:30 taavi: add TheresNoTime to the 'toollabs-trusted' gerrit group [[phab:T317438|T317438]] * 12:27 taavi: add TheresNoTime as a project admin and to the roots sudo policy [[phab:T317438|T317438]] === 2022-09-10 === * 07:39 wm-bot2: removing instance tools-prometheus-03 - cookbook ran by taavi@runko === 2022-09-07 === * 10:22 dcaro: Pushing the new toolforge builder image based on the new 0.8 buildpacks ([[phab:T316854|T316854]]) === 2022-09-06 === * 08:06 dcaro_away: Published new toolforge-bullseye0-run and toolforge-bullseye0-build images for the toolforge buildpack builder ([[phab:T316854|T316854]]) === 2022-08-25 === * 10:40 taavi: tagged new version of the python39-web container with a shell implementation of webservice-runner [[phab:T293552|T293552]] === 2022-08-24 === * 12:20 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-nginx ({{Gerrit|eba66bc}}) - cookbook ran by taavi@runko * 12:20 taavi: upgrading ingress-nginx to v1.3 === 2022-08-20 === * 07:44 dcaro_away: all k8s nodes ready now \o/ ([[phab:T315718|T315718]]) * 07:43 dcaro_away: rebooted tools-k8s-control-2, seemed stuck trying to wait for tools home (nfs?), after reboot came back up 
([[phab:T315718|T315718]]) * 07:41 dcaro_away: cloudvirt1023 down took out 3 workers, 1 control, and a grid exec and a weblight, they are taking long to restart, looking ([[phab:T315718|T315718]]) === 2022-08-18 === * 14:45 andrewbogott: adding lucaswerkmeister as projectadmin ([[phab:T314527|T314527]]) * 14:43 andrewbogott: removing some inactive projectadmins: rush, petrb, mdipietro, jeh, krenair === 2022-08-17 === * 16:34 taavi: kubectl sudo delete cm -n tool-wdml maintain-kubeusers # [[phab:T315459|T315459]] * 08:30 taavi: failing the grid from the shadow back to the master, some disruption expected === 2022-08-16 === * 17:28 taavi: fail over docker-registry, tools-docker-registry-06->docker-registry-05 === 2022-08-11 === * 16:57 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by taavi@runko * 16:55 taavi: restart puppetdb on tools-puppetdb-1, crashed during the ceph issues === 2022-08-05 === * 15:08 wm-bot2: removing grid node tools-sgewebgen-10-1.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 15:05 wm-bot2: removing grid node tools-sgeexec-10-12.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 15:00 wm-bot2: created node tools-sgewebgen-10-3.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko === 2022-08-03 === * 15:51 dhinus: recreated jobs-api pods to pick up new ConfigMap * 15:02 wm-bot2: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|c47ac41}}) - cookbook ran by fran@MacBook-Pro.station === 2022-07-20 === * 19:31 taavi: reboot toolserver-proxy-01 to free up disk space probably held by stale file handles * 08:06 wm-bot2: removing grid node tools-sgeexec-10-6.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko === 2022-07-19 === * 17:53 wm-bot2: created node tools-sgeexec-10-21.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko * 17:00 wm-bot2: removing grid node 
tools-sgeexec-10-3.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 16:58 wm-bot2: removing grid node tools-sgeexec-10-4.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 16:24 wm-bot2: created node tools-sgeexec-10-20.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko * 15:59 taavi: tag current maintain-kubernetes :beta image as: :latest === 2022-07-17 === * 15:52 wm-bot2: removing grid node tools-sgeexec-10-10.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 15:43 wm-bot2: removing grid node tools-sgeexec-10-2.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 13:26 wm-bot2: created node tools-sgeexec-10-16.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko === 2022-07-14 === * 13:48 taavi: rebooting tools-sgeexec-10-2 === 2022-07-13 === * 12:09 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus === 2022-07-11 === * 16:06 wm-bot2: Increased quotas by <nowiki>{</nowiki>self.increases<nowiki>}</nowiki> ([[phab:T312692|T312692]]) - cookbook ran by nskaggs@x1carbon === 2022-07-07 === * 07:34 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master - cookbook ran by dcaro@vulcanus === 2022-06-28 === * 17:34 wm-bot2: cleaned up grid queue errors on tools-sgegrid-master ([[phab:T311538|T311538]]) - cookbook ran by dcaro@vulcanus * 15:51 taavi: add 4096G cinder quota [[phab:T311509|T311509]] === 2022-06-27 === * 18:14 taavi: restart calico, appears to have got stuck after the ca replacement operation * 18:02 taavi: switchover active cron server to tools-sgecron-2 [[phab:T284767|T284767]] * 17:54 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0915.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 17:52 wm-bot2: removing grid node tools-sgewebgrid-generic-0902.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 17:49 wm-bot2: removing grid node 
tools-sgeexec-0942.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 17:15 taavi: [[phab:T311412|T311412]] updating ca used by k8s-apiserver->etcd communication, breakage may happen * 14:58 taavi: renew puppet ca cert and certificate for tools-puppetmaster-02 [[phab:T311412|T311412]] * 14:50 taavi: backup /var/lib/puppet/server to /root/puppet-ca-backup-2022-06-27.tar.gz on tools-puppetmaster-02 before we do anything else to it [[phab:T311412|T311412]] === 2022-06-23 === * 17:51 wm-bot2: removing grid node tools-sgeexec-0941.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 17:49 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0916.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 17:46 wm-bot2: removing grid node tools-sgewebgrid-generic-0901.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 17:32 wm-bot2: removing grid node tools-sgeexec-0939.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 17:30 wm-bot2: removing grid node tools-sgeexec-0938.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 17:27 wm-bot2: removing grid node tools-sgeexec-0937.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 17:22 wm-bot2: removing grid node tools-sgeexec-0936.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 17:19 wm-bot2: removing grid node tools-sgeexec-0935.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 17:17 wm-bot2: removing grid node tools-sgeexec-0934.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 17:14 wm-bot2: removing grid node tools-sgeexec-0933.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 17:11 wm-bot2: removing grid node tools-sgeexec-0932.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 17:09 wm-bot2: removing grid node tools-sgeexec-0920.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 15:30 wm-bot2: removing grid node tools-sgeexec-0947.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 13:59 
taavi: removing remaining continuous jobs from the stretch grid [[phab:T277653|T277653]] === 2022-06-22 === * 15:54 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0917.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 15:51 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0918.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 15:47 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0919.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 15:45 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0920.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko === 2022-06-21 === * 15:23 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0914.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 15:20 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0914.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 15:18 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0913.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko * 15:07 wm-bot2: removing grid node tools-sgewebgrid-lighttpd-0912.tools.eqiad1.wikimedia.cloud - cookbook ran by taavi@runko === 2022-06-03 === * 20:07 wm-bot2: created node tools-sgeweblight-10-26.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster * 19:51 balloons: Scaling webservice nodes to 20, using new 8G swap flavor [[phab:T309821|T309821]] * 19:35 wm-bot2: created node tools-sgeweblight-10-25.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster * 19:03 wm-bot2: created node tools-sgeweblight-10-20.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster * 19:01 wm-bot2: created node tools-sgeweblight-10-19.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster * 19:00 balloons: depooled old nodes, bringing entirely new grid of nodes online [[phab:T309821|T309821]] * 18:22 wm-bot2: created node 
tools-sgeweblight-10-17.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster * 17:54 wm-bot2: created node tools-sgeweblight-10-16.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster * 17:52 wm-bot2: created node tools-sgeweblight-10-15.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster * 16:59 andrewbogott: building a bunch of new lighttpd nodes (beginning with tools-sgeweblight-10-12) using a flavor with more swap space * 16:56 wm-bot2: created node tools-sgeweblight-10-12.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by andrew@buster * 15:50 balloons: fix fix g3.cores4.ram8.disk20.swap24.ephem20 flavor to include swap. Convert to fix g3.cores4.ram8.disk20.swap8.ephem20 flavor [[phab:T309821|T309821]] * 15:50 balloons: temp add 1.0G swap to sgeweblight hosts [[phab:T309821|T309821]] * 15:50 balloons: fix fix g3.cores4.ram8.disk20.swap24.ephem20 flavor to include swap. Convert to fix g3.cores4.ram8.disk20.swap8.ephem20 flavor t309821 * 15:49 balloons: temp add 1.0G swap to sgeweblight hosts t309821 * 13:25 bd808: Upgrading fleet to tools-webservice 0.86 ([[phab:T309821|T309821]]) * 13:20 bd808: publish tools-webservice 0.86 ([[phab:T309821|T309821]]) * 12:46 taavi: start webservicemonitor on tools-sgecron-01 [[phab:T309821|T309821]] * 10:36 taavi: draining each sgeweblight node one by one, and removing the jobs stuck in 'deleting' too * 05:05 taavi: removing duplicate (there should be only one per tool) web service jobs from the grid [[phab:T309821|T309821]] * 04:52 taavi: revert bd808's changes to profile::toolforge::active_proxy_host * 03:21 bd808: Cleared queue error states after deploying new toolforge-webservice package ([[phab:T309821|T309821]]) * 03:10 bd808: publish tools-webservice 0.85 with hack for [[phab:T309821|T309821]] === 2022-06-02 === * 22:26 bd808: Rebooting tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud. 
Node is full of jobs that are not tracked by grid master and failing to spawn new jobs sent by the scheduler * 21:56 bd808: Removed legacy "active_proxy_host" hiera setting * 21:55 bd808: Updated hiera to use fqdn of 'tools-proxy-06.tools.eqiad1.wikimedia.cloud' for profile::toolforge::active_proxy_host key * 21:41 bd808: Updated hiera to use fqdn of 'tools-proxy-06.tools.eqiad1.wikimedia.cloud' for active_redis key * 21:23 wm-bot2: created node tools-sgeweblight-10-8.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko * 12:42 wm-bot2: rebooting stretch exec grid workers - cookbook ran by taavi@runko * 12:13 wm-bot2: created node tools-sgeweblight-10-7.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko * 12:03 dcaro: refresh prometheus certs ([[phab:T308402|T308402]]) * 11:47 dcaro: refresh registry-admission-controller certs ([[phab:T308402|T308402]]) * 11:42 dcaro: refresh ingress-admission-controller certs ([[phab:T308402|T308402]]) * 11:36 dcaro: refresh volume-admission-controller certs ([[phab:T308402|T308402]]) * 11:24 wm-bot2: created node tools-sgeweblight-10-6.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko * 11:17 taavi: publish jobutils 1.44 that updates the grid default from stretch to buster [[phab:T277653|T277653]] * 10:16 taavi: publish tools-webservice 0.84 that updates the grid default from stretch to buster [[phab:T277653|T277653]] * 09:54 wm-bot2: created node tools-sgeexec-10-14.tools.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by taavi@runko === 2022-06-01 === * 11:18 taavi: depool and remove tools-sgeexec-09[07-14] === 2022-05-31 === * 16:51 taavi: delete tools-sgeexec-0904 for [[phab:T309525|T309525]] experimentation === 2022-05-30 === * 08:24 taavi: depool tools-sgeexec-[0901-0909] (7 nodes total) [[phab:T277653|T277653]] === 2022-05-26 === * 15:39 wm-bot2: deployed kubernetes component 
https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|e6fa299}}) ([[phab:T309146|T309146]]) - cookbook ran by taavi@runko === 2022-05-22 === * 17:04 taavi: failover tools-redis to the updated cluster [[phab:T278541|T278541]] * 16:42 wm-bot2: removing grid node tools-sgeexec-0940.tools.eqiad1.wikimedia.cloud ([[phab:T308982|T308982]]) - cookbook ran by taavi@runko === 2022-05-16 === * 14:02 wm-bot2: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-nginx ({{Gerrit|7037eca}}) - cookbook ran by taavi@runko === 2022-05-14 === * 10:47 taavi: hard reboot unresponsive tools-sgeexec-0940 === 2022-05-12 === * 12:36 taavi: re-enable CronJobControllerV2 [[phab:T308205|T308205]] * 09:28 taavi: deploy jobs-api update [[phab:T308204|T308204]] * 09:15 wm-bot2: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|e6fa299}}) ([[phab:T308204|T308204]]) - cookbook ran by taavi@runko === 2022-05-10 === * 15:18 taavi: depool tools-k8s-worker-42 for experiments * 13:54 taavi: enable distro-wikimedia unattended upgrades [[phab:T290494|T290494]] === 2022-05-06 === * 19:46 bd808: Rebuilt toolforge-perl532-sssd-base & toolforge-perl532-sssd-web to add liblocale-codes-perl ([[phab:T307812|T307812]]) === 2022-05-05 === * 17:28 taavi: deploy tools-webservice 0.83 [[phab:T307693|T307693]] === 2022-05-03 === * 08:20 taavi: redis: start replication from the old cluster to the new one ([[phab:T278541|T278541]]) === 2022-05-02 === * 08:54 taavi: restart acme-chief.service [[phab:T307333|T307333]] === 2022-04-25 === * 14:56 bd808: Rebuilding all docker images to pick up toolforge-webservice v0.82 ([[phab:T214343|T214343]]) * 14:46 bd808: Building toolforge-webservice v0.82 === 2022-04-23 === * 16:51 bd808: Built new perl532-sssd/<nowiki>{</nowiki>base,web<nowiki>}</nowiki> images and pushed to registry 
([[phab:T214343|T214343]]) === 2022-04-20 === * 16:58 taavi: reboot toolserver-proxy-01 to free up disk space from stale file handles(?) * 07:51 wm-bot: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|8f37a04}}) - cookbook ran by taavi@runko === 2022-04-16 === * 18:53 wm-bot: deployed kubernetes component https://gitlab.wikimedia.org/repos/cloud/toolforge/kubernetes-metrics ({{Gerrit|2c485e9}}) - cookbook ran by taavi@runko === 2022-04-12 === * 21:32 bd808: Added komla to Gerrit group 'toollabs-trusted' ([[phab:T305986|T305986]]) * 21:27 bd808: Added komla to 'roots' sudoers policy ([[phab:T305986|T305986]]) * 21:24 bd808: Add komla as projectadmin ([[phab:T305986|T305986]]) === 2022-04-10 === * 18:43 taavi: deleted `/tmp/dwl02.out-20210915` on tools-sgebastion-07 (not touched since september, taking up 1.3G of disk space) === 2022-04-09 === * 15:30 taavi: manually prune user.log on tools-prometheus-03 to free up some space on / === 2022-04-08 === * 10:44 arturo: disabled debug mode on the k8s jobs-emailer component === 2022-04-05 === * 07:52 wm-bot: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|d7d3463}}) - cookbook ran by arturo@nostromo * 07:44 wm-bot: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|d7d3463}}) - cookbook ran by arturo@nostromo * 07:21 arturo: deploying toolforge-jobs-framework-cli v7 === 2022-04-04 === * 17:05 wm-bot: deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|cbcfc47}}) - cookbook ran by arturo@nostromo * 16:56 wm-bot: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:latest from 
https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api ({{Gerrit|cbcfc47}}) - cookbook ran by arturo@nostromo * 09:28 arturo: deployed toolforge-jobs-framework-cli v6 into aptly and installed it on buster bastions === 2022-03-28 === * 09:32 wm-bot: cleaned up grid queue errors on tools-sgegrid-master.tools.eqiad1.wikimedia.cloud ([[phab:T304816|T304816]]) - cookbook ran by arturo@nostromo === 2022-03-15 === * 16:57 wm-bot: build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-emailer:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-emailer ({{Gerrit|084ee51}}) - cookbook ran by arturo@nostromo * 11:24 arturo: cleared error state on queue continuous@tools-sgeexec-0939.tools.eqiad.wmflabs (a job took a very long time to be scheduled...) === 2022-03-14 === * 11:44 arturo: deploy jobs-framework-emailer {{Gerrit|9470a5f339fd5a44c97c69ce97239aef30f5ee41}} ([[phab:T286135|T286135]]) * 10:48 dcaro: pushed v0.33.2 tekton control and webhook images, and bashA5.1.4 to the local repo ([[phab:T297090|T297090]]) === 2022-03-10 === * 09:42 arturo: cleaned grid queue error state @ tools-sgewebgrid-generic-0902 === 2022-03-01 === * 13:41 dcaro: rebooting tools-sgeexec-0916 to clear any state ([[phab:T302702|T302702]]) * 12:11 dcaro: Cleared error state queues for sgeexec-0916 ([[phab:T302702|T302702]]) * 10:23 arturo: tools-sgeexec-0913/0916 are depooled, queue errors. Reboot them and clean errors by hand === 2022-02-28 === * 08:02 taavi: reboot sgeexec-0916 * 07:49 taavi: depool tools-sgeexec-0916.tools as it is out of disk space on / === 2022-02-17 === * 08:23 taavi: deleted tools-clushmaster-02 * 08:14 taavi: made tools-puppetmaster-02 its own client to fix `puppet node deactivate` puppetdb access === 2022-02-16 === * 00:12 bd808: Image builds completed. === 2022-02-15 === * 23:17 bd808: Image builds failed in buster php image with an apt error. The error looks transient, so starting builds over. 
* 23:06 bd808: Started full rebuild of Toolforge containers to pick up webservice 0.81 and other package updates in tmux session on tools-docker-imagebuilder-01 * 22:58 bd808: `sudo apt-get update && sudo apt-get install toolforge-webservice` on all bastions to pick up 0.81 * 22:50 bd808: Built new toollabs-webservice 0.81 * 18:43 bd808: Enabled puppet on tools-proxy-05 * 18:38 bd808: Disabled puppet on tools-proxy-05 for manual testing of nginx config changes * 18:21 taavi: delete tools-package-builder-03 * 11:49 arturo: invalidate sssd cache in all bastions to debug [[phab:T301736|T301736]] * 11:16 arturo: purge debian package `unscd` on tools-sgebastion-10/11 for [[phab:T301736|T301736]] * 11:15 arturo: reboot tools-sgebastion-10 for [[phab:T301736|T301736]] === 2022-02-10 === * 15:07 taavi: shutdown tools-clushmaster-02 [[phab:T298191|T298191]] * 13:25 wm-bot: trying to join node tools-sgewebgen-10-2 to the grid cluster in tools. - cookbook ran by arturo@nostromo * 13:24 wm-bot: trying to join node tools-sgewebgen-10-1 to the grid cluster in tools. - cookbook ran by arturo@nostromo * 13:07 wm-bot: trying to join node tools-sgeweblight-10-5 to the grid cluster in tools. - cookbook ran by arturo@nostromo * 13:06 wm-bot: trying to join node tools-sgeweblight-10-4 to the grid cluster in tools. - cookbook ran by arturo@nostromo * 13:05 wm-bot: trying to join node tools-sgeweblight-10-3 to the grid cluster in tools. - cookbook ran by arturo@nostromo * 13:03 wm-bot: trying to join node tools-sgeweblight-10-2 to the grid cluster in tools. - cookbook ran by arturo@nostromo * 12:54 wm-bot: trying to join node tools-sgeweblight-10-1.tools.eqiad1.wikimedia.cloud to the grid cluster in tools. 
- cookbook ran by arturo@nostromo * 08:45 taavi: set `profile::base::manage_ssh_keys: true` globally [[phab:T214427|T214427]] * 08:16 taavi: enable puppetdb and re-enable puppet with puppetdb ssh key management disabled (profile::base::manage_ssh_keys: false) - [[phab:T214427|T214427]] * 08:06 taavi: disable puppet globally for enabling puppetdb [[phab:T214427|T214427]] === 2022-02-09 === * 19:29 taavi: installed tools-puppetdb-1, not configured on puppetmaster side yet [[phab:T214427|T214427]] * 18:56 wm-bot: pooled 10 grid nodes tools-sgeweblight-10-[1-5],tools-sgewebgen-10-[1,2],tools-sgeexec-10-[1-10] ([[phab:T277653|T277653]]) - cookbook ran by arturo@nostromo * 18:30 wm-bot: pooled 9 grid nodes tools-sgeexec-10-[2-10],tools-sgewebgen-[3,15] - cookbook ran by arturo@nostromo * 18:25 arturo: ignore last message * 18:24 wm-bot: pooled 9 grid nodes tools-sgeexec-10-[2-10],tools-sgewebgen-[3,15] - cookbook ran by arturo@nostromo * 14:04 taavi: created tools-cumin-1/toolsbeta-cumin-1 [[phab:T298191|T298191]] === 2022-02-07 === * 17:37 taavi: generated authdns_acmechief ssh key and stored password in a text file in local labs/private repository ([[phab:T288406|T288406]]) * 12:52 taavi: updated maintain-kubeusers for [[phab:T301081|T301081]] === 2022-02-04 === * 22:33 taavi: `root@tools-sgebastion-10:/data/project/ru_monuments/.kube# mv config old_config` # experimenting with [[phab:T301015|T301015]] * 21:36 taavi: clear error state from some webgrid nodes === 2022-02-03 === * 09:06 taavi: run `sudo apt-get clean` on login-buster/dev-buster to clean up disk space * 08:01 taavi: restart acme-chief to force renewal of toolserver.org certificate === 2022-01-30 === * 14:41 taavi: created a neutron port with ip 172.16.2.46 for a service ip for toolforge redis automatic failover [[phab:T278541|T278541]] * 14:22 taavi: creating a cluster of 3 bullseye redis hosts for [[phab:T278541|T278541]] === 2022-01-26 === * 18:33 wm-bot: depooled grid node tools-sgeexec-10-10 - 
cookbook ran by arturo@nostromo * 18:33 wm-bot: depooled grid node tools-sgeexec-10-9 - cookbook ran by arturo@nostromo * 18:33 wm-bot: depooled grid node tools-sgeexec-10-8 - cookbook ran by arturo@nostromo * 18:32 wm-bot: depooled grid node tools-sgeexec-10-7 - cookbook ran by arturo@nostromo * 18:32 wm-bot: depooled grid node tools-sgeexec-10-6 - cookbook ran by arturo@nostromo * 18:31 wm-bot: depooled grid node tools-sgeexec-10-5 - cookbook ran by arturo@nostromo * 18:30 wm-bot: depooled grid node tools-sgeexec-10-4 - cookbook ran by arturo@nostromo * 18:28 wm-bot: depooled grid node tools-sgeexec-10-3 - cookbook ran by arturo@nostromo * 18:27 wm-bot: depooled grid node tools-sgeexec-10-2 - cookbook ran by arturo@nostromo * 18:27 wm-bot: depooled grid node tools-sgeexec-10-1 - cookbook ran by arturo@nostromo * 13:55 arturo: scaling up the buster web grid with 5 lighttpd and 2 generic nodes ([[phab:T277653|T277653]]) === 2022-01-25 === * 11:50 wm-bot: reconfiguring the grid by using grid-configurator - cookbook ran by arturo@nostromo * 11:44 arturo: rebooting buster exec nodes * 08:34 taavi: sign puppet certificate for tools-sgeexec-10-4 === 2022-01-24 === * 17:44 wm-bot: reconfiguring the grid by using grid-configurator - cookbook ran by arturo@nostromo * 15:23 arturo: scaling up the grid with 10 buster exec nodes ([[phab:T277653|T277653]]) === 2022-01-20 === * 17:05 arturo: drop 9 of the 10 buster exec nodes created earlier. 
They didn't get DNS records * 12:56 arturo: scaling up the grid with 10 buster exec nodes ([[phab:T277653|T277653]]) === 2022-01-19 === * 17:34 andrewbogott: rebooting tools-sgeexec-0913.tools.eqiad1.wikimedia.cloud to recover from (presumed) fallout from the scratch/nfs move === 2022-01-14 === * 19:09 taavi: set /var/run/lighttpd as world-writable on all lighttpd webgrid nodes, [[phab:T299243|T299243]] === 2022-01-12 === * 11:27 arturo: created puppet prefix `tools-sgeweblight`, drop `tools-sgeweblig` * 11:03 arturo: created puppet prefix 'tools-sgeweblig' * 11:02 arturo: created puppet prefix 'toolsbeta-sgeweblig' === 2022-01-04 === * 17:18 bd808: tools-acme-chief-01: sudo service acme-chief restart * 08:12 taavi: disable puppet & exim4 on [[phab:T298501|T298501]] </noinclude> <noinclude>[[Category:SAL]]</noinclude>